Trading of CoIntegration Pairs using Mean Reversion (Statistical Arbitrage)

Arnav Gupta
May 22, 2022

Introduction

Introduced in the 1970s, algorithmic trading has revolutionized capital markets through its speed, accuracy, and reduced costs. Today, an estimated 60–73% of overall US equity trading is algorithmic, and it gains a competitive edge by allowing faster and easier execution of orders.

While technical analysis forms the backbone of algorithmic trading, advances in artificial intelligence have also allowed machine learning to find associations in historical data and adjust trades as the market environment changes. This led us to narrow our research to the difference in performance between machine learning and technical analysis for trading pairs.

Hypothesis

We hypothesized that the implementation of Machine Learning techniques would perform better by producing higher returns compared to the Technical Analysis method for trading pairs.

Data Collection & Processing

For Data Collection, we imported the S&P500 Dataset from Yahoo Finance.

Figure 1: S&P500 Dataset Output

Data processing consisted of handling missing data and encoding categorical data. Missing data must be handled because it reduces the representativeness of the sample and can bias parameter estimates. Encoding categorical data is a common technique that converts categories into integers so they can be passed to different models. Since machine learning models require numeric inputs and outputs, this step is vital for fitting and evaluating a model. The figure below shows the ‘ticker’ column being transformed into numeric data for ease of model fitting.

Figure 2: Encoding Categorical Data
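
The two processing steps can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the tiny table, the column names, and the choice of forward-filling within each ticker are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature of the price table; the real dataset is far larger.
prices = pd.DataFrame({
    "ticker": ["KMI", "OKE", "KMI", "OKE", "KMI", "OKE"],
    "date": pd.to_datetime([
        "2021-01-04", "2021-01-04", "2021-01-05",
        "2021-01-05", "2021-01-06", "2021-01-06",
    ]),
    "close": [13.9, 38.2, None, 38.9, 14.3, 39.4],
})

# Handle missing data: forward-fill within each ticker so that one
# stock's price never leaks into another's (assumed fill strategy).
prices = prices.sort_values(["ticker", "date"])
prices["close"] = prices.groupby("ticker")["close"].ffill()

# Encode the categorical 'ticker' column as integer codes.
prices["ticker_code"] = prices["ticker"].astype("category").cat.codes

print(prices)
```

Integer codes like these carry no meaning beyond identity, so they suit models that treat the column as a label rather than as a magnitude.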

Pairs Selection

As a general rule of thumb, traders using the pairs strategy look for two securities that:

  1. Share similar characteristics and have a high positive correlation
  2. Are trading at prices that deviate from their historical trading prices
  3. Have similar P/E ratios

Figure 3: Calculating P/E Ratios

In this project, we focus on intra-sector pairs trades, i.e., analyzing two stocks in the same industry. To make our research more exhaustive, we select pairs from industries with differing volatility levels.

  • Energy: Highly Volatile due to wide fluctuations in oil prices
  • Health Care: Medium Volatile
  • Utilities: Least Volatile

The figures below show the correlation values as well as the stock price chart for a trading pair.

Figure 4: Correlation Values for Selected Trading Pairs
Figure 5: Stock Price Chart for Selected Trading Pair
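
As a rough illustration of the correlation check, the sketch below computes the Pearson correlation between two price series with NumPy. The series are synthetic stand-ins (a shared random walk plus stock-specific noise), and computing the correlation on price levels rather than on returns is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two same-sector stocks: both follow a shared
# random walk plus a small stock-specific noise term.
common = np.cumsum(rng.normal(0, 1, 500))
stock_a = 50 + common + rng.normal(0, 0.5, 500)
stock_b = 30 + 0.8 * common + rng.normal(0, 0.5, 500)

# Pearson correlation is the off-diagonal entry of the 2x2 matrix.
corr = np.corrcoef(stock_a, stock_b)[0, 1]
print(f"correlation: {corr:.3f}")
```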

Based on the above findings, we finalized 8 trading pairs from different industries.

Figure 6: Finalizing Trading Pairs

Technical Approach: Mean Reversion Strategy

The Mean Reversion Strategy rests on a financial theory suggesting that, after an extreme price move, asset prices tend to return to normal or average levels. The strategy therefore works best when prices fluctuate around a stable level rather than follow a strong trend, and it always relies on one assumption: that an asset’s price will tend to converge to its average over time.

In the context of pairs trading, a long position in one stock is matched with an offsetting position in another, statistically related stock. This again relies on the assumption that prices will revert to their historical trends.

To keep the spread stationary (and mean-reverting), we use the Kalman Filter, which dynamically tracks the hedge ratio between the two stocks. In essence, the Kalman Filter updates its estimates at every time step and tends to weight recent observations more heavily than older ones. To create the trading rules, we then need to determine when the spread has moved too far from its expected value.
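
One common way to track a time-varying hedge ratio is a two-state Kalman filter in which the slope and intercept of the regression y ≈ slope · x + intercept follow a random walk. The sketch below is a minimal NumPy version under that assumption; the function name, the `delta` and `obs_var` tuning values, the initial covariance, and the synthetic prices are all illustrative and not taken from the project.

```python
import numpy as np

def kalman_hedge_ratio(x, y, delta=1e-4, obs_var=1e-3):
    """Track y_t ~ slope_t * x_t + intercept_t with a random-walk state."""
    n = len(x)
    state = np.zeros(2)                            # [slope, intercept]
    cov = np.eye(2)                                # initial state uncertainty (assumed)
    process_cov = delta / (1 - delta) * np.eye(2)  # how fast the state may drift
    states = np.zeros((n, 2))
    spread = np.zeros(n)                           # one-step-ahead forecast errors
    for t in range(n):
        H = np.array([x[t], 1.0])                  # observation vector
        cov = cov + process_cov                    # predict step
        err = y[t] - H @ state                     # forecast error (the spread)
        S = H @ cov @ H + obs_var                  # forecast-error variance
        K = cov @ H / S                            # Kalman gain
        state = state + K * err                    # update step
        cov = cov - np.outer(K, H) @ cov
        states[t] = state
        spread[t] = err
    return states, spread

# Synthetic example: y is roughly 1.5 * x + 2 plus noise.
rng = np.random.default_rng(1)
x = 50 + np.cumsum(rng.normal(0, 0.5, 500))
y = 1.5 * x + 2 + rng.normal(0, 0.1, 500)
states, spread = kalman_hedge_ratio(x, y)
print("final slope estimate:", states[-1, 0])
```

The forecast error doubles as the spread to trade on, and `S` gives its expected variance, which is one way to judge when the spread has moved too far.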

When comparing trading pairs, we analyze the Sharpe Ratio as well as the Compound Annual Growth Rate (CAGR). The Sharpe Ratio measures an investment's return in excess of the risk-free rate per unit of volatility, whereas CAGR is the mean annual growth rate of an investment over a specified period longer than one year. For both indicators, a higher value is generally better, and we use them to assess the different trading pairs.

Figure 7: Sharpe Ratio & CAGR Output
Figure 8: Stock Price For Selected 8 Trading Pairs
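
Both metrics are short formulas. The sketch below shows one conventional way to compute them from daily data; the 252-trading-day year, the zero default risk-free rate, and the function names are assumptions of this example rather than details from the project.

```python
import numpy as np

TRADING_DAYS = 252  # conventional trading days per year (assumption)

def sharpe_ratio(daily_returns, risk_free_annual=0.0):
    """Annualized Sharpe ratio from daily simple returns."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_annual / TRADING_DAYS
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)

def cagr(equity_curve):
    """Compound annual growth rate of a daily equity curve."""
    equity = np.asarray(equity_curve, dtype=float)
    years = (len(equity) - 1) / TRADING_DAYS
    return (equity[-1] / equity[0]) ** (1 / years) - 1

# Example: an account that compounds at exactly 10% per year for two years.
equity = 100 * 1.10 ** (np.arange(2 * TRADING_DAYS + 1) / TRADING_DAYS)
daily = equity[1:] / equity[:-1] - 1
print("CAGR:", cagr(equity))
```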

Kalman Filtering

Kalman Filtering, also known as linear quadratic estimation (LQE), is an algorithm that optimally estimates variables of interest when they cannot be measured directly. It is recursive in nature and is well suited to estimating the future spread between assets. Below, we have attached the predicted spread for one of the trading pairs (KMI-OKE).

Figure 9: Spread Prediction (Kalman Filter)

Machine Learning Approach: Long Short-Term Memory (LSTM)

Long Short-Term Memory is a type of recurrent neural network that is generally well suited to classifying, processing, and making predictions based on time series data. Since LSTM models are among the most powerful approaches to learning from sequential data, we use one to predict the spread one time step (one day) ahead. Below, we have attached the predicted spread for the same trading pair as above (KMI-OKE).

Figure 10: LSTM Model Training Output
Figure 11: Spread Prediction (LSTM)
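
Before an LSTM can predict the next day's spread, the series has to be cut into fixed-length input windows paired with the value that follows each window. The sketch below covers only that preparation step, not the model itself; the function name and the 10-day lookback are assumptions for illustration. The output follows the (samples, time steps, features) layout that recurrent layers typically expect.

```python
import numpy as np

def make_windows(series, lookback=10):
    """Turn a 1-D series into (samples, lookback, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    # Each row holds `lookback` consecutive values; the target is the next value.
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., np.newaxis], y

# Example with a simple ramp so the alignment is easy to inspect.
X, y = make_windows(np.arange(15.0), lookback=10)
print(X.shape, y.shape)
```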

Executing Trading Strategy

The information about the spreads allows us to formulate a simple trading strategy that estimates the profit from each trade. It works by taking a short position in the overvalued asset and a long position in the undervalued asset when the spread deviates from its mean. This logic is illustrated below.

Figure 12: Buy & Sell Signals based on Spread
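
A simple version of this rule can be written with a normalized spread (z-score): go short the spread when it is well above its mean, go long when it is well below, and close the position once it returns to the mean. The sketch below is one such rule under assumed thresholds (enter at one standard deviation, exit at the mean). It normalizes with the full-sample mean and standard deviation for brevity, which would introduce look-ahead bias in a real backtest.

```python
import numpy as np

def zscore_signals(spread, entry_z=1.0, exit_z=0.0):
    """Return z-scores and positions: +1 long spread, -1 short spread, 0 flat."""
    spread = np.asarray(spread, dtype=float)
    z = (spread - spread.mean()) / spread.std()
    position = np.zeros(len(z))
    current = 0.0
    for t in range(len(z)):
        if current == 0:
            if z[t] > entry_z:
                current = -1.0   # spread is rich: short it
            elif z[t] < -entry_z:
                current = 1.0    # spread is cheap: buy it
        elif current == 1 and z[t] >= -exit_z:
            current = 0.0        # long spread has reverted: close
        elif current == -1 and z[t] <= exit_z:
            current = 0.0        # short spread has reverted: close
        position[t] = current
    return z, position

z, position = zscore_signals([0, 2, 3, 1, -1, 0, -3, -2, 1, -1])
print(position)
```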

Evaluation

To evaluate the different approaches, we compared the normalized spreads as well as the cumulative returns over time for a trading pair from each industry: Energy (most volatile), Healthcare (medium volatility), and Utilities (least volatile).

Figure 13: Normalized Spread (Energy)
Figure 14: Cumulative Returns (Energy)
Figure 15: Normalized Spread (Healthcare)
Figure 16: Cumulative Returns (Healthcare)
Figure 17: Normalized Spread (Utilities)
Figure 18: Cumulative Returns (Utilities)

Results & Conclusion

To reach a conclusion, we calculated the cumulative returns and Sharpe Ratio for a trading pair from each industry. We begin by calculating cumulative returns as realized profits from both long and short positions and estimate a daily percentage return. Lastly, the Sharpe Ratio formula is applied to obtain the final results for the actual spread, the Kalman-predicted spread, and the LSTM-predicted spread.

In general, the higher the Sharpe Ratio of a portfolio, the better its risk-adjusted performance. However, we decided to evaluate the two methods based on how close each one is to the actual spread.
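
The return calculation can be sketched as follows. This is a simplified illustration, not the project's code: it assumes equal dollar weights on the two legs, a position of +1 meaning long the first stock and short the second, and that a position set on one day earns the next day's return.

```python
import numpy as np

def strategy_returns(positions, first_leg_returns, second_leg_returns):
    """Daily and cumulative returns of a pair position held with a one-day lag."""
    pos = np.asarray(positions, dtype=float)
    pair_ret = np.asarray(first_leg_returns) - np.asarray(second_leg_returns)
    held = np.concatenate(([0.0], pos[:-1]))   # yesterday's position earns today's return
    daily = held * pair_ret
    cumulative = np.cumprod(1 + daily) - 1     # compounded return to date
    return daily, cumulative

daily, cumulative = strategy_returns(
    positions=[1, 1, 0, -1, -1],
    first_leg_returns=[0.00, 0.02, 0.01, -0.01, 0.03],
    second_leg_returns=[0.00, 0.01, 0.00, 0.01, 0.01],
)
print(daily, cumulative[-1])
```

The resulting daily returns are the input a Sharpe Ratio calculation would need.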

Figure 19: Results (Energy)

For the Energy Industry, if we generalize using the KMI-OKE Trading Pair, the LSTM model performs better.

Figure 20: Results (Healthcare)

For the Healthcare Industry, if we generalize using the HUM-UNH Trading Pair, the LSTM model performs better.

Figure 21: Results (Utilities)

For the Utilities Industry, if we generalize using the ETR-AEP Trading Pair, the LSTM model performs better.

Validating Hypothesis

Based on the above results for these three trading pairs, our hypothesis is supported: the implementation of Machine Learning techniques performs better by producing higher returns compared to the Technical Analysis method for trading pairs.

Acknowledgements

This project was completed under the Algo Trading Research Wing of the NUS FinTech Society. We would like to thank our Mentors: Hong Po & Mayve for their constant support and guidance throughout the semester.

Code Repository

Our code can be found here

Project Completed by: Arnav, Darius, Jia Yi, Wira, and Zhili

Arnav Gupta

Business Analytics Student at National University of Singapore | Passionate and Inquisitive about Finance, Technology, Data Science, and Entrepreneurship