Wed 21-Aug-2019

I don’t know (beyond some extreme examples, like “The Magical Number Seven, Plus or Minus Two” or “Computing Machinery and Intelligence”) what qualifies a journal article as seminal. However, if there is such a thing as a seminal article in the field of Machine Learning-powered prediction of financial markets, then Kyoung-jae Kim’s 2003 article is definitely it (1,300 citations on Google Scholar and counting).

The author introduces the then (relatively) unknown Support Vector Machines (a family of supervised learning models) into the financial prediction domain. Using 10 years of historical data for the Korean KOSPI index, he compares the predictive power of SVM’s against a traditional backpropagation neural network. In his brief and elegantly written article, the author is very clear about the end goal: predicting the index’s direction on the following business day (i.e. whether it’s going to go up or down). SVM’s are central to the article, but the article isn’t about SVM’s in their own right; it’s about whether SVM’s are able to predict the direction of KOSPI.

Kim’s article is worth reading for a number of reasons.

Firstly, it is one of the first articles to bring SVM’s into the discussion of financial time series prediction, and by far one of the most influential (judging by the number of citations). Kim posits that SVM’s should be more accurate than neural networks as they require only one or two parameters to tune and are less prone to overfitting. His experiment tests this assumption.

The author attempts to predict the Korean KOSPI index and evaluates the models on out-of-sample data. He is not attempting to predict the exact index level, only the day-on-day direction of change (whether the index is going to go up or down compared to the current day). This simplified target can still serve as the basis of a legitimate investment strategy, and it is very easy to quantify using straightforward statistics (i.e. simple % accuracy).
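To make the "direction only" target concrete, here is a minimal Python sketch of how up/down labels and a simple % accuracy could be computed from daily closes. The function names, the price series and the predictions are all made up for illustration, and treating an unchanged close as "down" is my own assumption rather than something taken from the article.

```python
from typing import List


def direction_labels(closes: List[float]) -> List[int]:
    """Label each day 1 if the NEXT day's close is higher, else 0.

    Assumption: an unchanged close counts as 0 ("down").
    The last day has no following day, so it gets no label.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


def hit_rate(predicted: List[int], actual: List[int]) -> float:
    """Fraction of days on which the predicted direction was correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up closing prices, not KOSPI data.
closes = [100.0, 101.5, 101.0, 102.2, 102.2, 103.0]
actual = direction_labels(closes)

# Hypothetical model output for the same five days.
predicted = [1, 1, 1, 0, 0]

print(actual)                       # [1, 0, 1, 0, 1]
print(hit_rate(predicted, actual))  # 0.6
```

A hit rate of 0.5 is what a coin flip would give on a balanced series, which is why every accuracy figure in this kind of study is read relative to 50%.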

Furthermore, he compares and contrasts SVM’s with the more mainstream backpropagation neural networks (BPNN’s) as well as the less popular case-based reasoning (CBR), enabling a like-for-like accuracy analysis on the same input data. The input data is just a series of technical indicators derived from the OHLC (open/high/low/close) daily time series. In this sense the experiment uses one of the most limited input data sets of all; notably, it doesn’t even use volume data (the majority of other experiments do). One could argue that the data series is *too* limited and that adding more input variables (e.g. volume) could increase the predictive accuracy of the model.
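As a rough illustration of what "technical indicators derived from OHLC" means, the sketch below computes two textbook indicators, stochastic %K and momentum, from daily highs, lows and closes. I am not claiming these are computed the way the article does it; the window length, the sample prices and the neutral value returned for a flat window are all assumptions of mine.

```python
from typing import List


def stochastic_k(highs: List[float], lows: List[float],
                 closes: List[float], n: int) -> List[float]:
    """Stochastic %K: where the close sits within the n-day high/low range (0-100)."""
    out = []
    for t in range(n - 1, len(closes)):
        highest = max(highs[t - n + 1:t + 1])
        lowest = min(lows[t - n + 1:t + 1])
        if highest > lowest:
            out.append(100.0 * (closes[t] - lowest) / (highest - lowest))
        else:
            out.append(50.0)  # assumption: neutral value when the range is flat
    return out


def momentum(closes: List[float], n: int) -> List[float]:
    """Momentum: today's close minus the close n days earlier."""
    return [closes[t] - closes[t - n] for t in range(n, len(closes))]


# Made-up daily bars, not KOSPI data.
highs = [10.0, 11.0, 12.0, 13.0]
lows = [8.0, 9.0, 10.0, 11.0]
closes = [9.0, 10.0, 12.0, 11.0]

print(stochastic_k(highs, lows, closes, n=3))  # [100.0, 50.0]
print(momentum(closes, n=3))                   # [2.0]
```

Note that nothing here touches volume: every feature is a transformation of the same four price series, which is exactly why the input set can be called limited.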

The experiment’s results are very interesting, if sobering. Even on market data which certainly hadn’t been subjected to any “AI arbitrage” (because it dates back to a period when AI was not deployed in real markets), the results are only slightly above a random 50/50 guess. In the most successful configuration (and Kim tested many), SVM’s achieved 57.83% prediction accuracy on out-of-sample testing data (in some other configurations it was only marginally above 50%, but never below). While this was higher than BPNN (54.73%) and CBR (51.98%), the result is still poor. In theory, prediction accuracy even marginally above 50% should compound into very large profits over a sufficiently long time frame, but that won’t hold in practice. Once we factor in trading costs, as well as the standard overheads of a fund, we need accuracy well above 50% in order to generate a net profit (the author didn’t factor in trading or any other costs).
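The point about costs can be made with back-of-the-envelope arithmetic. The sketch below assumes a deliberately crude model of my own: one trade per day, a correct call earns a fixed average move, a wrong call loses the same move, and every trade pays a fixed cost. The 1% move and 0.1% cost are hypothetical numbers, not figures from the article. Under that model the expected net return per trade is (2p - 1) * move - cost, so the break-even accuracy is 0.5 + cost / (2 * move).

```python
def expected_net_return(accuracy: float, avg_move: float, cost: float) -> float:
    """Expected net return per trade under the symmetric win/loss model."""
    return (2.0 * accuracy - 1.0) * avg_move - cost


def break_even_accuracy(avg_move: float, cost: float) -> float:
    """Accuracy at which the expected net return per trade is exactly zero."""
    if avg_move <= 0:
        raise ValueError("avg_move must be positive")
    return 0.5 + cost / (2.0 * avg_move)


# Hypothetical assumptions: 1% average daily move, 0.1% cost per trade.
AVG_MOVE = 0.01
COST = 0.001

print(f"break-even accuracy: {break_even_accuracy(AVG_MOVE, COST):.2%}")

# Accuracies reported in the article, plugged into the toy model.
for name, acc in [("SVM", 0.5783), ("BPNN", 0.5473), ("CBR", 0.5198)]:
    net = expected_net_return(acc, AVG_MOVE, COST)
    print(f"{name}: expected net return per trade = {net:+.4%}")
```

With these made-up inputs the break-even accuracy is 55%, which would leave the best SVM configuration only slightly positive and both BPNN and CBR negative. Different cost or volatility assumptions move that threshold, but they never bring it down to 50%.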

However, Kim’s conviction that SVM’s may be a useful tool in financial time series prediction has largely proven correct. SVM’s are one of the most popular Machine Learning approaches in financial prediction (probably second only to neural networks) and are widely believed to be among the most accurate. His brief and elegant introduction of this ML technique remains a huge contribution to the academic literature nearly two decades later.

You can find the article through Google Scholar. Please note that it may be behind a paywall.

Full citation: K.-j. Kim, “Financial time series forecasting using support vector machines,” Neurocomputing, vol. 55, pp. 307–319, 2003.