
"Hidden Markov Models for Time Series" - Part 1

books statistics

What to do with all this data?

Recently I worked on a data crawler that retrieves price data from a game item store. Check out the mock-up of this project if you want to learn more about that. But soon after starting the project, an issue became apparent: I had no idea how to analyze all this interesting data. The store contains around 20 thousand items. Further, there are intricate dependencies where one item can be crafted or unboxed into another, and so on. Even with only one daily price and volume point per item, there are already 40 million datapoints in my PostgreSQL database. So it is quite the data trove, one that could hold some interesting hidden secrets.

Statistics - Yes please

If I spend all these resources on data collection and evaluation, it should lead to reasonably correct results - statistically significant, a scientist would say. During my studies I took some statistics courses, but most of them were not that in-depth. Looking around for material to learn more, I stumbled upon “Hidden Markov Models for Time Series” by Walter Zucchini, Iain L. MacDonald, and Roland Langrock.

One benefit of this book, and of HMMs (Hidden Markov Models) in general, is that price and volume data are time series - data points in time order, which is exactly what the book is about. Further, the book contains exercises and a lot of example R code, which is quite nice for learning the topic hands-on. A downside is that HMMs are just one of many potential models one could apply to price data, and some of their characteristics are suboptimal for it.

The Markov process (the hidden part of an HMM) models distinct states, so HMMs are useful if there are different regimes with specific effects on the price (for example bull markets, bear markets, and so on). If such regimes are not clearly present, it can be a struggle to get a good fit. HMMs also make a conditional independence assumption that might not hold for the data: given the state the system is in, an observation does not depend on what happened before or after. This is not very good for financial data, because one usually observes time-dependent correlations in prices, volatility, and so on. In other words, yesterday’s price influences today’s price, and plain HMMs are not well suited to modelling such correlations.

Still, the book is very interesting and helpful on the topic of HMMs. If you are into statistics or math, or want to learn about analyzing your data, you could give it a try. In the following paragraphs, I will describe a few core concepts I learned from Hidden Markov Models for Time Series. Please refer to the book, or other textbooks on the topic, for more detailed and rigorous explanations.

HMM…?

Hidden Markov Models, simply put, combine Markov processes with probability distributions. A Markov process models a system that can be in different states. Let’s consider the state to be the daily weather: it can be sunny, cloudy, or rainy. A Markov process defines not only the states, but also the probabilities of going from one state to another. For example, the probability that it will rain tomorrow, given that it is sunny today.

The states of our Markov process. Each arrow represents the probability of transitioning from one state to another.
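To make this concrete, here is a minimal sketch in Python (the book itself uses R) with made-up transition probabilities, since the exact numbers do not matter for the idea. The matrix gamma collects the transition probabilities between the three weather states, and a small helper simulates the day-to-day state sequence.

```python
import numpy as np

# Hypothetical transition matrix for the states (sunny, cloudy, rainy).
# The numbers are just for illustration. Entry [i, j] is the probability of
# moving from state i today to state j tomorrow, so every row sums to 1.
states = ["sunny", "cloudy", "rainy"]
gamma = np.array([
    [0.7, 0.2, 0.1],   # sunny  -> sunny / cloudy / rainy
    [0.3, 0.4, 0.3],   # cloudy -> sunny / cloudy / rainy
    [0.2, 0.3, 0.5],   # rainy  -> sunny / cloudy / rainy
])

def simulate_states(n_days, gamma, rng, start=0):
    """Simulate the weather Markov chain for n_days, starting in state `start`."""
    path = [start]
    for _ in range(n_days - 1):
        path.append(rng.choice(len(states), p=gamma[path[-1]]))
    return np.array(path)

rng = np.random.default_rng(1)
weather = simulate_states(200, gamma, rng)
print([states[s] for s in weather[:10]])  # e.g. ['sunny', 'sunny', 'cloudy', ...]
```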

Hidden Markov Models are more than Markov processes. When we model something, we are usually not interested in the states themselves, in other words whether it is sunny, rainy, or cloudy, but in things that happen depending on the states. In this example, we look at the number of car accidents: if the road is wet, a car is more likely to crash than when it is sunny. The number of accidents is still random in nature, though. So for each state, we choose a probability distribution with specific parameters that models this state-dependent randomness.

The probability distributions (number of accidents on the x-axis, probability on the y-axis) for the number of accidents, for each of the three states: sunny (orange), cloudy (gray), and rainy (blue). All three distributions are assumed to be Poisson distributions, with lambda 3 (sunny), 5 (cloudy), and 10 (rainy).
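Continuing the sketch from above, the hidden Markov model adds one state-dependent Poisson draw per day for the number of accidents, using the lambdas from the plot (3, 5, and 10). The weather sequence stays hidden; only the accident counts would be handed to the statistician.

```python
# State-dependent Poisson means for the daily accident count,
# matching the plot above: sunny -> 3, cloudy -> 5, rainy -> 10.
lambdas = np.array([3.0, 5.0, 10.0])

def simulate_accidents(weather, lambdas, rng):
    """One Poisson draw per day, with the mean picked by that day's hidden state."""
    return rng.poisson(lambdas[weather])

accidents = simulate_accidents(weather, lambdas, rng)
print(accidents[:10])  # this series is all the statistician ever gets to see
```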

Imagine you are a statistician at an insurance company, trying to figure out what drives the number of accidents. You might guess that the weather has something to do with it, but you can’t tell yet. All the data you are given is the number of accidents for each day. The plot below shows the number of accidents once with the weather as a colour code, and once without. As the statistician, you only have access to the version without the colours.

Number of accidents that happened during 200 days. The colour of the dot represents the weather on that day.

It’s quite hard, if not impossible, to make out any clear pattern in the plot without the coloured dots, and even with the colours it is not obvious: sometimes sunny days have more accidents than cloudy ones, just by chance. Also, when making a bar plot of how often a specific number of accidents occurs, we cannot make out distinct states by eye.

How many days (y-axis) had a specific number of accidents (x-axis). The black bar is the combined amount, while the coloured lines show the contribution of each of the states.
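One way to see why the black bars look like a single blended histogram: in the long run, each state is visited according to the stationary distribution of the Markov chain, so the marginal distribution of the daily accident count is a mixture of the three Poisson distributions weighted by those stationary probabilities. A short sketch, reusing the hypothetical gamma and lambdas from above:

```python
from scipy.stats import poisson

# Stationary distribution delta of the weather chain: the left eigenvector of
# gamma with eigenvalue 1 (delta @ gamma = delta), normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(gamma.T)
delta = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
delta = delta / delta.sum()

# Marginal probability of observing x accidents on a randomly picked day:
# a mixture of the three Poisson pmfs weighted by delta.
x = np.arange(0, 21)
marginal = sum(d * poisson.pmf(x, lam) for d, lam in zip(delta, lambdas))

print(delta.round(3))     # long-run share of sunny / cloudy / rainy days
print(marginal.round(3))  # the blended shape seen in the bar plot
```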

As you can imagine, recovering the underlying states and parameters from the data alone is a complex problem. Luckily, a lot of research has gone into this topic, and there are established methods for estimating the underlying model, which are also taught in the book. In the next post, I’ll demonstrate how to infer the underlying HMM from the data shown here.
