How do we explain the graph of time series data, such as the movement of a stock price? Can we fit a linear or non-linear equation that describes the frequent fluctuations that are an integral part of such data? If we can fit an equation with the least possible error, what would its order be?
While most graphs can be described by an equation irrespective of their complexity, we also need to consider practical computing constraints and the business needs that must be met. Hence, an alternative approach to time series data is necessary.
In basic regression, a set of independent variables influences the outcome of the dependent variable. In time series data, in addition to depending on the independent attributes, every output also depends on the outputs observed at previous time steps. The degree of dependence on past outcomes varies from case to case and can be measured with the autocorrelation function (ACF).
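As a rough illustration of how dependence on past values can be quantified, the sketch below computes the sample autocorrelation at a given lag in plain Python. The function name `autocorrelation` and the example series are assumptions made for illustration; they are not part of the tourism data set.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag (lag >= 0)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: co-movement between the series and its lagged copy.
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


# Hypothetical series that repeats every 4 steps.
values = [10, 20, 30, 20] * 6
acf = [autocorrelation(values, k) for k in range(9)]
```

For a series like this one, the autocorrelation is strongly positive at lag 4 (the length of the repeating cycle) and negative at lag 2, which is the kind of pattern an ACF plot makes visible.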
We handle time series data not by fitting a regression equation directly, but by splitting the series into three distinct components: trend, seasonality and randomness. As an example, let us consider the tourism prediction problem from Kaggle. The data has been preprocessed and simplified for easier understanding.
Trend is the regressive part of the time series data, wherein we ignore the fluctuations and observe the movement of the target variable as a relatively smooth curve.
The above plot shows the smoothed form of the time series example, with seasonality and random error removed. From the graph, we see that in every decade, revenue is high around the third to fourth year and the ninth year.
Seasonality is the pattern that repeats over a fixed period (for example, every year), irrespective of the trend and irregularities associated with it.
In this case, tourism revenue starts each year on a positive note, peaks in May, drops sharply and then rises again towards December.
Randomness is the inherent noise present in the data, irrespective of the domain or the type of analysis being done.
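The split into trend, seasonality and randomness can be sketched as a classical additive decomposition. The version below is a minimal illustration, assuming the components add up (series = trend + seasonal + residual) and the seasonal period is known; the function name `decompose_additive` and the synthetic series are hypothetical and do not reproduce the tourism data.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual lists (additive model)."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    half = period // 2

    # Trend: centered moving average; undefined (None) near both ends.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points (2 x period average).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonality: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # center so seasonal effects sum to zero
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(n)]

    # Randomness: whatever trend and seasonality do not explain.
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


# Synthetic example: linear trend plus a repeating 4-step pattern, no noise.
pattern_true = [5, -3, -4, 2]
series = [100 + 2 * t + pattern_true[t % 4] for t in range(16)]
trend, seasonal, residual = decompose_additive(series, period=4)
```

On this noise-free example the recovered trend is the straight line, the seasonal component matches the repeating pattern, and the residual is effectively zero. Real data would leave a non-trivial residual.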
How do we predict?
Since the future value of the target attribute depends on its past and current values, we can build a predictive model that forecasts by averaging recent values. There are several ways of taking this average, namely the Simple Moving Average, the Weighted Moving Average and the Exponential Moving Average.
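The three averaging schemes can be sketched as one-step-ahead forecasts in plain Python: an unweighted average of the last few values, a weighted average, and an exponentially weighted average. The function names, the window size, the weights and the smoothing factor `alpha` below are illustrative assumptions, not values taken from the tourism example.

```python
def simple_moving_average(series, window):
    """Forecast the next value as the plain mean of the last `window` values."""
    return sum(series[-window:]) / window


def weighted_moving_average(series, weights):
    """Forecast with explicit weights; the last weight applies to the newest value."""
    recent = series[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


def exponential_moving_average(series, alpha):
    """Forecast with geometrically decaying weights; 0 < alpha <= 1."""
    level = series[0]
    for x in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * x + (1 - alpha) * level
    return level


revenue = [100, 110, 120, 130]  # hypothetical values
sma = simple_moving_average(revenue, window=3)           # 120.0
wma = weighted_moving_average(revenue, weights=[1, 2, 3])  # about 123.33
ema = exponential_moving_average(revenue, alpha=0.5)     # 121.25
```

The weighted and exponential variants react faster to the most recent observations, which is why their forecasts sit closer to the latest value than the simple average does.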
The prerequisite for building such a model is that the data should be stationary, i.e. its statistical properties such as mean and variance remain constant over time. We convert non-stationary data to stationary form by differencing: subtracting from every observation its value at the nth lag, with the lag chosen using the ACF and PACF (partial autocorrelation function) plots.
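Differencing itself is a one-line operation. The sketch below, with a hypothetical helper named `difference`, shows how a lag-1 difference removes a linear trend and how a lag equal to the seasonal period removes a repeating pattern.

```python
def difference(series, lag=1):
    """Subtract from each observation the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# A steadily rising series becomes constant after lag-1 differencing.
trending = [5, 8, 11, 14, 17]
flat = difference(trending, lag=1)       # [3, 3, 3, 3]

# A series with a 4-step seasonal pattern, differenced at lag 4.
seasonal = [10, 20, 30, 20, 12, 22, 32, 22]
deseasonalised = difference(seasonal, lag=4)  # [2, 2, 2, 2]
```

Note that each round of differencing shortens the series by `lag` observations.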
Not every data set can be converted to stationary form. In such cases we need to build trend and seasonality into the moving average itself, which leads to the Holt-Winters method. It can account for each component of the time series and can be tuned accordingly.
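A minimal sketch of additive Holt-Winters smoothing is given below, assuming a known seasonal period and a simple heuristic initialisation from the first two cycles. The function name, the initialisation and the smoothing parameters `alpha`, `beta` and `gamma` are assumptions for illustration; library implementations typically estimate these parameters from the data.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")

    # Heuristic initialisation from the first two cycles (an assumption).
    first, second = series[:period], series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]

    for t in range(period, n):
        idx = t % period
        prev_level = level
        # Level: deseasonalised observation blended with the previous projection.
        level = alpha * (series[t] - seasonal[idx]) + (1 - alpha) * (prev_level + trend)
        # Trend: change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal: deviation from the new level blended with the old seasonal value.
        seasonal[idx] = gamma * (series[t] - level) + (1 - gamma) * seasonal[idx]

    # Project the level and trend forward and add the matching seasonal value.
    return [level + h * trend + seasonal[(n - 1 + h) % period]
            for h in range(1, horizon + 1)]


# Hypothetical series: a pure 4-step seasonal pattern with no trend.
history = [10, 20, 30, 20] * 3
forecast = holt_winters_additive(history, period=4, alpha=0.3, beta=0.1,
                                 gamma=0.2, horizon=4)
```

On this trend-free example the forecast simply continues the repeating pattern. The three smoothing parameters control how quickly the level, trend and seasonal estimates adapt to new observations.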
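Another method, ARIMA (Autoregressive Integrated Moving Average), or its seasonal extension SARIMA (Seasonal ARIMA), is more general and offers more flexibility in parameter selection. ARIMA takes three parameters (p, d, q): p is the order of the autoregressive part, d is the degree of differencing, and q is the order of the moving average part. These parameters are usually selected using the ACF and PACF plots together with an information criterion such as AIC, and the Ljung-Box test can then be used to check that the residuals of the fitted model show no remaining autocorrelation.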
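For reference, the standard textbook form of an ARIMA(p, d, q) model can be written with the backshift operator B, where B y_t = y_{t-1}. The symbols below (the autoregressive coefficients phi, the moving average coefficients theta, the constant c and the white-noise error epsilon) follow common convention and are not notation used elsewhere in this article.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right) (1 - B)^{d} \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right) \varepsilon_t
```

The factor (1 - B)^d is the differencing step applied d times, the left-hand sum is the autoregressive part of order p, and the right-hand sum is the moving average part of order q.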
The performance of any of the above models can be judged by error metrics such as Root Mean Squared Error and Mean Absolute Percentage Error, or by an information criterion such as AIC for comparing candidate models, and the model can be refined further before forecasting.
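The two error metrics named above are straightforward to compute on a held-out set of actual values. The sketch below uses hypothetical function names and made-up numbers purely to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; undefined if any actual is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


actual = [100, 200, 400]      # hypothetical observed values
predicted = [110, 190, 400]   # hypothetical forecasts
error_rmse = rmse(actual, predicted)  # sqrt(200 / 3), about 8.16
error_mape = mape(actual, predicted)  # about 5.0 percent
```

RMSE is expressed in the units of the target variable, while MAPE is scale-free, which makes it easier to compare across series of different magnitudes.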
Like the revenue forecasting example discussed here, time series analysis is used extensively in financial markets and stock market analysis, and in domains as diverse as the prediction of natural calamities, healthcare, supply chain, weather forecasting and communications engineering.
If your business decisions require the guidance of a time series analysis and more, please feel free to reach out to us at firstname.lastname@example.org.