Welcome. The topic of this lecture is time series and, in particular, the representation of models that are often applied to time series data. Time series models are typically constructed with two main objectives. First, we want to describe the key properties of the time series data, in particular the nature of the trend and the correlations with past values. And second, we may want to exploit these features to make forecasts of future observations. The time series of interest is denoted as y, with the subscript t indicating the observation in period t. n refers to the number of available observations, or the length of the time series.

Let us begin with a very important issue, that of stationarity. A time series y is called stationary if its mean, variance, and covariances with past observations are constant over time. Stationarity is an important condition that needs to be satisfied before we can even start thinking about designing a meaningful model for a given time series. Intuitively, if properties like the mean and variance change with each new observation of the time series, then we cannot reliably model such data, let alone provide reliable forecasts.

The autocovariances gamma k measure how strongly related observations at different points in time are. In terms of forecasting, they indicate whether past observations can be useful to make predictions of future observations. Note that when all these autocovariances are zero, the past carries no predictive value for the future. We call such a time series white noise. Note that this relates to assumption A5 for regression models in lectures one and two. A white noise time series cannot be predicted from its own past, and the only useful prediction is the mean of the variable itself. Now this brings us to the essence of time series modeling. Our aim is to design a model that distills information from the past for forecasting.
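To make the idea of white noise concrete, here is a small simulation sketch in Python (the seed, the sample size n = 500, and the function names are illustrative choices, not from the lecture). It generates a white noise series and checks that its sample autocorrelations are close to zero at every lag, so the past indeed carries no predictive value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)  # white noise: mean 0, constant variance, no autocorrelation

def autocorr(x, k):
    """Sample autocorrelation at lag k: gamma_k / gamma_0."""
    x = x - x.mean()
    return np.sum(x[k:] * x[:-k]) / np.sum(x * x)

for k in range(1, 4):
    print(f"lag {k}: {autocorr(eps, k):.3f}")  # all close to zero
```

All printed values fall well inside the plus and minus 2 over the square root of n band that we will meet later in this lecture.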
The model is deemed successful, or adequate, if after all this distillation there is nothing left that is informative for prediction. That is, the residuals are white noise. In all that follows, we follow the usual convention of writing a white noise variable as epsilon. Sometimes we call this epsilon the error, but we also use the word shock to indicate that epsilon is something new to the variable y.

A simple and popular time series model is the autoregressive model. An autoregression of order 1, or briefly AR(1), is a model where the current observation of y in period t is explained by the previous observation of y in period t minus 1. This simple model provides a nice way to illustrate the relevance of stationarity. If the slope parameter beta lies between -1 and +1, the effects of past shocks epsilon die out. So, the more distant in the past, the less impact those shocks have on current values of the variable y. This is a typical property of a stationary time series. Later, we will see that stationarity is lost if beta is equal to 1.

The first order autoregression assumes that current y can be predicted by 1 period lagged y, but of course it might also be that both 1 period lagged y and 2 period lagged y are useful for predicting the current observation. In fact, the number of lags can run up to p, giving rise to the so-called AR(p) model.

Now I invite you to consider the following test question. Consider again the autoregression of order 1, where the current value of the white noise series epsilon is uncorrelated with the past of y. When the slope parameter beta is equal to 1, can you argue why the y variable is not stationary? The answer is that when the intercept alpha is not 0, the mean will change by alpha for every new observation. And when the intercept alpha is zero, the variance of the observations increases over time.

Another useful time series model includes past shocks as an explanatory variable.
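The role of the slope parameter beta can be illustrated with a short simulation (a sketch; the seed and the parameter values are chosen only for illustration). With beta equal to 0.5 the shocks die out and the series hovers around its mean, while with beta equal to 1 the shocks accumulate and the dispersion of the series keeps growing:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(alpha, beta, n, rng):
    """Simulate y_t = alpha + beta * y_{t-1} + eps_t, starting from y_0 = 0."""
    y = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        y[t] = alpha + beta * y[t - 1] + eps[t]
    return y

n = 1000
stationary = simulate_ar1(0.0, 0.5, n, rng)   # |beta| < 1: effects of shocks die out
random_walk = simulate_ar1(0.0, 1.0, n, rng)  # beta = 1: shocks never die out

# The stationary series has a roughly constant variance (about 1 / (1 - 0.25));
# the beta = 1 series wanders, and its sample variance is far larger.
print(np.var(stationary), np.var(random_walk))
```

Setting alpha unequal to 0 in the beta = 1 case adds the changing mean discussed in the test question above.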
When you look at the epsilons as forecast errors, you can learn from these errors by taking them into account when making new forecasts. The so-called first order moving average model, or MA(1), includes epsilon 1 period lagged. This model implies that y is correlated with y 1 period lagged, but not with more distant lags. We can generalize this model to a moving average model of order q, which includes q lagged forecast errors. It is also possible to combine the two models, which gives rise to an ARMA(1,1) if p and q are both equal to 1, or an ARMA(p,q) if these orders take different values.

Moving average terms may arise when two autoregressive processes are related. Now, I invite you to make the following test question. The variable y depends on x, and x depends on x 1 period lagged. Can you derive what the implied ARMA model is for y? We see that y now depends on y 1 period lagged, on the shock to x, on the shock to y and, this is crucial, also on the one period lagged shock to y. So the autoregressive order is one, and the moving average order is also one. Hence, correlation across jointly autoregressive time series can lead to individual time series models of the ARMA type.

The time series models that we have discussed so far, autoregression, moving average, and their combination, imply specific correlation properties of the time series. This relation can be reversed: when you see certain properties of the data in the real world, you can decide which model to use. And the autocorrelations are a very useful tool for this purpose. For example, when the data are generated by a moving average model of order q, the sample autocorrelations after lag q will all be close to zero.

Next to autocorrelations, we also have the concept of partial autocorrelations. These account for the fact that the observations of y at time t and at time t minus two may seem to be correlated simply because they are both related to the observation of y at time t minus one.
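This identification idea can be checked in a small simulation (again a sketch, with illustrative parameter values): for an MA(1) series, the sample autocorrelation is clearly nonzero at lag 1, close to its theoretical value theta divided by 1 plus theta squared, and close to zero at all longer lags:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
theta = 0.8
eps = rng.standard_normal(n + 1)
y = eps[1:] + theta * eps[:-1]   # MA(1): y_t = eps_t + theta * eps_{t-1}

def autocorr(x, k):
    """Sample autocorrelation at lag k."""
    x = x - x.mean()
    return np.sum(x[k:] * x[:-k]) / np.sum(x * x)

# Theoretical lag-1 autocorrelation: theta / (1 + theta**2), about 0.488 here;
# beyond lag 1 the theoretical values are exactly zero.
print([round(autocorr(y, k), 3) for k in (1, 2, 3)])
```

Seeing exactly this pattern in real data, a large spike at lag 1 and nothing beyond, would thus point towards an MA(1) model.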
The sample partial autocorrelations follow from regressions of y on its own past values. If in this regression the k-th lag coefficient is insignificant for values of k larger than p, then this suggests using an AR model of order p. The 95% confidence bounds around autocorrelations and partial autocorrelations are marked by plus and minus two divided by the square root of the sample size n.

Let us now return to the airline revenue passenger kilometers data. The data show a trend, and the growth rates do not. The autocorrelations of the log series show a very slowly decaying pattern, and the partial autocorrelations are only large at lag 1. The values for the growth rates are not significant, as the number of observations is 39 and 2 divided by the square root of 39 is about 0.3.

Next, let us pay attention to the important issue of trends. Several trend models are available. First, you have what is called a random walk. This is an autoregressive model, but with a slope parameter equal to 1. When the intercept alpha is unequal to 0, we get trending data that looks like the airline data, or the industrial production index considered before. When the model also contains a deterministic trend term, beta times t, then you get an explosive trend pattern. If no lagged value of y is included, this gives a fully deterministic trend model. And if the lagged y term has a parameter smaller than 1, then this still results in a deterministic trend, but without random walk aspects. The main notion here is that a stochastic trend, that is, where the autoregressive parameter is equal to 1, can only be removed by transforming the data, by taking their first difference. This is denoted by the symbol Delta.

To give you some visual impression of how the parameters in a time series model determine what the data will look like, consider the following graphs of artificially generated data. Clearly, the parameters matter a lot for how the data look.
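The effect of first differencing on a stochastic trend can be sketched as follows (the drift value and the sample size are illustrative). The random walk with drift has a lag-1 sample autocorrelation close to 1, the slowly decaying pattern of a trending series; after taking first differences, the autocorrelations fall inside the plus and minus 2 over the square root of n band:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
alpha = 0.2                      # drift: the mean grows by alpha each period
eps = rng.standard_normal(n)
y = np.cumsum(alpha + eps)       # random walk with drift: y_t = y_{t-1} + alpha + eps_t
dy = np.diff(y)                  # first difference: Delta y_t = alpha + eps_t

def autocorr(x, k):
    """Sample autocorrelation at lag k."""
    x = x - x.mean()
    return np.sum(x[k:] * x[:-k]) / np.sum(x * x)

bound = 2 / np.sqrt(len(dy))     # 95% band for white noise autocorrelations
print(round(autocorr(y, 1), 3))  # close to 1: the stochastic trend dominates
print(round(autocorr(dy, 1), 3), "band:", round(bound, 3))
```

After differencing, what remains is just a constant plus white noise, which is exactly what "silencing" the stochastic trend means.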
This notion will be exploited in the actual analysis of real data. You look at the data, you see certain properties, and from these properties you can get ideas on the models that might be useful to fit and forecast this time series. If two time series share the same stochastic trend, we say that they are cointegrated. In the next lecture, we will consider this in more detail. But for now, I ask you to consider an example, and therefore I invite you to make the following test exercise. The answer uses the fact that y and x share the same stochastic trend z, so that a specific linear combination of y and x does not include that trend anymore. Now I invite you to make the training exercise, where you can train yourself on the topics that were treated in this lecture. As always, you can also find this exercise on the website. And this concludes our lecture on representation by means of time series models.