We continue our discussion of ARIMA by looking at an actual modeling example using R. Here are the basic steps in building an ARIMA model.

The first thing you want to do, and this goes for almost any analytical project, is to plot the data and see if there are any abnormal observations. If you see some sort of regular pattern, that's something to keep in mind. If you see some weird spike, you might want to go to that time period and investigate: is that a typo in the data, or did some strange event happen on that day that might have caused a shock? Next you'll want to do the ADF test, the stationarity test, to make sure that the data is stationary. If it's not stationary, you take differences or lag differences of the data until you do achieve stationarity, and we saw that before. The next thing you want to do is look at the patterns of the autocorrelation function and partial autocorrelation function. That's to determine whether any lags are needed in the autoregressive part or the moving average part of your model. Then you fit the model that is suggested. You look at the results and you iterate this process until you find the best model.

So here are the packages you'll need. You'll need ggplot2, that's your graphing package, your time series and forecasting package, as well as the urca package in R. The urca package handles unit root tests, and it has the function to test for stationarity.

For this example, I'm going to use data from the S&P 500: exchange-traded fund prices from 2007 until 2019, which gives us exactly 3,142 working days of a time series. The data was pulled from the Yahoo! Finance website. Here's a plot of the data. Clearly it's not stationary: you can see a trend here, though there's a dip.
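Going back to the basic steps listed earlier, here is a minimal sketch of the whole loop in R. It runs on a simulated series (a random walk with drift), not on the actual SPY file, and the names prices, d_prices, candidates, and best_order are made up for illustration. Taking "best model" to mean the lowest AIC among a few candidate orders is an assumption of this sketch, and the ADF step only runs if the urca package is installed.

```r
# Hypothetical stand-in for a price series: a random walk with drift.
set.seed(42)
prices <- 100 + cumsum(rnorm(500, mean = 0.05, sd = 1))

# Step 1: plot the data and look for trends or odd spikes.
plot(prices, type = "l", xlab = "Time period", ylab = "Price")

# Step 2: test for stationarity (only if urca is available),
# then take first differences because the level series trends.
if (requireNamespace("urca", quietly = TRUE)) {
  print(summary(urca::ur.df(prices, type = "none", selectlags = "AIC")))
}
d_prices <- diff(prices)

# Step 3: inspect the ACF and PACF of the differenced series
# to judge whether AR or MA lags are needed.
acf(d_prices, main = "ACF of differenced series")
pacf(d_prices, main = "PACF of differenced series")

# Step 4: fit a few candidate ARIMA(p, 1, q) models and compare them.
# Using AIC as the yardstick is an assumption of this sketch.
candidates <- list(c(0, 1, 0), c(1, 1, 0), c(0, 1, 1))
fits <- lapply(candidates, function(ord) arima(prices, order = ord, method = "ML"))
aic_values <- sapply(fits, AIC)
best_order <- candidates[[which.min(aic_values)]]
print(best_order)
```

In practice you would repeat steps 3 and 4, adjusting the candidate orders based on what the ACF and PACF plots suggest.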
The trend goes upward, but it's not clear to me whether the swings around the trend line are constant. So that's something to look at.

So here's the code to plot the data. Let's do that in R. I've already imported the data. You'll want to use the Import Dataset command, From Excel, to import this file, or you can go out and get it from Yahoo! Finance yourself. It's in this variable called data. For the prices, I'm just going to get the price column of data. That's my SPY variable here. So now I have a column of prices. This is just one column, with 3,142 observations. If you want to look at the first 10 observations, you can use the head command. That's always a good thing to do, just as a sanity check.

Then here's the command to create the plot that you saw on the PowerPoint slide. Plot SPY is the simple version of the command. As for the other parameters, the color is dark red, the x label is time period, and the y label is SPY prices. Then there are some other parameters you can look at: the type as line, the width of the line, and the main title.

Step 2 in the process is to test for stationarity. The results will look something like this. Here on the top right you can see the code. The stationarity test is ur.df, from the urca package. The data is SPY. You put the results of that into some variable, in this case stationarity test, and then you do a summary of the stationarity test and look at it. In this particular example, the value of the test statistic is 2.27. When interpreting the results, you want to look at the test statistic, and then down below R gives you the critical values of this test statistic. At the one percent level, if the test statistic is less than minus 2.58, then you know the data is stationary. Since the data in this particular example is not stationary, you might want to try something like differencing or lag differencing.
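The plotting and testing commands described above might look roughly like the following. This is a sketch, not the code from the slide: the SPY vector here is simulated so the example runs on its own, the plot title and the variable name stationarity_test are assumptions, and type = "none" is a guess based on the minus 2.58 critical value quoted in the lecture. With the real data you would instead take the price column from the imported data variable.

```r
# Hypothetical stand-in for the imported price column (3,142 observations).
# With the real file, SPY would come from the price column of `data`.
set.seed(1)
SPY <- 100 + cumsum(rnorm(3142, mean = 0.05, sd = 1))

# Sanity check: look at the first 10 observations.
print(head(SPY, 10))

# Line plot with the parameters mentioned in the lecture.
plot(SPY,
     col  = "darkred",            # line color
     xlab = "Time period",        # x-axis label
     ylab = "SPY prices",         # y-axis label
     type = "l",                  # draw a line rather than points
     lwd  = 2,                    # line width
     main = "SPY prices")         # main title (assumed wording)

# ADF test via urca::ur.df, run only if the package is installed.
# type = "none" and selectlags = "AIC" are assumptions of this sketch.
if (requireNamespace("urca", quietly = TRUE)) {
  stationarity_test <- urca::ur.df(SPY, type = "none", selectlags = "AIC")
  print(summary(stationarity_test))

  # Compare the test statistic with the 1 percent critical value
  # (first column of the critical value table).
  test_stat <- stationarity_test@teststat[1]
  crit_1pct <- stationarity_test@cval[1, 1]
  cat("Stationary at the 1 percent level:", test_stat < crit_1pct, "\n")
}
```

The comparison at the end mirrors the rule from the lecture: the series is treated as stationary only when the test statistic falls below the critical value.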
So let's look at the R code. Here's the code to test for stationarity. You want to put the results of your stationarity test into some variable; in this case I called it stationarity test. Here is the command. I'm going to test on this data set, SPY, and there are no lags. In this case, we're going to use the AIC criterion, which I'll explain at the end of this video. Let's look at that. Then here's a summary of our stationarity test, and these are the results that you saw on the PowerPoint slide: 2.2795, which is not less than the critical value of minus 2.58.

I'm going to continue here in R and create this variable D.SPY; that's a variable name. I'm going to use the function called diff, for difference. What am I going to difference? The SPY data. I want to use a lag of one to create those differences. In other words, I'm going to take today's value minus yesterday's value to get my first difference. So let's run that command. Now I have a bunch of lag differences. I can look at the first few values, and there they are. Let's plot them, though. I think that's a little more useful, and you can see here that the trend has been taken away. It's flat, and it does seem to be a little more stationary. The visual observation says yes, but we can certainly test for stationarity, so we will do that. Let's run that code. Now we have a test statistic of minus 42.103, and the critical value is minus 2.58. Minus 42 is way less than minus 2.58, so at the one percent level we know that we have a stationary time series.
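The differencing step just described can be sketched as follows. Again the SPY vector is simulated so that the example is self-contained, which means the test statistic it prints will not match the minus 42.103 from the lecture. The name D.SPY follows the lecture; diff_test, the plot labels, and the ur.df arguments are assumptions.

```r
# Hypothetical stand-in for the SPY price series.
set.seed(1)
SPY <- 100 + cumsum(rnorm(3142, mean = 0.05, sd = 1))

# First difference with a lag of one: today's value minus yesterday's.
D.SPY <- diff(SPY, lag = 1)

# Confirm that diff() matches the manual calculation.
manual_diff <- SPY[-1] - SPY[-length(SPY)]
stopifnot(isTRUE(all.equal(D.SPY, manual_diff)))

# Look at the first few differences, then plot them.
print(head(D.SPY, 10))
plot(D.SPY, type = "l", col = "darkred",
     xlab = "Time period", ylab = "First difference of SPY")

# Re-run the ADF test on the differenced series (only if urca is installed).
if (requireNamespace("urca", quietly = TRUE)) {
  diff_test <- urca::ur.df(D.SPY, type = "none", selectlags = "AIC")
  print(summary(diff_test))
}
```

Differencing drops one observation, so D.SPY has 3,141 values where SPY has 3,142.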