So let's look at some performance measurements. To summarize the notation: y of t represents the actual value at time t, f of t represents the forecasted or estimated value at time t, and the error term, the residual e of t, is y of t minus f of t. Finally, n is the size of the test set, that is, the size of your time series: how many data points do you have? So if you are looking at a time series over 30 days, you'll have 30 days of actual data and 30 days of forecasts that you made before those events occurred, and then you can take the difference at each point.

The first performance measurement I would like to talk about is called the mean forecast error, or MFE for short. The formula is noted here; let's drill down a little bit on it. E of t is, as noted on the previous slide, y of t minus f of t. Then we have this summation term that adds them all up, and then we divide by n. If you look at this carefully, it's very much like an average. In fact, it is an average: an average of your error terms.

One thing I want to point out is how you read these formulas and translate them into a spreadsheet. Say you have a spreadsheet with a column heading and some data. When you see that giant Sigma, the summation sign, I want you to think: e of t is just a column of data, and you're going to add it all up. That's what this portion right here represents: a column of data, added up. Nine times out of ten when looking at these formulas, you're going to add up the whole column. N is the number of observations, so t goes from one to n; we add up the column and then divide by n. That is basically the average of your error terms.
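The spreadsheet recipe above (a column of errors, summed, then divided by n) can be sketched in a few lines of Python. This is a minimal sketch; the function and variable names are my own, not notation from the slides.

```python
# Mean forecast error (MFE): the average of the residuals e_t = y_t - f_t.
def mean_forecast_error(actual, forecast):
    errors = [y - f for y, f in zip(actual, forecast)]  # the e_t column
    return sum(errors) / len(errors)                    # add it up, divide by n

# Three "days" of actual data and the forecasts made for them
y = [10.0, 12.0, 11.0]   # actual values y_t
f = [11.0, 11.0, 11.0]   # forecasted values f_t
mfe = mean_forecast_error(y, f)   # errors are -1, +1, 0, so MFE = 0.0
```
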
Another way to think about this, since we have these y of t and f of t columns: in your spreadsheet you have y of t with some observations and f of t with some observations, and in a third column you have e of t, where you take each value of y of t minus the corresponding value of f of t. Then you just add them up and divide by n.

There are some properties of the mean forecast error worth noting. Ideally, zero is the best, and we can see why here. If y of t and f of t are identical, meaning your prediction hit the mark and is exactly the same as the actual value, that term will be zero. The more zeros you have, the closer you are to zero overall, so that's the desirable goal. It is affected by the scale of the measurement and by data transformations. The actual numerical value will be affected by things like using Fahrenheit as a scale or Celsius as a scale; the numbers mean something different. That's something to keep in mind. The other thing it does not do is penalize extreme errors. If you make a forecast that was way off, it doesn't really penalize for that. Also, positive and negative errors can cancel each other out in the average.

The next measurement I want to talk about is the mean absolute error. Here again is that e of t term, and like before, e of t equals y of t minus f of t. But instead of just adding the errors up, we take the absolute value of each one, add those, and divide by n. So it strips the sign: it's just looking at how far away the prediction is, and it doesn't matter whether it's above or below. As before, small values of the mean absolute error are desirable; again, it is affected by measurement scale and data transformations, and again, it does not penalize extreme values.

Another option is to take percentages. Instead of the absolute difference, we look at percentage error. To get the percentage, we take the difference between the observed value and the forecast value, and we divide by the observed value.
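As a sketch of the mean absolute error, here is the same column-of-errors idea in Python, with the absolute value stripping the sign so that misses above and below the actuals no longer cancel. The names here are my own.

```python
# Mean absolute error (MAE): average of |e_t| = |y_t - f_t|.
def mean_absolute_error(actual, forecast):
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)

y = [10.0, 12.0, 11.0]   # actual values y_t
f = [11.0, 11.0, 11.0]   # forecasted values f_t
# The raw errors (-1, +1, 0) average to zero, but MAE keeps the misses:
mae = mean_absolute_error(y, f)   # (1 + 1 + 0) / 3, about 0.667
```
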
That is the percentage error for our forecast. We do that for every time period, then take the average and multiply by 100 to get a percentage. Again, like the other measures, being close to zero is desirable. That's again because of this e term, which is essentially y of t minus f of t: we want those two values to be right on top of each other. We can also take the mean absolute percentage error, which is very similar to the mean percentage error, but here we take the absolute value, stripping away the sign. Again, being close to zero is better than being far apart.

Those last measures looked at absolute distances. Here, we're looking at squared error, as in Euclidean distance, so we have this e sub t squared, which is y of t minus my forecast value, squared. So again, we have actual data and forecast data in a spreadsheet, and in a third column we take y of t minus f of t. We square that term, add them up, and then take the average. Note that there is no information about the error direction, because by squaring the terms, all the values become positive. It is affected by the scale of measurement and data transformations, but in this case, it does penalize for extreme error values.

Related to mean squared error is the sum of squared errors, sometimes denoted SSE. Here we simply have the sum of all those squared terms, without dividing by n. It has the same measurement characteristics or properties as the mean squared error.

Next is the signed mean squared error. Here we have the same e of t squared term, but this is where we get the sign part. If we look at this term, e of t over the absolute value of e of t, the numerator will be either plus or minus, and the denominator will always be positive. That's how the sign of the term gets carried in there: it adds or subtracts as we go through these e of t squared components. That's how it incorporates the direction of the error, whether the forecast was too high or too low.
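A minimal sketch of the percentage and squared-error measures described above, assuming the percentage error is taken relative to the observed value y of t, consistent with e of t being y of t minus f of t. The function names are mine.

```python
# Mean percentage error (MPE): average of (y_t - f_t) / y_t, times 100.
# (MAPE would wrap the ratio in abs() before averaging.)
def mean_percentage_error(actual, forecast):
    n = len(actual)
    return 100.0 * sum((y - f) / y for y, f in zip(actual, forecast)) / n

# Mean squared error (MSE): average of e_t squared.
def mean_squared_error(actual, forecast):
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

# Squaring makes one extreme miss dominate the score.
y = [10.0, 10.0, 10.0, 10.0]
f = [10.0, 10.0, 10.0, 2.0]      # one forecast that is way off (error of 8)
mse = mean_squared_error(y, f)   # 64 / 4 = 16.0
mpe = mean_percentage_error(y, f)  # only the last term contributes
```
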
Unlike the earlier measures, the squared error terms do penalize for extreme errors.

Here's another one. It's called Theil's U-statistic. It's a complicated formula, but we can break it down. In the numerator we have the squared error calculation, and the denominator takes a similar calculation with the forecasted values and with the actual values. So the errors are divided by the sum of these two terms. The U-statistic ranges from zero to one, where U equaling zero represents a perfect fit. That's right here: in this part of the equation, y of t minus f of t is again e of t, and we square that e of t term, add them up, and take the average. If this is zero, then the whole numerator is zero, and so U becomes zero. At the other end of the spectrum, it's a one. So we want small values of this statistic.

Finally, I'm going to talk about the root mean squared error, which is essentially the mean squared error, as you can see here, with a square root taken at the end. It shares the same properties as the mean squared error, and this is probably the one that we will use most often in this course. In general, this is a very popular performance metric.

So that concludes our section on performance measurements. We've talked about a number of them, including the distance-based measures. In applied work, I want you to focus on the root mean squared error. As you gain more experience, you'll see some other measures that are also quite useful. That wraps up performance measurements.
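Here is a sketch of these last two measures in Python, assuming the bounded zero-to-one form of Theil's U, in which the root mean square of the errors is divided by the sum of the root mean squares of the actuals and of the forecasts. The names are my own.

```python
from math import sqrt

# Root mean squared error (RMSE): square root of the MSE,
# which puts the score back in the same units as the data.
def rmse(actual, forecast):
    n = len(actual)
    return sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)

# Theil's U-statistic, bounded form: 0 is a perfect fit.
def theils_u(actual, forecast):
    n = len(actual)
    num = sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)
    den = (sqrt(sum(y ** 2 for y in actual) / n)
           + sqrt(sum(f ** 2 for f in forecast) / n))
    return num / den

y = [10.0, 12.0, 11.0]
f = [10.0, 12.0, 11.0]   # a perfect forecast
# A perfect fit drives both scores to zero.
```
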