Welcome. The topic of our third lecture is Simple Regression, and in particular, estimation. Until now, you acquired insights into two aspects of predicting values of a dependent variable y based on an explanatory variable x. First, there are coefficients a and b that can be useful in practice, and second, there are parameters alpha and beta that exist only in theory. In this lecture you will see how to obtain values of a and b from a given set of observations. In the next lecture, we will examine their link with alpha and beta, given a set of assumptions.

We will use observed data on x and y to find optimal values of the coefficients a and b. The line y = a + bx is called the regression line. We have n pairs of observations on x and y, and we want to find the line that gives the best fit to these points. The idea is that we want to explain the variation in the outcomes of the variable y by the variation in the explanatory variable x. Think again of the high-price, low-sales combinations in the previous lecture, versus the low-price, high-sales combinations. When we use the linear function a + bx to predict y, we get residuals e, and we want to choose the fitted line such that these residuals are small. This is shown by the line in the scatter diagram.

Minimizing the residuals seems a sensible strategy for finding the best possible values of a and b, and a useful objective function is the sum of squared residuals. This way of finding values for a and b is called the method of least squares, abbreviated as LS. The minimum of the objective function is obtained by solving the first-order conditions. This is done by taking the partial derivatives of the objective function and setting these to 0. Let us start with the coefficient a. Solving the first-order condition gives that minus 2 times the sum of the residuals is equal to 0. Note that when the sum of the residuals equals 0, one of the residuals is a function of the other n-1 residuals.
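The least-squares idea can be illustrated numerically. Below is a minimal sketch with a hypothetical data set (the numbers are made up for illustration); it uses the closed-form solution that the lecture derives shortly, and checks the first-order condition for a: the residuals sum to zero, and any perturbed line has a larger sum of squared residuals.

```python
# Hypothetical data (purely illustrative).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
# Closed-form least-squares solution (derived later in the lecture).
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ssr = sum(ei ** 2 for ei in e)

# First-order condition for a: the residuals sum to (numerically) zero,
# so any one residual is determined by the other n-1.
print(abs(sum(e)) < 1e-9)   # True
# Shifting the intercept away from the LS value increases the objective.
ssr_perturbed = sum((yi - (a + 0.1 + b * xi)) ** 2 for xi, yi in zip(x, y))
print(ssr_perturbed > ssr)  # True
```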
We now look for a simple expression for a. For that purpose, rewrite the expression for the partial derivative of the objective function with respect to the coefficient a. We then recognize the expressions for the sample means of y and x. So we find that a is equal to the sample mean of y minus b times the sample mean of x.

I now invite you to answer the following test question. You have seen that the sample means of y and x play a role in finding the value of a, and this test question deals with the situation where y and x are already de-meaned prior to the analysis. Which values do a and/or b take in this special case? The answer is that the value of a is then equal to 0, whatever the value of b. Later we will see that the value of b does not change.

Now we turn to the coefficient b. When you take the partial derivative of the objective function with respect to b, you get that the sum of the observations on x times the residuals e is equal to 0. Note that this puts another restriction on the n values of e. Together with the earlier restriction, this implies that of the n values of e, two are found from the other n-2 values. And now we derive the expression for b. Please take some time to check the steps shown on the slide. We are not quite there yet, but almost. We derived the expression for b shown on the slide, and we can use a few results on summations and means, which leads to a more convenient expression for b. In the end we get this expression for b. This important expression shows that b is equal to the sample covariance of y and x divided by the sample variance of x.

I now invite you to consider the following test question. What happens to b if all y observations are equal? The answer is that b is then equal to 0. So if there is no variation in y, there is no need to include any x to predict the values of y. When we fit a straight line to a scatter of data, we want to know how well this line fits the data, and one measure for this is called the R-squared.
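These results can be checked directly on a small hypothetical data set. The sketch below computes b as the sample covariance of y and x divided by the sample variance of x, and a as the mean of y minus b times the mean of x; it then verifies both test questions: de-meaning gives a = 0 with b unchanged, and constant y gives b = 0.

```python
# Hypothetical data; purely illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 4.0, 8.0, 10.0]
n = len(x)

def ls_coefficients(x, y):
    """Least squares: b = sample cov(y, x) / sample var(x), a = ybar - b * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - xbar) ** 2 for xi in x) / n
    b = cov_xy / var_x
    a = ybar - b * xbar
    return a, b

a, b = ls_coefficients(x, y)

# Test question 1: de-meaning x and y beforehand gives a = 0, same b.
xd = [xi - sum(x) / n for xi in x]
yd = [yi - sum(y) / n for yi in y]
a_dm, b_dm = ls_coefficients(xd, yd)
print(abs(a_dm) < 1e-9 and abs(b_dm - b) < 1e-9)  # True

# Test question 2: if all y observations are equal, the covariance is 0, so b = 0.
_, b_const = ls_coefficients(x, [7.0] * n)
print(b_const == 0.0)  # True
```

Note that it does not matter whether covariance and variance are divided by n or by n-1, since the divisor cancels in the ratio.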
The line emerges from explaining the variation in the outcomes of the variable y by means of the variation in the explanatory variable x, and we can now formalize this as shown on the slide. The deviation of y from its mean is partly explained by deviations of x from its mean. Please, again, take some time to check the steps shown on this slide, where the cross-product term of the x variables with the residuals e in the final expression drops out, because their sample covariance is equal to 0. Now R-squared is defined as the fraction of the variation in y that is explained by the regression model. When R-squared is 0, there is no fit at all; when R-squared is 1, the fit is perfect.

Next we estimate the unknown variance of the epsilons from the residuals. Note that we take account of the fact that two of the residuals can be computed from the other n-2 residuals. Since the mean value of the residuals is 0, we get the formula shown on the slide for the estimate s-squared of the error variance sigma-squared. You may now wish to consult the Building Blocks for the sample variance of a random sample, where we divide by n-1 instead of n-2.

This concludes our lecture. Given observations on x and y, you can now compute the coefficients associated with a straight line, and you can also evaluate the quality of the fit. I now invite you to make the training exercise, where you can train yourself with the topics that were treated in this lecture. You can find this exercise on the website. And this concludes our third lecture on Simple Regression.
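As a recap, the two fit measures from this lecture can be sketched in a few lines of code. The data below are hypothetical; the sketch computes R-squared as the explained fraction of the variation in y, checks the decomposition of the total variation, and computes s-squared with divisor n-2.

```python
# Hypothetical data (purely illustrative).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.5, 4.0, 7.0, 8.5, 9.0]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
    sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# R-squared: fraction of the variation in y explained by the regression.
total = sum((yi - ybar) ** 2 for yi in y)
explained = sum((a + b * xi - ybar) ** 2 for xi in x)
r_squared = explained / total

# s^2 estimates sigma^2; we divide by n - 2 because two residuals are
# determined by the other n - 2 (sum of e = 0 and sum of x * e = 0).
s_squared = sum(ei ** 2 for ei in e) / (n - 2)

print(0.0 <= r_squared <= 1.0)  # True
# Decomposition: total variation = explained variation + sum of squared residuals,
# since the cross-product term drops out.
print(abs(total - explained - sum(ei ** 2 for ei in e)) < 1e-9)  # True
```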