Welcome. This lecture describes how the simple regression model can be represented. We formalize the notion that you can use the values of one variable to predict the values of another variable.

As in the previous lecture, we consider again the scatter plot of sales against price. Suppose that the store manager considers a price above 54 to be high, and a price equal to or below 54 to be low. We can then draw two histograms for the sales data, one for the high prices and one for the low prices. The two histograms are as follows, where you should look specifically at the numbers on the horizontal axis. Prices are high in 72 weeks with median sales of 91, and prices are low in 32 weeks with median sales of 94. We thus might expect sales to be on average about three units larger for low prices as compared to high prices. Hence, knowing the price to be high or low results in a different sales prediction. In other words, it helps to explain sales by using price as an explanatory factor. Therefore we call sales the dependent variable, and price the explanatory variable or explanatory factor.

For a dependent variable y with observations y subscript i, we can assume, as we did for the sales data in the first lecture, that y is identically distributed as normal with mean mu and variance sigma squared. In that case, the expected value, with notation E of y, is equal to mu, and the variance of y is equal to sigma squared. Again, you can consult the Building Blocks for further details. An estimator of the population mean mu is given by the sample mean, y bar, and an estimator of sigma squared is the sample variance. The idea of using one variable to predict another, instead of just using the sample mean, means that we move from an unconditional mean to a conditional mean given a value of x. For example, the conditional mean can be alpha plus beta times x. We thus move from an unconditional prediction to a conditional prediction.
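The step from an unconditional to a conditional prediction can be sketched in a few lines of Python. The data below are simulated and purely illustrative, not the weekly sales data used in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weekly prices and sales (illustrative only, not the lecture's data).
price = rng.uniform(48, 60, size=104)
sales = 100 - 1.5 * (price - 54) + rng.normal(0, 3, size=104)

# Unconditional prediction: the sample mean of all sales.
unconditional = sales.mean()

# Conditional predictions: separate sample means given high or low price,
# with 54 as the (hypothetical) threshold, as in the lecture's example.
high = sales[price > 54].mean()
low = sales[price <= 54].mean()

print(unconditional, high, low)
```

Because the overall mean is a weighted average of the two group means, the unconditional prediction lies between the two conditional ones; conditioning on price shifts the prediction up or down.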
An alternative way of writing the conditional prediction follows from taking y and subtracting the linear relation alpha plus beta times x, such that a normally distributed error term with mean zero emerges. This rewritten form will become useful when we want to estimate the coefficients from observed data, as will be demonstrated in the next lecture. The expressions together form the simple regression model, which says that the prediction of y for a given value of x is equal to alpha plus beta times x.

This simple regression model contains a single explanatory variable, and therefore anything that is not in the model is covered by the error epsilon. For example, for the sales and price case, we did not include the prices of competing stores or the number of visitors to the store in each week. Small values of the errors epsilon one to epsilon n are associated with more accurate predictions of sales than when these errors are large. So if we have estimates of these errors, then we can evaluate the quality of the predictions. To get these estimates, we first need to estimate alpha and beta. In the next lecture we will present a method to estimate these parameters.

But now I invite you to answer a test question that deals with an imaginary straight line through the points in a scatter diagram. What is your prediction of sales if the price is 50, and what is your prediction of sales if the price is 58? The answer starts with fitting a straight line to the data, as shown by the line in the scatter diagram. You can then see that a price of 50 is associated with sales of approximately 99, while a price of 58 is associated with sales of approximately 85. Did you guess right? The result is based on a straight line, and we will explain how to obtain this line in the next lecture.

We now turn our focus to the interpretation of the slope parameter in a regression model. As you will see, this interpretation is related to how the variables are measured. First, consider the next test question on relative changes.
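Fitting such a straight line and using it for conditional predictions can be sketched as follows. The estimation method itself is the subject of the next lecture; here we simply use a least-squares fit on hypothetical data (the numbers and the model coefficients are assumptions for illustration, not the lecture's data set):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (price, sales) observations; illustrative only.
price = rng.uniform(48, 60, size=104)
sales = 187 - 1.75 * price + rng.normal(0, 3, size=104)

# Fit the straight line sales = alpha + beta * price by least squares.
beta, alpha = np.polyfit(price, sales, deg=1)

# Conditional predictions at two price levels.
pred_50 = alpha + beta * 50
pred_58 = alpha + beta * 58

# The residuals estimate the errors epsilon_i; small residuals
# correspond to more accurate predictions.
residuals = sales - (alpha + beta * price)
print(pred_50, pred_58, residuals.std())
```

With a negative slope, the predicted sales at price 50 exceed those at price 58, and the residuals of a least-squares fit with an intercept average to zero by construction.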
Stores A and B both witness an increase in sales due to advertising. Which store has the largest relative increase? The correct answer is that the relative change in store A is 225 minus 150, divided by 150, which is 0.5, whereas the relative change in store B is 0.6. In mathematical notation, we write the relative increase as dx divided by x.

The parameter beta in the simple regression model has the interpretation of the derivative of y with respect to x. In economics, we often use the concept of elasticity, which measures, for example, the percentage increase in sales associated with a 1% decrease in price. This facilitates the interpretation, and as the elasticity is scale free, it also allows for a comparison across cases, like related retail stores. The elasticity is defined as the relative change in y, that is, dy divided by y, caused by the relative change dx divided by x. If the relationship between price and sales is linear, the value of the elasticity depends on the values of sales and price. This dependence makes it difficult, for example, to compare across retail stores with different floor sizes. To facilitate such comparisons, store managers prefer a measure of elasticity that does not depend on the ratio x over y. To achieve that, one can transform the y and x variables by taking the natural logarithm, written as log.

Now I invite you to make the training exercise, where you can familiarize yourself with the topics that were treated in this lecture. As always, you can find this exercise on the website. And this concludes our second lecture.
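The contrast between a level-dependent and a constant elasticity can be checked numerically. In the sketch below, the linear-model coefficients and the log-log coefficients are hypothetical values chosen only for illustration:

```python
import numpy as np

# Hypothetical linear model y = alpha + beta * x.
alpha, beta = 187.0, -1.75

def elasticity_linear(x):
    # For y = alpha + beta*x, the elasticity (dy/y)/(dx/x) = beta * x / y,
    # so it depends on the level of x (and hence of y).
    y = alpha + beta * x
    return beta * x / y

# Different price levels give different elasticities in the linear model.
print(elasticity_linear(50), elasticity_linear(58))

# Hypothetical log-log model log(y) = a + b*log(x): here the elasticity
# equals b at every x, since d(log y)/d(log x) = b. Check with a small
# relative change in x:
a, b = 5.0, -1.2
x1, x2 = 50.0, 50.0001
y1 = np.exp(a + b * np.log(x1))
y2 = np.exp(a + b * np.log(x2))
elas = ((y2 - y1) / y1) / ((x2 - x1) / x1)
print(elas)  # close to b, regardless of the chosen x
```

This is why taking natural logarithms of both y and x yields a slope parameter that is itself the elasticity, making comparisons across stores of different sizes straightforward.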