[MUSIC] We discussed linear regression briefly in the previous course, where we fit a few models with non-informative priors. Here we'll provide a brief review, demonstrate fitting linear regression models in JAGS, and discuss a few practical skills that are helpful when fitting linear models in general. This is not meant to be a comprehensive treatment of linear models, which you can find in numerous courses and textbooks.

Linear regression is perhaps the simplest way to relate a continuous response variable to multiple explanatory variables. This may arise from observing several variables together and investigating which ones correlate with the response, or from conducting an experiment where we carefully assign values of the explanatory variables to randomly selected subjects and try to establish a cause-and-effect relationship.

A linear regression model has the following form: the response y_i for observation i is equal to a linear combination of the explanatory variables. We have an intercept, beta_0, then a coefficient beta_1 for the first x variable, and so forth up to k variables, so the last term involves the ith value of x_k. This describes the mean, and to it we add an individual error term epsilon_i for observation i. We assume the epsilons are iid from a normal distribution with mean 0 and variance sigma^2, for observations 1 up to n. Equivalently, we can write the model for y_i directly: y_i, given all of the x values for individual i, all of the beta coefficients, and sigma^2, is independently distributed (not identically, because the mean changes for each observation) as a normal with mean beta_0 + beta_1 x_1i + ... + beta_k x_ki and constant variance sigma^2. Again, k is the number of predictor variables.

This yields the following graphical model structure. We start with a plate for all of our y variables, so y_i, which is random and observed, for i = 1 up to n. We also observe the x variables, x_1i, x_2i, all the way up to x_ki; I'll draw squares around these instead of circles to indicate that they are not random variables. We are always conditioning on the x's, so they are just constants. The y_i's depend on the values of the x's, and they also depend on the parameters: beta_0, beta_1, up to beta_k, and of course sigma^2. That is the graphical model representation.

The terms of a linear model are always linearly related because of the structure of the model, but the model does not necessarily have to be linear in the x-y relationship. For example, it may be that y is related linearly to x^2 instead of x. Hence we can transform the x and y variables to get new x's and new y's and still have a linear model. However, if we transform the variables, we must be careful about how this changes the final interpretation of the results.

The basic interpretation of the beta coefficients is this: holding all other x variables constant, if x_1, for example, increases by one, then the mean of y is expected to increase by beta_1. That is, beta_1 describes how the mean of y changes with changes in x_1, while accounting for all of the other x variables; the same is true for each of the x's. Note that we are going to assume that the y's are independent of each other given the x's, and that they have the same variance.
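For reference, the model just described can be written out in notation as follows; this display simply transcribes the equations stated above and adds nothing beyond them.

```latex
% Linear regression model, error form:
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \epsilon_i,
\qquad \epsilon_i \overset{\text{iid}}{\sim} \mathrm{N}(0, \sigma^2), \quad i = 1, \dots, n.

% Equivalent conditional form (independent, but not identical, normals):
y_i \mid x_i, \beta_0, \dots, \beta_k, \sigma^2
\;\overset{\text{ind}}{\sim}\;
\mathrm{N}\!\left(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki},\; \sigma^2\right).
```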
These are strong assumptions that are not always met, and there are many statistical models that address that; we'll look at some hierarchical methods in the coming lessons.

Of course, we need to complete the model with prior distributions: beta_0 comes from its prior, beta_1 from its prior, and so forth for all the betas, and sigma^2 comes from its prior. The most common choice of prior for the betas is a normal distribution, or we can use a multivariate normal for all of the betas at once. This is conditionally conjugate and allows us to do Gibbs sampling. If we want to be non-informative, we can choose normal priors with very large variance, which are practically flat for realistic values of beta; the non-informative priors used in the last course are equivalent to normal priors with infinite variance. We can also use the conditionally conjugate inverse-gamma prior for sigma^2 that we're familiar with.

Another common prior for the betas is the double exponential, or Laplace, distribution, which has density (1/2) e^(-|beta|). It's called double exponential because it looks like the exponential distribution reflected over the y-axis. It has a sharp peak at beta = 0, which can be useful if we want to do variable selection among our x's, because it favors values near 0 for these betas. This is related to the popular regression technique known as the LASSO. [MUSIC]
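As a rough sketch of how such a model could be specified and fit with JAGS from R: this is not the lecture's code, and the simulated data, variable names (x1, x2, b0, b, prec), and the specific prior parameters are illustrative choices only.

```r
library("rjags")

# Illustrative simulated data with k = 2 predictors (hypothetical values).
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1.5 + 2.0*x1 - 1.0*x2 + rnorm(n, sd=0.5)

mod_string <- " model {
    for (i in 1:n) {
        y[i] ~ dnorm(mu[i], prec)               # normal likelihood, constant variance
        mu[i] <- b0 + b[1]*x1[i] + b[2]*x2[i]   # linear mean
    }

    # Normal priors on the coefficients with very large variance
    # (JAGS parameterizes dnorm with precision = 1/variance).
    b0 ~ dnorm(0.0, 1.0/1.0e6)
    for (j in 1:2) {
        b[j] ~ dnorm(0.0, 1.0/1.0e6)
        # Laplace (double exponential) alternative: b[j] ~ ddexp(0.0, 1.0)
    }

    # A gamma prior on the precision corresponds to an
    # inverse-gamma prior on the variance sig2.
    prec ~ dgamma(1.0/2.0, 1.0/2.0)
    sig2 <- 1.0 / prec
} "

data_jags <- list(y=y, x1=x1, x2=x2, n=n)
params <- c("b0", "b", "sig2")

mod <- jags.model(textConnection(mod_string), data=data_jags, n.chains=3)
update(mod, 1e3)    # burn-in
mod_sim <- coda.samples(mod, variable.names=params, n.iter=5e3)
summary(mod_sim)
```

The gamma prior on the precision prec plays the role of the conditionally conjugate inverse-gamma prior on the variance sig2, and the commented-out ddexp line shows how a Laplace prior on a coefficient would be written if we wanted the LASSO-like behavior described above.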