Hi everyone. In the video "Why we need multi-level analysis?", we discussed that multi-level analysis is important to social scientists because social science data is usually grouped in some way. In addition, we briefly revisited the multiple regression model. In this video, we'll extend the multiple regression model to handle multi-level data. After watching, you'll know that multi-level analysis is basically running several regression models simultaneously that are tied together in a clever way. You'll also know the difference between fixed and random effects.

In the multiple regression model, we assume a straight-line relation between the dependent variable and a set of independent variables. With a single independent variable, this straight line is fully described by two parameters: b0, the intercept, and b1, the slope. Using the least squares method, you can determine the values for b0 and b1 that give the best-fitting line, where best-fitting means the line with the lowest total squared distance to all observed scores.

Now, how does all this translate to multi-level data? Well, as it turns out, the basic idea stays remarkably the same. A multi-level regression model fits a straight line to the data, and the trick is again finding the straight line with the smallest total distance between it and the data points. In fact, a multi-level regression analysis can be viewed as nothing more than a set of multiple regression analyses, one for each level of the data, that are tied together in a clever way.

Now let's use the example of students grouped in schools again. Assume we want to study the relationship between school achievement and pupils' IQ. In addition, we want to take school size into account, because we suspect that students at smaller schools get more individual attention and will therefore perform better. This data is nested. Some students in our sample come from the same school, while others go to a different one.
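As a point of comparison, and ignoring the nesting for a moment, the least squares fit described above can be sketched in Python for a single predictor. This is a minimal sketch, not the video's own material: the IQ and achievement scores are made up for illustration, and `fit_line` and `sse` are hypothetical helper names.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

def sse(b0, b1, x, y):
    """Sum of squared vertical distances between the line and the scores."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical scores for six pupils (invented for this example).
iq = [90, 95, 100, 105, 110, 120]
achievement = [5.2, 5.4, 6.1, 6.3, 7.1, 7.9]

b0, b1 = fit_line(iq, achievement)
print(b0, b1, sse(b0, b1, iq, achievement))
```

Moving either b0 or b1 away from the fitted values increases `sse`, which is what "best-fitting" means here.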
For the reasons mentioned in the video titled "Why we need multi-level analysis?", it is reasonable to assume that pupils from the same school are more alike than pupils from different schools. Hence, we have dependency in our data and we need to do multi-level analysis.

The first step in this analysis is assigning each variable to the correct level in our model. In our example, we have two levels. Level 1, the lowest level, consists of the pupils, who are nested in schools. This makes schools our level 2 units. Looking at our variables, school achievement, IQ, and school size, we see that two of these variables are characteristics of the pupils: school achievement and IQ. School size, on the other hand, is a characteristic of the schools. This means that school achievement and IQ are level 1 variables and school size is a level 2 variable.

Now let's start building our multi-level regression model, starting with level 1. On level 1, we want to describe the relationship between school achievement and IQ using a straight line. This can be represented with a regression equation in which the subscript i indicates different students and the subscript j indicates different schools.

On the second level, or school level, we want to describe the relation between school achievement and school size. However, school achievement is a pupil-level variable, which means that there are multiple school achievement scores for each school. How do we turn these multiple school achievement scores into one score for each school? The answer, quite simply, is by determining the average school achievement score for each school. By averaging the school achievement scores of all the students in a school, we effectively move the school achievement variable from the lower level to the higher level. This process is called aggregation. By aggregating, we make school achievement a level 2 variable that represents a characteristic of the level 2 units, the schools.
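The aggregation step described above can be sketched in Python as a plain average per school. This is only an illustration under stated assumptions: the pupil records are made up, and `aggregate_by_school` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pupil-level (level 1) records: (school id, achievement score).
pupils = [
    ("A", 6.0), ("A", 7.0), ("A", 8.0),
    ("B", 5.0), ("B", 6.0),
    ("C", 7.5),
]

def aggregate_by_school(records):
    """Average the level 1 scores within each level 2 unit (school)."""
    scores = defaultdict(list)
    for school, score in records:
        scores[school].append(score)
    return {school: mean(values) for school, values in scores.items()}

# One achievement score per school: a level 2 variable.
school_means = aggregate_by_school(pupils)
print(school_means)
```

Note that school C contributes a mean based on a single pupil, while school A's mean is based on three. A plain average treats both the same way.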
Now in practice, multi-level analysis averages the scores in a clever way, taking both the number of nested observations and the level 1 predictors into account, but we won't worry about that now. Next, we can simply run a multiple regression analysis on the schools, with average school achievement as the dependent variable and school size as the independent variable.

Now we're almost done. We separately analyzed level 1 and level 2, but these analyses are obviously not unrelated. They were run on the same data and the same subjects. How do we tie them together? The key to that is the intercept in the level 1 model, b0. What is the substantive interpretation of this parameter? Let's make this a little bit easier by removing the level 1 predictor IQ from the model for now. Now the model contains only b0 and an error term, which describes a horizontal line. Now what does b0 mean? It is the mean y score across all individuals. In other words, the mean score on the dependent variable and b0 are the same in this model with no predictors.

Now remember that on level 2, we wanted to predict the mean of the dependent variable for the different level 2 units. Since we just saw that b0 can reflect the mean dependent variable score, we can rephrase this and say that the level 2 model tries to predict a b0 value for each school. This is how the level 1 and level 2 models tie together: b0, which is a parameter in the level 1 model, is the dependent variable of the level 2 model. Now if we add predictors back to our level 1 model, the interpretation of b0 changes a bit, but the fact that it forms the link between the level 1 and level 2 models does not change.

This gives us a pair of equations in which the intercept of the level 1 model now has a subscript j, indicating that it can vary across schools. We can combine these equations into one multi-level regression equation that looks very similar to the standard multiple regression model.
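The equations shown in the video are not reproduced in this transcript. A reconstruction from the description above could look as follows. The symbols b0, b1, e, u and the subscripts i and j follow the text. Y for school achievement, and the level 2 coefficients \gamma_{00} and \gamma_{01}, are assumed notation that may differ from what the video uses.

```latex
% Level 1 model without predictors: b_0 is the overall mean of Y
Y_{ij} = b_0 + e_{ij}

% Level 1 (pupils), with an intercept that can vary across schools j
Y_{ij} = b_{0j} + b_1\,\mathrm{IQ}_{ij} + e_{ij}

% Level 2 (schools): the intercept is predicted from school size
b_{0j} = \gamma_{00} + \gamma_{01}\,\mathrm{Size}_j + u_j

% Combined: substitute the level 2 equation into the level 1 equation
Y_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{Size}_j + b_1\,\mathrm{IQ}_{ij} + u_j + e_{ij}
```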
The only difference is that we have two error terms: e for level 1 errors and u for level 2 errors. Because b0 now has a separate value for each level 2 unit, we call it a random parameter. This is in contrast to, for example, the slope of IQ in this example, which has the same value across individuals and schools and is therefore called a fixed parameter.

That's it for this video. If you want to see how we can also make slopes vary across level 2 units, check out the video titled "Multi-level models with random slopes".