Now for Part 4, which we'll be working through in this video, we're going to perform PCA on the data that we worked with in the last video, for a number of components ranging from one to five. We start off with six columns, and no matter what, we're going to try to reduce the number of columns that we'll ultimately be working with. We're then going to store the amount of explained variance for each of the different numbers of dimensions: for one dimension, how much variance was explained; for two, and so on and so forth. If we were to set the number of components to six, we would have explained 100 percent of the variance. We're recording how much of the variance we explain at each of these steps. We're also going to store the feature importances for each number of dimensions. Something to know is that PCA won't explicitly provide feature importances, but the components_ attribute, which we'll show you how to use in just a bit, shows how each principal component is composed as a linear combination of the original features. The larger those values are, given that we've standardized our data, the more impact each feature has had on that principal component, and therefore we can assume it's a more important feature. Then we're going to plot both the explained variance and these feature importances. Now, I'm going to break this down step by step. I'm actually going to create a cell above, but before I do that, just to show you where we're starting off: we're going to import PCA from sklearn.decomposition. We're going to initiate two empty lists, the PCA list and the feature weight list, which we'll use to store our explained variance and our feature importances. Then for n in range(1, 6), that is, one through five inclusive, that's what we want to loop through.
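The setup described above might look like this in code. The synthetic data here is a made-up stand-in for the scaled six-column data from the previous video; the list names follow the transcript, while the column names are invented:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the standardized six-column data
# from the previous video (the real notebook uses customer data).
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(200, 6)),
                    columns=[f"col_{i}" for i in range(6)])

pca_list = []             # explained variance per number of components
feature_weight_list = []  # feature importances per number of components

for n in range(1, 6):     # n_components = 1 through 5
    pca = PCA(n_components=n)
    pca.fit(data)
    # Total variance explained by the first n components.
    pca_list.append(pd.Series({"n": n,
                               "model": pca,
                               "var": pca.explained_variance_ratio_.sum()}))
    # (The feature-importance steps that fill feature_weight_list are
    # walked through later in the video.)
```

Each entry of `pca_list` records the component count, the fitted model, and the cumulative explained variance, which only grows as components are added.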
We're going to initiate a PCA model with the number of components equal to wherever we are within that range, and then we're going to fit it to the data that we've already transformed to ensure it's on the same scale and mostly normal. We'll then take the explained variance of each and append it to the PCA list, and after a few steps, which I'll walk you through in just a bit, we'll take each of the feature importances and append it to the feature weight list. After we do this for n in range one through six, we'll have these for each of our different numbers of principal components. Let's start off by looking at just this step here. We're going to create a pandas Series, and of course we're also going to need to initiate our model. Since I'm pulling this out here, I'm going to set n equal to two as we discuss all the steps, and you can imagine that the loop will do this for n equals one through five. We set n equal to two, and then let's see what this Series is that we're going to be outputting. It should contain n, which is the number of components we set to two; the actual model; and the explained variance up to that point: using two components, how much variance was explained? I'll run this, and we can see that it explained 72 percent of the overall variance. Now, just to see how the explained variance ratio actually looks, let's pull it out. We can see that with n equal to two, it shows how much of the variance was covered by the first principal component, which was about 45 percent, and how much by the second component, which was about 27 percent. The first one should always explain more than the second, which always explains more than the third: our first principal component should be the component that explains the most variance.
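Pulling that single step out with n set to two, a sketch of the Series being built might look like this. This uses stand-in data, so the printed ratios won't match the 45 and 27 percent from the actual notebook:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled data from the previous video.
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(200, 6)))

n = 2
pca = PCA(n_components=n).fit(data)

# One row of the PCA list: the component count, the fitted model, and
# the total variance explained by the first two components.
row = pd.Series({"n": n,
                 "model": pca,
                 "var": pca.explained_variance_ratio_.sum()})

# Per-component breakdown: the first component always explains at
# least as much variance as the second.
print(pca.explained_variance_ratio_)
```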
We will then have, for each of our numbers of components, the amount of variance explained. That's covered. Our next step is going to be finding the feature importances. The first thing we're going to do here, and let's add this on over here, is set some weights. The idea of the weights is that we have the breakdown of each of our principal components, but we want to give more weight to the more important principal components: the first one should be more important than the second one, and so on and so forth. What I'm doing here is taking the explained variance ratio that we output above and, since we're working with two components, expressing it as a proportion of one. We're taking that 45 percent and 27 percent, adding those up, and asking: out of one, what proportion is the 45 and what proportion is the 27? Just to look at what that means, you see we take those original amounts, 45 and 27, and divide each by the total of 45 plus 27. We see that the weights are 0.62 for the first component and 0.38 for the second component. We're going to weigh our components according to how important these different principal components are; this will become clear in just a second. The next thing we're going to see is pca.components_. What's important here is that this is the breakdown of how each of the components is actually composed. Let's first strip away everything besides pca.components_. We can see that for the first component, we have how each of our six features combined linearly to come up with that component, and then the linear combination that came up with our second component. Again, the idea is that the larger these absolute values are, the more they contributed to each component and the more important that feature is.
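A sketch of that weights step, on stand-in data; `weights` here is just the explained-variance ratios rescaled to sum to one, as described above:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the scaled data; the real weights come out near
# 0.62 and 0.38 in the notebook.
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 6))

pca = PCA(n_components=2).fit(data)

# Rescale the per-component explained-variance ratios so that they
# sum to one: these become the per-component weights.
weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()

# components_ has one row per principal component and one column per
# original feature: each row is the linear combination that builds
# that component.
print(pca.components_.shape)  # (2, 6)
```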
What we had here before is that we took the absolute value, because we don't care whether it's positive or negative; we just care about how much it affected that principal component. Then we weight it according to these weights. If you recall, the weights represent how important each of the principal components is: this first row is going to be multiplied by 0.62, and this second row by 0.38, so that we don't give the less important component too much weight. We see here that one feature, the fifth feature in this case, contributes about 0.7 to the first component, and a different feature contributes about 0.7 to the second component; we want to ensure that these do not get equal weight. The first should count for more than the second, since it's part of the first principal component. That's why we multiply by the weights, and then we can see what the overall contribution is. Let's just copy and paste that. We can see the overall contribution for each of the different components, and then we take the sum with axis equal to 0, so that, now that we've weighted each of them, we can see how much each of these different features, with their weightings, contributed to the principal components that we have. We see here that whatever feature it is, the fifth feature, was the most important across the first two components if you add up its weights in both. We're then going to divide that value down here: we have the absolute feature values, and we divide by the total sum of these values to ensure that the values sum to one as proportions. So we can see, again, these each represent how much weight each of our original features carried in coming up with our two principal components. We normalize over one to see, as a proportion of one, how much each of these features contributed to coming up with these principal components.
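The weighting-and-normalizing steps above might look like this; the data is a stand-in, and the variable names other than `components_` and `explained_variance_ratio_` are my own:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the scaled data from the previous video.
rng = np.random.default_rng(42)
data = rng.normal(size=(200, 6))

pca = PCA(n_components=2).fit(data)
weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()

# Absolute loadings: the sign doesn't matter, only how strongly each
# feature feeds each component.
abs_components = np.abs(pca.components_)

# Weight each component's row by that component's importance, then sum
# over components (axis=0) to get one score per original feature.
weighted = abs_components * weights[:, np.newaxis]
scores = weighted.sum(axis=0)

# Normalize so the feature importances sum to one.
importances = scores / scores.sum()
```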
Those are going to be the values that we have here. Then we're going to have a DataFrame that has the number of components, and then each of the different columns, so that we can line those up with each of these values. Then for each of those different values, we'll have the column that it's aligned with, and those are going to be our values here. I'm going to run this. The first thing that outputs is the explained variance for each of our different numbers of principal components. You see the first one covered 45 percent, the first two covered 72, then 83, 92, and 98. Once we get to five, we've covered 98 percent of our overall variance. We're then going to concatenate these, but first let's look at this feature weight list that we created. This is going to be a list of DataFrames; let's just look at the first one. We see this is for a number of components equal to one: how much each of these different features contributed to that principal component. With n set to two, we can see for the first two how much each feature contributed to each of the different principal components. We're going to concatenate all these different DataFrames together so that we have one long DataFrame. Then we're going to pivot that, setting the index equal to this n, so that we don't have multiple ones and twos but instead one row per n. We're also going to set our columns equal to the different features, and then we can just use our values as the values. Now we have this DataFrame, where we see, when the number of components is equal to one, the contribution of each of these different features; when the number of components is equal to two, the contribution of each of the features; and so on and so forth. Now we're going to plot the overall variance just using a bar plot. This is plotting what we had up here, that pca_df, which is just that overall variance.
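Here's a hedged sketch of how the per-n DataFrames, the concatenation, and the pivot could fit together; the column names `n`, `feature`, and `value` are assumptions about the notebook's layout, and the data is synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for the scaled data; invented column names.
rng = np.random.default_rng(42)
cols = [f"col_{i}" for i in range(6)]
data = pd.DataFrame(rng.normal(size=(200, 6)), columns=cols)

feature_weight_list = []
for n in range(1, 6):
    pca = PCA(n_components=n).fit(data)
    weights = pca.explained_variance_ratio_ / pca.explained_variance_ratio_.sum()
    scores = (np.abs(pca.components_) * weights[:, np.newaxis]).sum(axis=0)
    importances = scores / scores.sum()
    # One DataFrame per n: the component count, each original column
    # name, and that feature's normalized importance.
    feature_weight_list.append(pd.DataFrame({"n": n,
                                             "feature": cols,
                                             "value": importances}))

# Stack all five DataFrames into one long frame, then pivot so each
# row is one value of n and each column is one original feature.
features_df = (pd.concat(feature_weight_list)
                 .pivot(index="n", columns="feature", values="value"))
```

Each row of `features_df` then sums to one: it shows how the features share the importance for that number of components.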
We just set our x-label, our y-label, and our title. We see how much of the overall variance was explained as we add on each of these different principal components. Then finally, we plot the features_df, and we see, for each of the different numbers of dimensions we're working with, how much each of the different features contributes across all of our principal components. We see here that for detergents paper, at first it explained most of the variance, it was the most important feature, and then it tends to balance out as we add on more components. That closes out our section here on question four, showing you how to use PCA, see the overall explained variance, and get a hint at the actual feature importances as we create each of our different principal components. In the next section, we will discuss how we can actually use grid search to fine-tune our PCA model, especially when working with kernels. I'll see you there.
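The bar plot of explained variance might be sketched like this; the labels and title are placeholders rather than the notebook's exact strings, and the data is again a stand-in:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for the scaled data.
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(200, 6)))

# Explained variance for n = 1..5, as stored in pca_df.
var = [PCA(n_components=n).fit(data).explained_variance_ratio_.sum()
       for n in range(1, 6)]
pca_df = pd.DataFrame({"n": range(1, 6), "var": var})

# Bar plot of cumulative explained variance per component count.
ax = pca_df.plot(kind="bar", x="n", y="var", legend=False)
ax.set_xlabel("Number of components")
ax.set_ylabel("Explained variance ratio")
ax.set_title("Variance explained vs. number of components")
plt.tight_layout()
```

A line plot of `features_df` (one line per original feature, x-axis the number of components) gives the feature-importance view described above in the same way.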