Welcome back for part five of our notebook. Here we're going to introduce kernel PCA, that is, PCA working with a kernel, using what we discussed in the lecture: rather than the linear combinations of standard PCA, we can find a nonlinear combination that captures where the highest variance is, by implicitly mapping up to higher dimensions to capture curvature in the lower dimensions. Here we're choosing the RBF kernel. You can also search through different kernels, and I suggest looking at the documentation for the available options. When working with RBF, we can also search through different values of gamma, which essentially controls how complex the boundary will be, or how curvy the line you can project onto will actually be. So we're going to search through different gammas, and we're going to use grid search. With grid search, what we're trying to do is find the best model. With supervised learning this is clear: we can use a scoring method such as mean squared error, or accuracy, or whatever other classification score you want, and optimize on that score. With unsupervised learning, it's not quite as clear how to score which of these different models performs better, but we do need to come up with some type of scoring option in order to decide which gamma, or, if we want to search through different kernels, which kernel, works best. So what we're going to do here is introduce a custom scoring method. You'll see here that we defined a score, and we'll walk through what that score is. Essentially, we take a model, fit that kernel PCA model to our data, transform the data, take the inverse transform of the result, and then see how far that reconstruction is from our original data. The lower that value is, the better we did. So let's walk through that here.
So first we're going to import KernelPCA rather than just PCA, and we're going to import GridSearchCV, as we'll be using that to find the optimal hyperparameters for our kernel PCA. You'll see in just a second how we're going to incorporate mean squared error when deciding on the best version of our kernel PCA. The first thing we do is define a scorer. We pass into that scorer the PCA model as well as our X; there's no y here, just that X, since we're doing unsupervised learning and there's no label we're attributing to this. The try/except block just ensures that we're working with a NumPy array rather than a pandas DataFrame: if X is a DataFrame, we call .values to get the underlying array; if it's already an array, we just set X_val to that array. We then take the PCA model that was passed into the scorer and call fit_transform on X_val to get our new representation with however many components we're passing through, one component, two components, and so on, as well as whatever kernel and whatever gamma are specific to this PCA model. We then take the output of that and pass it into the inverse_transform function to get the inverse, which should undo what we did, but can't do so perfectly, because we lost some information in that original transformation, that original dimensionality reduction. The inverse transform gives us our reconstructed data, and then we take the original data we had and see how far off it is from that reconstruction, and to do that we'll just use the mean squared error. Now, grid search treats the score as something to maximize: it wants the highest value possible.
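The scoring function just described can be sketched as follows. This is a minimal sketch assuming scikit-learn; the function name `scorer` and the exact variable names are assumptions, but the steps (array coercion, fit_transform, inverse_transform, negated mean squared error) follow the walkthrough above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def scorer(pca_model, X, y=None):
    """Score a KernelPCA candidate by reconstruction error.

    Grid search maximizes the score, so we return the *negative*
    mean squared error between the original data and the data
    recovered by inverse-transforming the reduced representation.
    y is unused: this is an unsupervised problem.
    """
    # Ensure we are working with a NumPy array, not a pandas DataFrame
    try:
        X_val = X.values
    except AttributeError:
        X_val = X
    # Project down to the model's components, then map back up
    reduced = pca_model.fit_transform(X_val)
    reconstructed = pca_model.inverse_transform(reduced)
    # Lower reconstruction error is better, so negate for maximization
    return -1.0 * mean_squared_error(X_val, reconstructed)
```

Note the design choice: because the scorer negates the error, "best" in the grid search means the score closest to zero, i.e., the smallest reconstruction error.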
Since we obviously want to minimize mean squared error, we just multiply it by -1 so that maximizing the score minimizes the error. And that's going to be our scoring function. From there it should be as simple as any other grid search we've worked with in the past. You set your parameter grid, a dictionary with the different gamma values to loop through and the different numbers of components to loop through. Now I'll let you know, generally speaking, the higher the number of components, the better this transform/inverse-transform round trip will work, but this will allow us to home in on the right level of gamma. We then create a GridSearchCV, passing in KernelPCA along with the settings that we don't want to search over but want to keep the same through every single loop: the kernel is equal to RBF, and we want it to fit the inverse transform. If we don't set fit_inverse_transform=True when creating the KernelPCA, we won't have the option to call the inverse_transform we used up in our scoring function. We can then pass in the param grid we defined, along with the scorer we just created, and set n_jobs=-1 to say we want to parallelize as much as possible. Then, using this grid search object, we can call fit on the data and look at the best estimator to see which of these gammas performed best. So run that, and that will take just a second. I was going to pause the video, but there it is. We see that gamma equal to 0.5 was the best option in regards to that transform-to-inverse-transform reconstruction, and the number of components is equal to 4, which is the max value, as I said it would be.
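Putting the pieces together, the grid search setup might look like the sketch below. The specific gamma and component grids are illustrative assumptions (the transcript only tells us that 0.5 and 4 components won), and the synthetic data stands in for the notebook's actual X.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV


def scorer(pca_model, X, y=None):
    # Negative reconstruction MSE: higher score = better reconstruction
    X_val = X.values if hasattr(X, "values") else X
    reduced = pca_model.fit_transform(X_val)
    reconstructed = pca_model.inverse_transform(reduced)
    return -1.0 * mean_squared_error(X_val, reconstructed)


# Illustrative grids; the notebook's exact values may differ
param_grid = {"gamma": [0.05, 0.1, 0.5, 1.0],
              "n_components": [2, 3, 4]}

kernel_pca = GridSearchCV(
    # Settings fixed for every candidate: RBF kernel, and
    # fit_inverse_transform=True so inverse_transform is available
    KernelPCA(kernel="rbf", fit_inverse_transform=True),
    param_grid=param_grid,
    scoring=scorer,
    n_jobs=-1,  # parallelize across candidates
)

# Synthetic stand-in data; in the notebook this is the real X
rng = np.random.RandomState(42)
X = rng.rand(60, 6)
kernel_pca.fit(X)
best = kernel_pca.best_estimator_  # inspect the winning gamma / n_components
```

No y is passed to fit, since there are no labels; the custom scorer is what lets GridSearchCV rank the candidates in this unsupervised setting.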
Usually, when you loop through the number of components, the max value will be the one chosen, but now we can see that we should probably use gamma equals 0.5 when choosing the gamma for our kernel PCA. Now, for part six, we're going to show how you can build PCA into your modeling pipeline, in order to perhaps make your logistic regression work better on the data that you have. We're going to load in a very large data set, the Human Activity Recognition Using Smartphones data set. We've seen this before; it has tons of different columns. We can look at the shape here and see that it has 10,299 rows and 562 columns. So we're going to try to reduce that number of columns. First we import the different libraries needed: our pipeline, StandardScaler, StratifiedShuffleSplit (to keep the same ratio of each of our different outcome values in every split), LogisticRegression, and the accuracy score, since we're doing a classification problem here. X is going to be all columns except for activity; y is going to be the activity. Then we initialize our stratified shuffle split, which we'll call in just a bit when we want to get our average score. Now, this get-average-score function is just going to run all the steps in the pipeline, standard scaling, PCA, and then logistic regression, and all we change on each call is the number of components. We set the pipe equal to this list of steps and pass it into our Pipeline as we've done before. We initialize scores as an empty list; so far we've created our pipeline but haven't fit anything yet. We then use that SSS we initialized here, that stratified shuffle split, and we're going to get five different splits, since we set the number of splits equal to five.
And for each of those splits, we'll get a new X train and X test as well as a new y train and y test, and we can call pipe.fit, with pipe being the pipeline we created here, on that X train and y train. Each of the five times, once the pipeline is fit on the training set, we also compute the accuracy score on the test set. That gives us five different scores, and then we output the average of those five scores. We're going to set the values of n from 10 up to 500. Recall our original data set has 562 features, so we're going to see whether, if we reduce the number of dimensions, there's a point where perhaps we don't need the full data set, or even see some improvement with lower dimensions. We get our score list by running the get-average-score function we defined up here on each n in this list of ns. I'll run this, and this one will actually take some time, so I'm going to pause the video here, and I'll see you in just a bit as we touch on the results from running this function. All right, now that has finished running, and it did take a couple of minutes. Let's see what the score list came out as. We run this, and it should be in the same order as our ns. We see that after a certain point, once we get to the 450 to 500 range, there doesn't seem to be any more improvement from adding more variables, adding more features. We can see this with the plot as well, just plotting ns against the different scores, and we can see that it really plateaus. Note that the y axis isn't starting at zero here; it's starting at 0.84. So we see that adding on all these extra dimensions doesn't really add that much extra value in regards to the logistic regression.
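The function described across the last two paragraphs could be sketched like this. The function name, the split parameters, and the synthetic stand-in data are assumptions, since we don't have the notebook itself; the structure (scale, then PCA, then logistic regression, averaged over stratified shuffle splits) follows the walkthrough.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def get_avg_score(n, X, y, sss):
    """Average test accuracy of a scale -> PCA(n) -> logistic
    regression pipeline over the splits produced by a
    StratifiedShuffleSplit instance."""
    pipe = Pipeline([
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=n)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    scores = []
    for train_idx, test_idx in sss.split(X, y):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        pipe.fit(X_train, y_train)            # fit on the training split
        scores.append(accuracy_score(y_test, pipe.predict(X_test)))
    return np.mean(scores)                    # average over all splits


# Hypothetical usage with synthetic data standing in for the HAR dataset
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 30)
y_demo = rng.randint(0, 3, size=200)
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=42)
ns = [5, 10, 20]                              # notebook uses 10 up to 500
score_list = [get_avg_score(n, X_demo, y_demo, sss) for n in ns]
```

Because stratified shuffle splitting preserves class proportions in each split, the averaged accuracy gives a fair comparison across the different component counts.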
So you could probably shrink this down to even 100 or 200 features and still have a pretty high accuracy, depending on what you're trying to achieve, and speed up how long it takes to train this model. That closes out our demo here on dimensionality reduction, and I'll see you back at lecture. Thank you.