Now let's move to a more practical application where we can see this in action. We'll take this image of bell peppers and group together its colors: rather than working with the multitude of colors in this image, we'll only work with as many colors as we have clusters. You'll see what I mean as we walk through this notebook.

The first thing we're going to do is read in this image. When we call plt.imread on this image, we're actually bringing it in as a NumPy array; I'll show this in just a second. We can then use plt.imshow to display that image, which is currently a NumPy array, right within our Jupyter Notebook. We also call plt.axis off because we don't want any axes; we're just plotting an image. We run this and we see our image with its different colors, in different shades of green, red, and yellow.

We call image.shape here, but first I quickly want to show you what the actual image object looks like. As I mentioned, rather than storing an actual picture, it represents the image as an array that holds, for every single pixel, how much red, green, and blue that pixel contains. So we want to see how many pixels we have: there are 480 times 640 pixels, and each pixel has three values that represent how much red, green, and blue it has.

Below, to home in on how a picture is represented as a NumPy array, we set r equal to 35, g equal to 95, and b equal to 131; each of these values lies between zero and 255. We then call plt.imshow on just that small array, as if it were a single pixel with that particular coloration. We run this and, since it's mostly blue, we get something close to blue.
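As a minimal sketch of the array representation just described, the snippet below uses a randomly generated 480 by 640 by 3 array as a stand-in for the pepper photo (the original file isn't available here, so the pixel values are synthetic) and builds the single-pixel array from the lecture's r, g, b values. It uses only NumPy, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in for the pepper photo: same shape, random pixel values.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)

height, width, channels = img.shape
print(height, width, channels)  # 480 640 3
print(height * width)           # 307200 pixels, each holding 3 values

# The R, G, B intensities (each 0-255) of the top-left pixel.
print(img[0, 0])

# A 1 x 1 "image" built from the lecture's values r=35, g=95, b=131.
# plt.imshow(single_pixel) would render it as one bluish square.
single_pixel = np.array([[[35, 95, 131]]], dtype=np.uint8)
print(single_pixel.shape)       # (1, 1, 3)
```

The uint8 dtype is what bounds each channel to the 0 to 255 range mentioned above.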
If I decrease the blue to 13 and increase the green to 195, you see a very green image. Just so you understand a bit of how coloring works: if we set all three values to 100, so that they're all the same, we get a gray, because there are equal amounts of each. And if we set them all to zero, what do you think will happen? I'll run that here, and you see that we get black. Then if they're each 255, which is the maximum value, we get white. So that's a quick look at how each of these pixels is created from the NumPy array.

What we're going to do next is reshape our array so that every single pixel is a row: rather than three dimensions, we'll have two. We take our 480 by 640 grid of pixels and multiply 480 times 640 to get the number of rows. Again, each row will represent a single pixel, and the second dimension will hold the RGB values, how much of each color goes into that particular pixel. So we call reshape, and we say that the first dimension is the first dimension of our original image times its second dimension. The image was originally three-dimensional, so we're multiplying its first two dimensions together, and that's how many rows we'll have. The number of columns will be three, one for each of red, green, and blue. Then, looking at the first five rows, each row represents a pixel, and the numbers within that row are its red, green, and blue values respectively. Since 480 times 640 equals 307,200, our new NumPy array has shape 307,200 by 3.

Now we're going to run k-means on this image using eight clusters, so we'll come up with eight groupings. We take every single one of these 307,200 rows and find eight groups to sort them into different segments.
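The reshape and the eight-cluster fit can be sketched as follows. This assumes scikit-learn's KMeans (the lecture doesn't name the library, though the random state and inertia it mentions match that API) and uses a small synthetic 48 by 64 image so the example runs quickly; n_init=10 is an assumed setting, not one taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn's KMeans

# Hypothetical small stand-in image; the lecture's photo is 480 x 640 x 3.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)

# One row per pixel: (height * width) rows, 3 columns for R, G, B.
img_flat = img.reshape(img.shape[0] * img.shape[1], 3)
print(img_flat.shape)  # (3072, 3)
print(img_flat[:5])    # first five pixels, one [R, G, B] row each

# Group the pixels into 8 color clusters; random_state=0 fixes the result.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(img_flat)
print(kmeans.cluster_centers_.shape)  # (8, 3): one RGB center per cluster
print(kmeans.labels_[:5])             # cluster label (0-7) of the first five pixels
```

On the full image the same reshape gives 307,200 rows, which is why the later loop over many values of k takes minutes rather than seconds.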
We then create a copy of the flattened image and replace each value with the center of the cluster it was assigned to. Rather than keeping the value that was there, we loop through each of our eight unique labels and, for all of the 307,200 rows whose k-means label equals that label, replace the row with that cluster's center. I'm going to run this, and we'll replace all of those values. Just to show you quickly what that looks like, you see our new values repeat: 43, 15, 6, then 43, 15, 6 again, and later on 236, 172, 8. Each of these is one of the eight cluster centers, and we've replaced the original values we saw up above with one of these eight centroid values.

Now, to see what the image looks like with this multitude of different hues replaced by only eight possible colors, we reshape the array back to the original image shape. To show it as an image using plt.imshow, we have to get it back to 480 by 640 by 3. We then call plt.imshow and again turn off the axis. We can still make out a lot of our initial picture with just these eight colors. We can see the different hues, how they differentiate the different peppers, and where we've lost some of the granularity. But we clearly see the clusters of red, white, green, black, and so on.

The next thing we want to do, to take this a step further, is create a function that takes in any image along with a number of clusters and returns the image with each pixel replaced by its cluster's centroid. We just did this with eight clusters; now we want to do it for any image and any number of clusters k. To do that, we repeat the steps we just walked through.
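Here is a sketch of the replace-with-centroid step under the same assumptions (scikit-learn's KMeans and a synthetic image standing in for the photo). It follows the loop over unique labels described above, then shows an equivalent vectorized form that indexes the array of centers with the array of labels. Because the copy keeps the uint8 type, the float centers are truncated to whole numbers when they are assigned.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn's KMeans

# Hypothetical small stand-in image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
img_flat = img.reshape(img.shape[0] * img.shape[1], 3)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(img_flat)

# Copy the pixels, then overwrite each one with its cluster's center color,
# looping over the unique labels as in the notebook. The copy stays uint8,
# so the float centers are truncated to whole numbers on assignment.
img_flat2 = img_flat.copy()
for label in np.unique(kmeans.labels_):
    img_flat2[kmeans.labels_ == label, :] = kmeans.cluster_centers_[label]

# Reshape back to height x width x 3 so plt.imshow can display it.
img2 = img_flat2.reshape(img.shape)
print(img2.shape)                          # (48, 64, 3)
print(len(np.unique(img_flat2, axis=0)))   # distinct colors left (at most 8)

# Equivalent vectorized version: index the centers by each pixel's label.
img2_fast = kmeans.cluster_centers_[kmeans.labels_].astype(np.uint8).reshape(img.shape)
print(np.array_equal(img2, img2_fast))     # True
```

The loop mirrors the walkthrough step by step; the one-line indexing version does the same work without a Python loop, which matters more as the number of clusters grows.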
We set 'image flat' to the image reshaped so that it has the product of its first two dimensions as rows and three columns, one for each of R, G, and B. We then set the number of clusters equal to the k passed into the function. We set the random state equal to zero so that we get the same values here as you will when you run it at home. We then fit the model to image flat, which is again two-dimensional: in our case, 307,200 by 3. Next, we create a copy as we did before, and we change that copy by running a for loop through each of our labels: wherever a pixel's label equals the current label, we replace that pixel with the corresponding cluster center. Then, following the same steps as before, we reshape the result back to the original image shape so that we can ultimately display it. The function returns both the new image with the replaced colors and the inertia of the k-means model fitted with that value of k.

So we've created a function that outputs the new image with the replaced pixels as well as the inertia of the fitted model for whichever k we use. We're then going to call that function for k between two and 20, counting by two, and draw the inertia curve; later on, we'll also plot many of these pictures. The k values we loop through run from 2 up to, but not including, 21, counting by 2. We then initialize two empty lists, one for the images and one for the inertias. Each call to the image cluster function we defined returns both the new image with the replaced pixels and the inertia.
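A sketch of the image cluster function and the loop over k might look like this. The name image_cluster follows the lecture's description, but its exact signature, the n_init setting, and the use of scikit-learn's KMeans are assumptions; a tiny 24 by 32 synthetic image replaces the photo so the ten fits finish in seconds rather than minutes.

```python
import numpy as np
from sklearn.cluster import KMeans


def image_cluster(img, k):
    """Return (img with every pixel replaced by its cluster center, inertia).

    Assumption: scikit-learn's KMeans; the lecture does not name the library.
    """
    # One row per pixel, one column per RGB channel.
    img_flat = img.reshape(img.shape[0] * img.shape[1], 3)
    # random_state=0 makes repeated runs give the same clusters.
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(img_flat)
    img_flat2 = img_flat.copy()
    for label in np.unique(kmeans.labels_):
        # Overwrite every pixel carrying this label with its cluster center.
        img_flat2[kmeans.labels_ == label, :] = kmeans.cluster_centers_[label]
    # Back to height x width x 3 so plt.imshow can display it.
    img2 = img_flat2.reshape(img.shape)
    return img2, kmeans.inertia_


# Hypothetical tiny stand-in image so the loop runs in seconds.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 32, 3), dtype=np.uint8)

k_vals = list(range(2, 21, 2))  # 2, 4, ..., 20 (21 is excluded)
img_list = []
inertia = []
for k in k_vals:
    img2, ine = image_cluster(img, k)
    img_list.append(img2)
    inertia.append(ine)
```

Returning the inertia alongside the image lets one pass over k feed both the elbow plot and the grid of quantized images.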
We store those outputs as image 2 and the inertia, and append each one to the corresponding list we initialized. I'm going to run this; it will take a little while, and it will output each of these different images as well as the inertia values, which we'll plot in just a second. I'll see you as soon as it finishes running.

That took about five minutes to run. We now have our different inertia values, as well as our images, which we'll get to shortly, and we can plot the inertia values against each number of clusters. We call plt.plot to get the line graph, and on top of that plt.scatter to mark each of the points, with k on the x axis and inertia on the y axis. We see that the curve bends down smoothly, and it's hard to see an exact elbow. This is a case where we can't clearly locate the elbow, so the Elbow Method doesn't give us a definitive choice of k. An alternative, which we note here and you can dive deeper into, is the silhouette coefficient. For each point, it compares how close that point is to the other points in its own cluster with how close it is to the points in the nearest neighboring cluster. Again, you can dive deeper, but that's a different method for deciding what the number of clusters k should be.

The next step is to plot each of the images to see how each one looks with a different number of colors. Again, each image uses only as many colors as it has clusters, and we run through our k values from two to 20, counting by two. We loop over the range of the length of those values, so 10 different subplots, arranged in a grid of five rows by two columns.
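Since the elbow is hard to read here, one way to try the silhouette coefficient mentioned above is scikit-learn's silhouette_score, which averages the per-point silhouette values. This sketch assumes scikit-learn and a small synthetic image; on the full 307,200-pixel image, computing all pairwise distances is expensive, so passing the function's sample_size argument would be the practical choice. The matplotlib lines for the elbow plot are left as comments so the example runs without a display.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn
from sklearn.metrics import silhouette_score

# Hypothetical small stand-in image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 32, 3), dtype=np.uint8)
img_flat = img.reshape(-1, 3).astype(float)

k_vals = list(range(2, 21, 2))
inertias, silhouettes = [], []
for k in k_vals:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(img_flat)
    inertias.append(km.inertia_)  # y values for the elbow plot
    # Mean silhouette over all points, between -1 and 1; higher means points
    # sit closer to their own cluster than to the nearest other cluster.
    # For 307,200 pixels, add sample_size=... (and random_state) to keep it tractable.
    silhouettes.append(silhouette_score(img_flat, km.labels_))

best_k = k_vals[int(np.argmax(silhouettes))]
print(best_k)

# Elbow plot as in the notebook (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.plot(k_vals, inertias); plt.scatter(k_vals, inertias)
# plt.xlabel("k"); plt.ylabel("Inertia"); plt.show()
```

On uniformly random pixels like these, neither curve is expected to show a strong preference; the point is the mechanics of computing both criteria over the same k values.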
So we have a subplot grid of 10 different axes, each showing a different image. One at a time, we show the image for each k value, give it a title, and turn off the axis. We can see how much more of the image we're able to discern as the number of colors increases. Here at the bottom, we're using 20 different colors, so 20 centroids replace the original values, and we can pretty clearly see each of our different peppers and discern the original photo well.

Just to give you an idea of how many colors there were originally, we can run np.unique on image flat with axis equal to zero and take the length of the result. You see that originally there were 98,452 unique colors making up the picture, and we can represent it quite well with just 20. So we were able to group those 98,452 different colors into just 20 colors.

That closes out our notebook on k-means clustering, and I look forward to seeing you back in lecture. All right, I'll see you there.
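The unique-color count can be sketched like this, again assuming scikit-learn's KMeans and a small synthetic image in place of the photo. The axis=0 argument matters: it makes np.unique treat each [R, G, B] row as one item, whereas without it np.unique flattens the array and counts distinct channel values instead of distinct colors.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumption: scikit-learn's KMeans

# Hypothetical small stand-in image (the lecture's photo had 98,452 colors).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(24, 32, 3), dtype=np.uint8)
img_flat = img.reshape(-1, 3)

# axis=0: each [R, G, B] row counts as one color.
n_original = len(np.unique(img_flat, axis=0))
# Without axis, np.unique flattens and counts distinct single values (<= 256).
n_channel_values = len(np.unique(img_flat))

# Quantize to 20 colors and count the distinct colors that remain.
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(img_flat)
quantized = km.cluster_centers_[km.labels_].astype(np.uint8)
n_quantized = len(np.unique(quantized, axis=0))

print(n_original, n_channel_values, n_quantized)
```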