With the idea of using the average distances of all the points within their respective clusters, how do we go about actually finding our stopping point? Let's say we're at this stage and we have five clusters, each one color coded as we move forward. At this stage we can plot the average cluster distance for each of our clusters, marked here with the same colors we just saw in the two-dimensional plot. Along with each of those average distances, we have a gray dotted line that marks our threshold: we will stop once all of these average distances are above that line. In the next iteration, the light purple and magenta clusters are merged, so the average cluster distance for that combined cluster increases. We can visualize this change as follows: for the new combined cluster, we now have an average cluster distance that is higher than either of the previous two. Where before we had the light purple and the magenta, we merge those into a darker magenta with a higher average cluster distance. Now we have only four remaining clusters, and as a whole they're a bit closer to the limit set by the gray line. In the next step, the purple cluster merges with the teal cluster in the top right corner, and the new combined cluster is now above the threshold. We don't stop at this point, though; we only stop once the minimum average cluster distance is above the threshold, and the pink and magenta clusters are still below it. In the next step, moving down to two clusters, the magenta and pink clusters merge to create a new pink cluster. Finally, once we merge these two, all of the average cluster distances are above the threshold.
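This stopping rule can be sketched in a few lines of Python. This is a minimal illustration, not the video's code: the function names are my own, and I assume "average cluster distance" means the mean pairwise distance among a cluster's points.

```python
import numpy as np

def avg_intracluster_distance(points):
    """Mean pairwise Euclidean distance among the points of one cluster.
    (Hypothetical helper name; a single-point cluster gets distance 0.)"""
    points = np.asarray(points, dtype=float)
    if len(points) < 2:
        return 0.0
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(points)
             for b in points[i + 1:]]
    return float(np.mean(dists))

def should_stop(clusters, threshold):
    """The rule from the video: stop merging only once the *minimum*
    average cluster distance has risen above the threshold (the gray line)."""
    return min(avg_intracluster_distance(c) for c in clusters) > threshold
```

As clusters merge, their average internal distances grow, so each merge moves the minimum closer to the threshold until the condition finally holds.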
They are big enough that we can claim the algorithm has finally converged. Now, we mentioned earlier that at each step we want to merge the clusters that are closest to one another, but the idea of which cluster is closer is a bit of an ambiguous concept, especially when there are multiple points belonging to each of these clusters. There are several methods for measuring the distance between clusters, and these methods are called the different linkage types. The first example is single linkage, which uses the minimum pairwise distance between clusters. Given the clusters in our data set, it's the distance between the two closest points, say one from the teal cluster and one from the magenta cluster; the blue lines connect each such pair according to the minimum distance between a point in the magenta cluster and a point in the teal cluster. We take the distance between those specific points and declare that to be the distance between the two clusters. Then, across all these pairwise linkages, we find which one is the minimum, and we combine those clusters as we move up the hierarchy. We will talk through several different types of linkages. A pro of single linkage, the minimum pairwise distance between clusters, is that it produces a clear separation when the clusters have clear boundaries, with no points falling between them. A con of single linkage is that it won't separate the clusters cleanly if there is noise between them; it is easily skewed by outliers falling close to another cluster. Another linkage type is called complete linkage.
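Single linkage is simple to express directly. Here is a minimal sketch (the function name is my own, and I assume Euclidean distance, as the video uses throughout):

```python
import numpy as np

def single_linkage(cluster_a, cluster_b):
    """Single linkage: the minimum pairwise Euclidean distance
    between the points of two clusters."""
    a = np.asarray(cluster_a, dtype=float)
    b = np.asarray(cluster_b, dtype=float)
    # All pairwise distances at once via broadcasting: shape (len(a), len(b))
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(d.min())
```

At each step of the hierarchy, the pair of clusters with the smallest such value would be merged.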
With complete linkage, instead of taking the minimum distance between the points of each pair of clusters, we take the maximum. We take the furthest distance between each pair of clusters, and from those maximum distances we decide which one is the smallest; then we move up the hierarchy, reducing here from four clusters down to three. A pro of this method is that it does a much better job of separating the clusters when there is noise or the two clusters have overlapping points, unlike single linkage. A con is that it can tend to break apart larger existing clusters, depending on where the maximum distance between points ends up lying. Alternatively, we can take the average of all the points in each cluster and use those averages, or cluster centroids as we've been introduced to them, to determine the distance between clusters. The pros and cons of average linkage can be seen as a compromise between those of single and complete linkage: it may still break up larger clusters and may still be drawn toward noise, but it does a better job than either the single (minimum) or complete (maximum) linkage with regard to their respective cons. Finally, we have Ward linkage. Ward linkage computes the inertia; recall that the inertia is the sum of squared distances between each of our points and their centroid. It picks the pair of clusters whose merge minimizes the increase in that inertia value; it is trying to minimize the sum of squares of the distances to the cluster centroids. In that sense you can think of it as something similar to k-means applied to combining clusters. Again, the pros and cons of Ward linkage are similar to those of average linkage, in that it balances out both the pros and cons of the minimum and maximum linkages.
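The three remaining linkages can be sketched the same way (a minimal illustration with my own function names; for Ward I use the standard closed form for the increase in inertia when two clusters merge, which the video describes but does not write out):

```python
import numpy as np

def _pairwise(a, b):
    """All pairwise Euclidean distances between two point sets."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def complete_linkage(a, b):
    """Complete linkage: the maximum pairwise distance between clusters."""
    return float(_pairwise(a, b).max())

def average_linkage(a, b):
    """Average linkage: the mean of all pairwise distances between clusters."""
    return float(_pairwise(a, b).mean())

def ward_increase(a, b):
    """Ward's criterion: the increase in inertia (sum of squared distances
    to the centroid) caused by merging the two clusters. Closed form:
    (n_a * n_b / (n_a + n_b)) * ||centroid_a - centroid_b||^2."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    return float(len(a) * len(b) / (len(a) + len(b)) * np.sum((ca - cb) ** 2))
```

Whichever linkage is chosen, each step of the hierarchy merges the pair of clusters with the smallest value of that linkage.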
Now, to do this in practice in Python, the steps are very similar to what we've seen so far. We start by importing our class, here called AgglomerativeClustering. We then create an instance of the class, so we say agg equals AgglomerativeClustering with our chosen hyperparameters. We can choose the number of clusters; here we set n_clusters equal to three, so it will keep merging until three clusters remain. We then have the option to choose our distance metric, and here we chose Euclidean, affinity equals euclidean, so we use the Euclidean distance. We can also define our linkage: each of the linkage types we just discussed is available, and we can choose which one to use for our clustering. Then, as before, we fit the instance on the data and use it to assign cluster labels to our points. Now let's recap what we went over in this section. We introduced the hierarchical agglomerative clustering method and how we can use it to slowly build up larger and larger clusters. This method is very useful in business practice, where you may also want to see the subgroups that built up to the larger groupings. We then discussed stopping conditions: you may either have a predetermined number of clusters in mind, or you can continue until you reach a threshold on the minimum average cluster distance. Finally, we went over the different linkage types: single linkage, which uses the closest points to determine the distance between clusters; complete linkage, which uses the furthest points; average linkage; and Ward linkage, which picks the merge that least increases the inertia. That closes this video on hierarchical agglomerative clustering.
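Putting those steps together in scikit-learn looks like the following. The toy data is my own; note that in scikit-learn 1.2+ the distance-metric argument described above is spelled metric rather than affinity, and I leave it at its Euclidean default here since Ward linkage requires Euclidean distances. Also note that AgglomerativeClustering has no separate predict method for new data; labels come from fit_predict on the fitted data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Three small, well-separated blobs (illustrative data, not from the video)
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)

# Merge clusters bottom-up until three remain, using Ward linkage;
# linkage could also be "single", "complete", or "average"
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)  # one cluster label per row of X
```

Each of the three blobs ends up with its own label, matching the n_clusters=3 stopping condition.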
In the next video we're going to dive into our next clustering algorithm, DBSCAN. I'll see you there.