So as promised, let's start to visualize how DBSCAN actually works. As we have in our past clustering algorithms, we're going to start with this two-dimensional data set, and we're going to come up with clusters depending on the visits and the recency and how far away each point is from one another. So we start at a random point, here we have this point in pink, and then we look at the radius epsilon around that point. We have to define that epsilon, and here we define it as 1.75. We draw that circle of radius 1.75, we look around, and we ask: are there enough points within that circle, given our n_clu, to start a cluster? And we see that there are four points in there; again, we include the point itself, and with that point we get up to five, so we now have our first cluster. So every point within that epsilon is going to be part of our first cluster. And then we process each new point in the same way, so we move on to our next point here, and anything within that epsilon radius gets included as part of that cluster. And we keep moving along. And then here we see that this point is part of the cluster because it's near one of the core points: as we looped through, we saw that the point to its right, which is within its epsilon radius, was a core point with four points. This point itself is not a core point, but it is density-reachable. So it will be part of our cluster, but we will highlight that this particular point is going to be a border point, a density-reachable point, and not one of our core points, so we'll leave it in a lighter pink. And we keep going down this chain, adding on points that fall within epsilon, and as we keep running through we see these are all core points, because they all have at least four points, including themselves, within their radius.
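To make that neighborhood check concrete, here is a minimal sketch in plain Python. The six points are made up for illustration (they are not the data set from the slides), and region_query is a hypothetical helper name. Only epsilon = 1.75 comes from the walkthrough, together with the threshold of four points including the point itself.

```python
import math

def region_query(points, index, eps):
    """Return the indices of all points within eps of points[index], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]

# Hypothetical 2-D data, chosen only to illustrate the check
points = [
    (1.0, 1.0),
    (1.5, 1.2),
    (2.0, 1.0),
    (1.2, 2.0),
    (3.4, 1.0),  # close to only one point of the dense group
    (9.0, 9.0),  # far away from everything else
]

eps = 1.75   # radius used in the walkthrough
n_clu = 4    # minimum number of points in the circle, counting the point itself

neighbors = [region_query(points, i, eps) for i in range(len(points))]
neighbor_counts = [len(n) for n in neighbors]

# A core point has at least n_clu points inside its epsilon circle
is_core = [count >= n_clu for count in neighbor_counts]

# A border point is not core itself, but lies within eps of at least one core point
is_border = [
    (not is_core[i]) and any(is_core[j] for j in neighbors[i])
    for i in range(len(points))
]

print(neighbor_counts)
print(is_core)
print(is_border)
```

This sketch only classifies the points. It does not chain the core points together into clusters, which is the part of the algorithm the walkthrough continues with.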
And we move along, and then this point only has three, so this one again is going to be a border point, but it is near one of the core points, so it will still count as part of the cluster. We highlight that in light pink, and we keep moving along, and eventually we have all of our points within the cluster. Once we've drawn the circle around all of those points and there are no neighbors left, we randomly try a new unvisited point, to potentially start a brand new cluster. And when we do that, here we start with the blue, we need to check whether this is going to be a core point once again. So we check again within epsilon of this new random point that we sought out, we see that it is a core point, and now we have started our new cluster. Now, this point again is going to be a density-reachable point, but it will still be part of the cluster because it's near another point that is a core point. And we can continue to move along to build out our cluster here, and you see again we have a density-reachable point. We've had a couple so far, but all of those are near core points, so they are still going to be part of our cluster. And then we see here that, with n_clu equal to 4, we only have three points within this radius, so this is going to be a density-reachable point, but not a core point. And then when we move over to this point over here, we see that the only other point within its radius is that density-reachable point, so there are no core points within this radius. And if there are no core points within a point's radius, then that point becomes a noise point, it becomes an outlier. So this point isn't part of either of our two clusters, and it's labeled as an outlier, which is why we have marked it here in gray.
Now, I want you to take a moment, and given that DBSCAN method that we just walked through, notice which points tended to be the core points, as we have them labeled in a darker hue.
Which ones were those density-reachable points, which are still part of our cluster but don't have the number of points needed to make them a core point, given our n_clu? And then which point do we have labeled as an outlier?
Now that we understand how the DBSCAN algorithm works, let's discuss some strengths and weaknesses of working with it. So as we saw, with the DBSCAN algorithm we do not need to specify the number of clusters, as DBSCAN will automatically determine the clusters depending on how close the points are to one another. It also allows for noise, and will not automatically assign outliers to a particular cluster. It also does a strong job of handling arbitrary shapes, as it's going to be searching out points that are within epsilon distance of one another, and will stop whenever a gap occurs, no matter what the shape of that boundary between the clusters is. Now some weaknesses. It requires two parameters, which means we need to search over more possible values to find that optimal solution. Also, those hyperparameters can be very difficult to fine-tune in higher-dimensional space. And then finally, it will not do well with clusters of different density. So even if we have two clear groups, if in one group the points are about five units away from one another, and in the other they are about one unit away, under whatever distance metric we're using, it may be difficult to differentiate between those two clusters, because a single epsilon has to work for both.
Now, let's walk through how the DBSCAN algorithm can actually be used in Python. So first things first, we import the class containing our clustering method. So from sklearn.cluster, we import DBSCAN. We then create an instance of that class, and pass in the necessary hyperparameters.
Here we're setting eps=3 and min_samples=2. So min_samples is that n_clu that we've been talking about, and eps is the epsilon we've been talking about, that distance around every single point that decides whether it counts as a core point, or as within the cluster. We're then going to fit that instance on the data, so just calling db.fit. And then we can't call db.predict, because of the way that the algorithm actually works. If you recall, it's finding the points iteratively by scanning through each one of the different points within that data set, so it's just creating clusters within that fitted data set. You can't call predict with DBSCAN. If you want to cluster a larger data set, then you just include that data in the fit, and then you can come up with the different clusters. So we get our db.labels_, and just to note for those labels, we're going to have class 0, class 1, and so on, and any outlier, which as we saw can happen with DBSCAN, will be labeled -1.
Now let's recap what we learned here in this section. In this section we discussed the DBSCAN algorithm, and how it will come up with its own clusters depending on which points are within a certain distance of the other points. We then discussed the inputs and their importance, especially that of the epsilon and n_clu chosen, as well as the outputs, and understanding the difference between a core point, a density-reachable point, and just outliers, or noise. And finally, we discussed some of the algorithm's strengths and weaknesses, such as it being able to better determine clusters of arbitrary shapes, but perhaps having difficulty determining clusters that may have different densities. Now, this closes out our discussion on DBSCAN, and in the next video, we'll introduce our final clustering algorithm, mean shift clustering. All right, I'll see you there.
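For reference, the import, instantiate, fit, and read-labels steps described above might look like the following minimal sketch. The six-point array X is a made-up toy data set, not the course data. The hyperparameters eps=3 and min_samples=2 are the ones mentioned in the walkthrough.

```python
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups and one far-away point
X = [[1, 2], [2, 2], [2, 3],
     [8, 7], [8, 8],
     [25, 80]]

# eps is the epsilon radius; min_samples plays the role of n_clu
db = DBSCAN(eps=3, min_samples=2)
db.fit(X)

# One label per row of X; noise points are labeled -1
labels = db.labels_

# Count clusters (ignoring the noise label) and noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print(labels)
print(n_clusters, n_noise)
```

Because there is no predict method, the labels only describe the rows that were passed to fit. scikit-learn also offers fit_predict, which fits and returns those same labels in a single call.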