So, Virginica and Versicolor got merged together into one cluster, and Setosa is off to the side. Then I went with four clusters, and it split this area into a third cluster. There was a dividing line here and a dividing line there.

So, K-Means works by leveraging similarities among examples, that is, data points in a multidimensional data space, and the distance between them. Distance has to follow certain rules: it can't give you any negative distances.

The distance from point one to point two is the same as the distance from point two to point one; that's the symmetry property. And the distance from an initial point straight to a final point is less than or equal to the distance from the initial point to some second point plus the distance from that second point on to the final point. That's called the triangle inequality, and in short, it means there aren't any shortcuts in two-dimensional space: no wormholes or folding of space, or anything like that.
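Those three rules can be sketched as quick sanity checks on Euclidean distance; the points and the helper name here are just for illustration:

```python
import math

def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a, b, c = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

# Non-negativity: a distance is never negative.
assert euclidean(a, b) >= 0
# Symmetry: the distance from a to b equals the distance from b to a.
assert euclidean(a, b) == euclidean(b, a)
# Triangle inequality: detouring through b is never shorter than going direct.
assert euclidean(a, c) <= euclidean(a, b) + euclidean(b, c)
```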

There are a number of ways to measure distances. We remember from geometry that the Euclidean distance is the square root of the sum of the squares of the differences in the coordinates.

There's Manhattan distance, which is up and over, up and over, and over and over. And then there's one I hadn't heard of before, the Chebyshev distance, and I like this one.

Who's played chess? So, you know how the king moves? The Chebyshev distance is like that: if the king is here, the distance to the next square, and to all the squares around it, is one; squares the king needs two moves to reach are at distance two, and so forth, outward from the square that the king is on. It's called the Chebyshev distance, and I'd never heard of it before.

That was interesting.
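The three metrics above can be written in a few lines each; this is a minimal sketch, with an example point pair chosen just to show how they differ:

```python
def euclidean(p, q):
    # Square root of the sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    # "Up and over": sum of the absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    # King moves: the largest single-coordinate difference.
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (0, 0), (2, 3)
print(euclidean(p, q))  # sqrt(13), about 3.606
print(manhattan(p, q))  # 5
print(chebyshev(p, q))  # 3 -- a king needs three moves to get there
```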

K-Means assumes that the data has clusters, that there is some structure in there, and that the clusters are made up of similar examples. It starts by just randomly picking a sample, and that sample is called the prototype. You can also think of it as the centroid, the center of mass, if you will, in the center of the cluster.

It also assumes the clusters have a roughly spherical shape. They don't have to be perfectly spherical, but that assumption is certainly not always true for real-world data. And K-Means uses Euclidean distance.

You can work with ordinal numbers: one, two, three. It can also work with binary numbers, as in the spam classification problem.

Things to be aware of: because it uses Euclidean distances, the features need to have roughly the same scale; otherwise features with very large values dominate and potentially end up throwing off your results.

One solution to that problem is to transform your data before K-Means, by statistically standardizing all of the features, or by transforming them into new features with a dimensionality-reduction process such as principal component analysis.
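The standardization step is just z-scoring each feature — subtract its mean, divide by its standard deviation — so every feature ends up on roughly the same scale. A plain-Python sketch (in practice a library routine such as scikit-learn's StandardScaler does the same thing):

```python
def standardize(rows):
    # Z-score each column: subtract the column mean and divide by the
    # column standard deviation.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]

# One feature in the thousands, one near 1: without rescaling, the first
# would dominate every Euclidean distance K-Means computes.
data = [(1000.0, 1.0), (2000.0, 2.0), (3000.0, 3.0)]
scaled = standardize(data)
```

After the transform, both columns have mean zero and unit variance, so each feature contributes comparably to the distances.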