[MUSIC] In this lecture, you will learn spatial categorization, which comprises two different approaches: clustering and classification. [SOUND] Let's start with a question for understanding spatial categorization. Can we categorize spatial units, in this case the census blocks of Seoul, based on urban functionality? Of course we can. One simple answer is to present an official land-use map of the City of Seoul. The given land-use map has ten classes, such as residential, commercial, industrial, and so on, with water bodies in blue. Another answer is to apply clustering or classification algorithms for spatial categorization. The given map is composed of six classes based on the get-ons and get-offs of public transit passengers for each hour. For example, many get-offs in the morning and many get-ons in the evening would be the typical pattern of business districts, which are illustrated in red. Likewise, you could produce your own map of urban functionality using a clustering or classification algorithm. [SOUND] Clustering and classification follow a series of similar steps, as illustrated in the figure. The real-world object is full of details, and it is impossible to take everything into consideration. So we have to define our interest and make appropriate measurements of related variables. Then we have a set of variables with measurements, which is called pattern space. In the pattern space, which is still large, we select variables for clustering or classification, and the reduced space is called feature space. In the feature space, we can delineate the boundaries of each class. The divided feature space is called class space. If you refer to predefined classes for the delineation, it is called classification. If you delineate the feature space with only a few mathematical parameters, it is called clustering. Confusing? Yes, it is confusing if you do not yet have any expertise in clustering and classification. 
Now let's take a look at an example of each phase. As a spatial categorization problem, the 77 community areas of Chicago are the given real-world object. An individual community would have an infinite number of variables to describe it. Assume that the objective is to categorize the 77 communities based on urban functionality. So we make a series of measurements possibly related to urban functionality. Now we have a set of variables containing measurements for each community. For example: name of community, total population, young population, old population, average household income, number of bus stops, park space percentage, taxi drop-offs in the morning, taxi drop-offs in the evening, and many, many others. This is the pattern space of the community areas of Chicago. We are interested in urban functionality, so we choose only taxi drop-offs in the morning and drop-offs in the evening, which compose a two-dimensional feature space. The figure illustrates this two-dimensional feature space, in which each point represents an individual community as the real-world object. The feature space can be divided by a set of boundaries, shown as dotted lines, and this creates the class space. A large number of both morning and evening drop-offs represents a cultural hot spot, in red. A large number of drop-offs in the morning and a smaller number in the evening would represent a business district. The opposite pattern would denote residential areas. A small number of both morning and evening drop-offs would characterize inactive areas. Based on this predefined characterization, the 77 communities were categorized into four classes: cultural hot spot, business district, residential area, and inactive area. Now the figure illustrates the outcome of the spatial categorization into four classes. Four communities, the Near North Side, Near West Side, Near South Side, and the Loop, which are located in the downtown area, are categorized as hot spots. 
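The four-way characterization above amounts to a simple rule-based categorizer, which can be sketched in a few lines. This is only a minimal illustration of the idea, not the actual procedure used for the Chicago map; the threshold separating "large" from "small" drop-off counts is a hypothetical value.

```python
# Rule-based urban-functionality categorization by taxi drop-offs.
# THRESHOLD is a hypothetical cutoff separating "large" from "small" counts.
THRESHOLD = 100

def categorize(morning_dropoffs, evening_dropoffs):
    """Assign an urban-functionality class from two drop-off counts."""
    morning_high = morning_dropoffs >= THRESHOLD
    evening_high = evening_dropoffs >= THRESHOLD
    if morning_high and evening_high:
        return "cultural hot spot"      # busy in both periods
    if morning_high:
        return "business district"      # inflow mainly in the morning
    if evening_high:
        return "residential area"       # inflow mainly in the evening
    return "inactive area"              # little activity at either time

# Hypothetical example communities (not real Chicago figures).
print(categorize(250, 300))  # cultural hot spot
print(categorize(250, 40))   # business district
```

A real pipeline would learn such boundaries from data rather than fixing them by hand, which is exactly what the clustering and classification algorithms below do.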
O'Hare Airport and Midway Airport are categorized as business areas, which certainly have many drop-offs in the morning. Most of the northern communities are categorized as residential, and the southern and western communities as inactive. The next question is: what is the difference between clustering and classification? Clustering and classification are identical in that both delineate the boundaries of each class in feature space. The difference is whether predefined classes are given or not. If they are given, it is classification, also known as supervised learning. Otherwise, it is clustering, also known as unsupervised learning. When clustering or classification is conducted, a similarity measure is the criterion for assigning an object to a class. Because each object in feature space can be represented as a vector, a similarity measure is about checking the similarity of two vectors. So simple Euclidean distance, Manhattan distance, and Mahalanobis distance are used as similarity measures in clustering and classification. [SOUND] Now let's take a look at clustering algorithms in more detail. For your better understanding, I made two datasets in a two-dimensional feature space. The first dataset was generated by adding noise to two center points, (2,2) and (7,7). The second dataset has two crescent-shaped clusters which are interlocked to a certain degree. With respect to these two datasets, two popular clustering algorithms, K-means and DBSCAN, will be introduced and applied, and they will show the pros and cons of each algorithm. The first clustering algorithm is K-means, which partitions a dataset into K clusters by minimizing the within-cluster sum of squares. The algorithm needs only one parameter, the number of clusters, K. The very first step is to arbitrarily locate the K center points, one for each cluster. In the given example K = 2, so we have two center points, a red dot and a blue dot. These first locations are also called seed points. 
In the first iteration, all the data points are compared in terms of the similarity measure, in this case Euclidean distance in the feature space, to find the closest center point, and each data point is assigned to the cluster represented by that center point. In the figure, the red cluster denotes the data points assigned to center point 1, and the cyan cluster those assigned to center point 2. In the second iteration, new center points are computed by averaging the data points in each cluster. With respect to the new center points, the same operation as in the first iteration is applied to the whole dataset, and new clusters are generated, shown in the figure as red crosses and cyan crosses. In the third iteration, new center points are computed and new clusters are generated with exactly the same operations. The same operation continues until the clusters do not change anymore, in other words, until the center points of each cluster no longer move. As you can see by comparing iterations three and four, no change of cluster occurs between the two iterations. So the algorithm finishes at the fourth iteration, and the final clusters are the outcome of the K-means algorithm. K-means is simple and powerful, but it also has some limitations. The parameter K is critical to obtaining a good outcome, and the algorithm is not suitable for finding clusters that do not have a spherical shape. This is simply because the cost function is the Euclidean distance from each center point to the members of its cluster, so the algorithm grows clusters spherically. The figure shows the outcome of K-means for dataset 2. It substantiates the limitation that K-means is applicable only to clusters of spherical shape. For such cases of irregularly shaped clusters in feature space, there are other clustering methods. One alternative solution is DBSCAN, which stands for density-based spatial clustering of applications with noise. 
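The assign-then-update iterations described above can be sketched directly. This is a minimal K-means, assuming a plain list of 2-D points and fixed seed points rather than random ones; a production implementation would also handle empty clusters and multiple restarts.

```python
import math
import random

def kmeans(points, k, seeds, max_iter=100):
    """Minimal K-means on 2-D points; `seeds` are the initial center points."""
    centers = [tuple(s) for s in seeds]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:   # centers stopped moving: converged
            break
        centers = new_centers
    return centers, clusters

# Dataset 1 from the lecture: noise added around (2,2) and (7,7).
random.seed(0)
pts = [(2 + random.gauss(0, 0.5), 2 + random.gauss(0, 0.5)) for _ in range(50)] \
    + [(7 + random.gauss(0, 0.5), 7 + random.gauss(0, 0.5)) for _ in range(50)]
centers, clusters = kmeans(pts, 2, seeds=[(0, 0), (10, 10)])
print([(round(cx, 1), round(cy, 1)) for cx, cy in centers])  # near (2,2) and (7,7)
```

On this well-separated blob dataset the centers converge close to the true generators; on the crescent dataset the same code would still draw a straight, spherical split, which is the limitation the lecture points out.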
The algorithm clusters data points which are closely aggregated together, while marking as outliers the points located in low-density regions. DBSCAN requires two parameters: epsilon, the neighborhood distance, and the minimum number of points within that distance. Unlike K-means, DBSCAN does not require the number of clusters, and it can find irregularly shaped clusters very well, as the example shows. DBSCAN also has some limitations. Most of all, the dataset and its scale should be well understood, and the two parameters should be chosen appropriately, which is a very difficult task. Nevertheless, DBSCAN is one of the most popular clustering algorithms. [SOUND] Now let's take a look at an example of classification. The difference from clustering is that classification starts from predefined classes with corresponding reference datasets, also known as training data. In the figure of the feature space, three classes, C1, C2, and C3, are defined and reference datasets are given. The classification problem is to divide the feature space based on the predefined classes and the training datasets. One simple classification algorithm is MDM, which stands for minimum distance to mean. MDM simply draws the class boundaries as the Thiessen polygons of the mean of each class. Thiessen polygons, also known as a Voronoi diagram, were taught in the proximity analysis and accessibility analysis lectures. After classification is completed, in other words, after the class boundaries in the feature space are delineated, a new data point's class can be determined by checking which region includes the point. The new data point in the figure belongs to class 2, in purple. Another popular classification algorithm is the decision tree. While MDM uses Thiessen polygons to build class boundaries, the decision tree uses rectangular boundaries to divide the feature space with respect to the given classes and the corresponding reference dataset. 
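The MDM classifier above reduces to two steps: compute the mean of each class from the training data, then assign a new point to the nearest mean. That nearest-mean decision is exactly what the Thiessen-polygon boundaries encode. The sketch below uses small made-up training samples for C1, C2, and C3, not the figure's actual data.

```python
import math

def class_means(training):
    """Mean point of each class from {label: [(x, y), ...]} training data."""
    return {
        label: (sum(p[0] for p in pts) / len(pts),
                sum(p[1] for p in pts) / len(pts))
        for label, pts in training.items()
    }

def mdm_classify(point, means):
    """Assign `point` to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(point, means[label]))

# Hypothetical reference (training) data for the three classes.
training = {
    "C1": [(1, 1), (2, 1), (1, 2)],
    "C2": [(8, 8), (9, 8), (8, 9)],
    "C3": [(1, 8), (2, 9), (1, 9)],
}
means = class_means(training)
print(mdm_classify((7, 7), means))  # C2: its mean is the nearest
```

Swapping `math.dist` for a Manhattan or Mahalanobis distance changes the similarity measure without changing the structure of the classifier.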
You are looking at an example of decision tree classification. One good aspect of the decision tree is that it can be formulated as a set of conditional statements. Consequently, it is easy to understand, and it is straightforward to extract rule sets from the classification. As the name implies, a decision tree can be represented by a tree structure. The internal nodes A, B, and C represent conditional statements, and the leaf nodes C1, C2, C3, and C4 denote the classes. For example, the rule set for class 1 is that x is smaller than 10.6 and y is smaller than 8.7. In this way, we can clearly recognize the semantics of each class, which is impossible in many other classification methods. When a new data point is given, the tree structure is used as a classifier to check which class contains it. The example point is (13, 9), and by applying the series of conditional statements, we can simply figure out that it belongs to class 2. You are looking at an example of spatial classification of political preference in Seoul, Korea, using a decision tree. To produce the classification map, my research team tested more than 100 variables related to demographic, economic, and environmental factors with respect to the administrative districts of Seoul. A decision tree was built for the given input variables and a target variable of four classes, which are political preference levels classified by the percentage of votes cast for conservative parties: over 75%, 50 to 75%, 25 to 50%, and less than 25%. The tree was built and used to make the classification map on the previous slide. As you can see, the decision tree can show the semantics of the classification rules. For example, it shows that districts with more elderly people have more conservative voting patterns, and districts with more young males in their early 30s are more progressive. [SOUND] In this lecture, you studied clustering and classification, which can be applied to spatial units to categorize them. 
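The point that a decision tree is just a set of conditional statements can be made concrete in code. The class-1 rule (x < 10.6 and y < 8.7) comes from the lecture; the thresholds at the other branches are hypothetical, chosen only so that the example point (13, 9) lands in class 2 as in the lecture.

```python
def tree_classify(x, y):
    """Decision tree as nested conditionals (internal nodes A, B, C)."""
    if x < 10.6:            # node A
        if y < 8.7:         # node B: the lecture's rule set for class 1
            return "C1"
        return "C3"         # hypothetical branch
    if y < 12.0:            # node C: hypothetical threshold
        return "C2"
    return "C4"

print(tree_classify(13, 9))  # C2, as in the lecture's example
print(tree_classify(5, 3))   # C1: x < 10.6 and y < 8.7
```

Reading a root-to-leaf path gives the rule set for that class, which is why the semantics of each class are so easy to recognize with a tree.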
Both methods are basically about how to delineate class boundaries in feature space. Classification does it with predefined classes and corresponding reference datasets, while clustering delineates the boundaries without any prior information except a few parameters. You studied K-means and DBSCAN for clustering, and MDM and the decision tree for classification. There are many other variations of clustering and classification algorithms. All right, this is the end of this lecture. See you all in the next lecture.