About this Course
39,541 recent views

Course 4 of 4 in the Machine Learning Specialization

100% online

Start instantly and learn at your own schedule.

Flexible deadlines

Reset deadlines in accordance with your schedule.

Approx. 48 hours to complete

Suggested: 6 weeks of study, 5-8 hours/week...

English

Subtitles: English, Korean, Arabic

Skills you will gain

Data Clustering Algorithms, K-Means Clustering, Machine Learning, K-D Tree

Syllabus - What you will learn from this course

1
1 hour to complete

Welcome

Clustering and retrieval are some of the most high-impact machine learning tools out there. Retrieval is used in almost every application and device we interact with, such as providing a set of products related to the one a shopper is currently considering, or a list of people you might want to connect with on a social media platform. Clustering can be used to aid retrieval, but it is a more broadly useful tool for automatically discovering structure in data, like uncovering groups of similar patients.

This introduction to the course provides you with an overview of the topics we will cover and the background knowledge and resources we assume you have.

4 videos (Total 25 min), 4 readings
4 videos
Course overview 3 min
Module-by-module topics covered 8 min
Assumed background 6 min
4 readings
Important Update regarding the Machine Learning Specialization 10 min
Slides presented in this module 10 min
Software tools you'll need for this course 10 min
A big week ahead! 10 min
2
4 hours to complete

Nearest Neighbor Search

We start the course by considering a retrieval task of fetching a document similar to one someone is currently reading. We cast this problem as one of nearest neighbor search, which is a concept we have seen in the Foundations and Regression courses. However, here, you will take a deep dive into two critical components of the algorithms: the data representation and metric for measuring similarity between pairs of datapoints. You will examine the computational burden of the naive nearest neighbor search algorithm, and instead implement scalable alternatives using KD-trees for handling large datasets and locality sensitive hashing (LSH) for providing approximate nearest neighbors, even in high-dimensional spaces. You will explore all of these ideas on a Wikipedia dataset, comparing and contrasting the impact of the various choices you can make on the nearest neighbor results produced.
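
As a rough illustration of the ideas in this module, here is a minimal sketch in Python/NumPy (not the course's reference implementation; all names such as lsh_bins and the toy corpus are illustrative) contrasting brute-force k-NN under cosine distance with hashing documents into LSH bins via random hyperplanes, the higher-dimensional analogue of the random lines discussed in the videos.

    import numpy as np

    def cosine_distances(query, corpus):
        # 1 - cosine similarity between a query vector and every row of the corpus matrix
        q = query / np.linalg.norm(query)
        C = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
        return 1.0 - C @ q

    def brute_force_knn(query, corpus, k=5):
        # O(n * d) per query: score every document, then keep the k closest
        dists = cosine_distances(query, corpus)
        nearest = np.argsort(dists)[:k]
        return nearest, dists[nearest]

    def lsh_bins(X, n_planes=16, seed=0):
        # Each random hyperplane contributes one bit: which side of the plane a point falls on.
        # Points sharing all bits land in the same bin, so a query only needs to be compared
        # against its own bin (and, in practice, a few neighboring bins).
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(X.shape[1], n_planes))
        bits = (X @ planes >= 0).astype(int)
        return bits @ (1 << np.arange(n_planes))

    # Toy stand-in for a tf-idf matrix: 1,000 "documents" in 50 dimensions
    corpus = np.random.rand(1000, 50)
    neighbors, dists = brute_force_knn(corpus[0], corpus, k=5)
    bins = lsh_bins(corpus)
    candidates = np.flatnonzero(bins == bins[0])  # approximate candidate set for document 0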

22 videos (Total 137 min), 4 readings, 5 quizzes
22 videos
1-NN algorithm 2 min
k-NN algorithm 6 min
Document representation 5 min
Distance metrics: Euclidean and scaled Euclidean 6 min
Writing (scaled) Euclidean distance using (weighted) inner products 4 min
Distance metrics: Cosine similarity 9 min
To normalize or not and other distance considerations 6 min
Complexity of brute force search 1 min
KD-tree representation 9 min
NN search with KD-trees 7 min
Complexity of NN search with KD-trees 5 min
Visualizing scaling behavior of KD-trees 4 min
Approximate k-NN search using KD-trees 7 min
Limitations of KD-trees 3 min
LSH as an alternative to KD-trees 4 min
Using random lines to partition points 5 min
Defining more bins 3 min
Searching neighboring bins 8 min
LSH in higher dimensions 4 min
(OPTIONAL) Improving efficiency through multiple tables 22 min
A brief recap 2 min
4 readings
Slides presented in this module 10 min
Choosing features and metrics for nearest neighbor search 10 min
(OPTIONAL) A worked-out example for KD-trees 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
5 practice exercises
Representations and metrics 12 min
Choosing features and metrics for nearest neighbor search 10 min
KD-trees 10 min
Locality Sensitive Hashing 10 min
Implementing Locality Sensitive Hashing from scratch 10 min
3
2 hours to complete

Clustering with k-means

In clustering, our goal is to group the datapoints in our dataset into disjoint sets. Motivated by our document analysis case study, you will use clustering to discover thematic groups of articles by "topic". These topics are not provided in this unsupervised learning task; rather, the idea is to output cluster labels that can be associated post-facto with known topics like "Science", "World News", etc. Even without such post-facto labels, you will examine how the clustering output can provide insights into the relationships between datapoints in the dataset. The first clustering algorithm you will implement is k-means, which is the most widely used clustering algorithm out there. To scale up k-means, you will learn about the general MapReduce framework for parallelizing and distributing computations, and then how the iterates of k-means can utilize this framework. You will show that k-means can provide an interpretable grouping of Wikipedia articles when appropriately tuned.
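
For a concrete picture before starting the assignments, here is a minimal NumPy sketch of the two k-means iterates described above (assign each point to its nearest center, then recompute each center as the mean of its points). It uses plain random initialization rather than k-means++ and illustrative names throughout; it is not the assignment's reference code.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]  # k-means++ is the smarter init covered in the videos
        for _ in range(n_iter):
            # Assignment step ("map"-like): hard-assign each point to its nearest center
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Update step ("reduce"-like): each center becomes the mean of its assigned points
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    X = np.random.rand(500, 2)          # toy 2-D data
    labels, centers = kmeans(X, k=3)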

13 videos (Total 79 min), 2 readings, 3 quizzes
13 videos
An unsupervised task 6 min
Hope for unsupervised learning, and some challenge cases 4 min
The k-means algorithm 7 min
k-means as coordinate descent 6 min
Smart initialization via k-means++ 4 min
Assessing the quality and choosing the number of clusters 9 min
Motivating MapReduce 8 min
The general MapReduce abstraction 5 min
MapReduce execution overview and combiners 6 min
MapReduce for k-means 7 min
Other applications of clustering 7 min
A brief recap 1 min
2 readings
Slides presented in this module 10 min
Clustering text data with k-means 10 min
3 practice exercises
k-means 18 min
Clustering text data with K-means 16 min
MapReduce for k-means 10 min
4
3 hours to complete

Mixture Models

In k-means, observations are each hard-assigned to a single cluster, and these assignments are based just on the cluster centers, rather than also incorporating shape information. In our second module on clustering, you will perform probabilistic model-based clustering that (1) provides a more descriptive notion of a "cluster" and (2) accounts for uncertainty in the assignment of datapoints to clusters via "soft assignments". You will explore and implement a broadly useful algorithm called expectation maximization (EM) for inferring these soft assignments, as well as the model parameters. To gain intuition, you will first consider a visually appealing image clustering task. You will then cluster Wikipedia articles, handling the high dimensionality of the tf-idf document representation considered.
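
The sketch below (NumPy + SciPy, diagonal covariances only, illustrative names; not the assignment's API) shows one EM iteration for a Gaussian mixture: the E-step computes the soft assignments (responsibilities) and the M-step re-estimates the mixture weights, means, and covariances from them. The module covers the full-covariance case and the numerical-stability issues this toy version ignores.

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_step(X, weights, means, covs):
        n, k = X.shape[0], len(weights)
        # E-step: responsibility resp[i, j] = soft assignment of point i to cluster j
        resp = np.zeros((n, k))
        for j in range(k):
            resp[:, j] = weights[j] * multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and (diagonal) covariances
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        covs = np.array([np.diag((resp[:, j, None] * (X - means[j]) ** 2).sum(axis=0) / Nk[j])
                         for j in range(k)])
        return weights, means, covs

    # Toy run: 300 points in 2-D, 3 clusters, identity covariances to start
    X = np.random.rand(300, 2)
    k = 3
    weights = np.full(k, 1.0 / k)
    means = X[np.random.default_rng(0).choice(len(X), size=k, replace=False)]
    covs = np.array([np.eye(2) for _ in range(k)])
    for _ in range(25):
        weights, means, covs = em_step(X, weights, means, covs)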

15 videos (Total 91 min), 4 readings, 3 quizzes
15 videos
Aggregating over unknown classes in an image dataset 6 min
Univariate Gaussian distributions 2 min
Bivariate and multivariate Gaussians 7 min
Mixture of Gaussians 6 min
Interpreting the mixture of Gaussian terms 5 min
Scaling mixtures of Gaussians for document clustering 5 min
Computing soft assignments from known cluster parameters 7 min
(OPTIONAL) Responsibilities as Bayes' rule 5 min
Estimating cluster parameters from known cluster assignments 6 min
Estimating cluster parameters from soft assignments 8 min
EM iterates in equations and pictures 6 min
Convergence, initialization, and overfitting of EM 9 min
Relationship to k-means 3 min
A brief recap 1 min
4 readings
Slides presented in this module 10 min
(OPTIONAL) A worked-out example for EM 10 min
Implementing EM for Gaussian mixtures 10 min
Clustering text data with Gaussian mixtures 10 min
3 practice exercises
EM for Gaussian mixtures 18 min
Implementing EM for Gaussian mixtures 12 min
Clustering text data with Gaussian mixtures 8 min
4.6
290 reviews

35%

started a new career after completing the course

36%

got a tangible career benefit from this course

Top reviews from Machine Learning: Clustering & Retrieval

by BK, Aug 25th 2016

Excellent material! It would be nice, however, to mention some reading material, books or articles, for those interested in the details and the theories behind the concepts presented in the course.

by JM, Jan 17th 2017

Excellent course, well thought out lectures and problem sets. The programming assignments offer an appropriate amount of guidance that allows the students to work through the material on their own.

Instructors


Emily Fox

Amazon Professor of Machine Learning
Statistics

Carlos Guestrin

Amazon Professor of Machine Learning
Computer Science and Engineering

About the University of Washington

Founded in 1861, the University of Washington is one of the oldest state-supported institutions of higher education on the West Coast and is one of the preeminent research universities in the world....

About the Machine Learning Specialization

This Specialization from leading researchers at the University of Washington introduces you to the exciting, high-demand field of Machine Learning. Through a series of practical case studies, you will gain applied experience in major areas of Machine Learning including Prediction, Classification, Clustering, and Information Retrieval. You will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that can make predictions from data....
Machine Learning

Frequently Asked Questions

  • Once you enroll for a Certificate, you'll have access to all videos, quizzes, and programming assignments (if applicable). Peer review assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, from which you can print the Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

If you have any additional questions, please visit the Learner Help Center.