3.1 Apache Spark

Course video 15 of 35

The third module, “Spark”, focuses on the operation and characteristics of Spark, one of the most popular big data technologies in the world. The lecture first covers how Spark and Hadoop differ in their data analysis characteristics, then goes into the features of Spark big data processing based on RDDs (Resilient Distributed Datasets) and on the core units Spark Core, Spark SQL, Spark Streaming, MLlib (Machine Learning Library), and GraphX. It explains in detail the Spark DAG (Directed Acyclic Graph) stages and pipeline processes that are formed from Spark transformations and actions. In particular, the definition and advantages of lazy transformations and DAG operations are described, along with the characteristics of Spark variables and serialization. In addition, Spark cluster operation based on Mesos, Standalone, and YARN is introduced.
