3.1 Apache Spark

Course video 15 of 35

The third module “Spark” focuses on the operations and characteristics of Spark, which is currently the most popular big data technology in the world. The lecture first covers the differences in data analysis characteristics of Spark and Hadoop, then goes into the features of Spark big data processing based on the RDD (Resilient Distributed Datasets), Spark Core, Spark SQL, Spark Streaming, MLlib (Machine Learning Library), and GraphX core units. Details of the features of Spark DAG (Directed Acyclic Graph) stages and pipeline processes that are formed based on Spark transformations and actions are explained. Especially, the definition and advantages of lazy transformations and DAG operations are described along with the characteristics of Spark variables and serialization. In addition, the process of Spark cluster operations based on Mesos, Standalone, and YARN are introduced.



Join a community of 40 million learners from around the world
Earn a skill-based course certificate to apply your knowledge
Gain confidence in your skills and further your career