Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm should be used, and when, for batch data. It also covers several technologies on Google Cloud for data transformation, including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud using Qwiklabs....
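As a rough illustration of how the three paradigms named above differ, the sketch below uses a plain Python dictionary as a stand-in for a data warehouse. Every name in it (`extract`, `transform`, `run_el`, `run_elt`, `run_etl`, the `raw` and `clean` tables) is hypothetical and is not taken from the course material. It only shows where the transform step sits relative to the load step.

```python
# Minimal sketch, assuming an in-memory dict plays the role of the warehouse.
# All function and table names here are hypothetical illustrations.

def extract():
    """Pretend to read rows from a source system (amounts arrive as strings)."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.0"}]


def transform(rows):
    """Cast the string amounts to floats; stands in for any cleaning step."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def run_el(warehouse):
    """Extract-Load: data is loaded as-is, with no transformation."""
    warehouse["raw"] = extract()


def run_elt(warehouse):
    """Extract-Load-Transform: load raw data first, transform inside the warehouse."""
    warehouse["raw"] = extract()
    warehouse["clean"] = transform(warehouse["raw"])


def run_etl(warehouse):
    """Extract-Transform-Load: transform before loading; raw data is never stored."""
    warehouse["clean"] = transform(extract())
```

In this toy version, the only observable difference is which tables end up in the warehouse: EL leaves only `raw`, ELT leaves both `raw` and `clean`, and ETL leaves only `clean`.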



A great course to help understand the various wonderful options Google Cloud has to offer for moving on-premises Hadoop workloads to Google Cloud Platform to leverage the scalability of clusters.


Great course for learning the big advantages of using GCP for data, given that it has large implementations and better performance than today's on-premises scenarios.


by Abhishek D






by Jon C


Enjoyed the course and the instructors. There is a lot of ground to cover for two weeks' worth of content. Some minor improvements: 1. A number of the videos mention linking to content (a GitHub template, for example), but then fail to include a link in the resources section. 2. The labs are more of a code review than practice in creating actual pipelines, and they ask questions without providing answers. It may prove helpful for learners to have an opportunity to develop elements of the lab code, as well as to have answers to the review questions, so that lab users know whether their answers to the questions posed were in fact correct.

by Franz H


Again one of the mostly-presentation classes: a filmed version of a feature description of Google products. Some useful demos are included, but both the quizzes and the labs are without even the most elementary demands, so it is really hard to learn anything. Very easy to collect another certificate, but that's about it. It shows that you successfully walked around the car and can name some of its parts, but you will not learn to drive in this class unless you use the generously provided lab time for studies of your own.

by Diego T B


This course only scratches the surface of the batch products of GCP. In the Dataproc lab, which in my opinion is the most important for data engineers working with GCP, you have so little time to do so much work that you have to speed-run it and learn nothing at all. The Week 2 content could be split into another week.

by Alin P


The lab assignments could be more involved than copy-pasting some commands, which is useful but easy to forget. The videos are quite long. There should be more quizzes that test the knowledge in the videos more thoroughly, i.e. keep the rapid feedback of the quizzes, but rotate the answers.

by Justin A B


Would like the Dataflow portions of the labs to center around building common ETL requirements, for example joins, data transforms, pivots, etc. Most ETL developers are familiar with these patterns and would be interested in mapping them to how Dataflow would solve them.

by Brian S


Many of the labs didn't really provide opportunities for real hands-on learning, but instead seemed to be button-clicking experiences. Improvements could be made by not just having students run the files, but also having them make updates to them.

by Benjamin T


Course needs many improvements: include better explanations and walkthroughs of the very particular Apache Beam syntax and logic, and give hints and time in Qwiklabs for experimentation, particularly for Dataflow.

by Sean W


The first part was great. However, there were many times when Cloud Dataflow was covered and streaming topics were discussed. Why in this course? I know that Cloud Dataflow can do both, but don't mix the material.

by Sreenu A


It covered mostly basic material. Data engineers need in-depth knowledge. The Qwiklabs need to be modified to reflect real-world scenarios instead of just working through gcloud commands.

by Aaron H


This course is OK. The information is good, but the labs are messed up 90% of the time and, as always, there is too much sales pitch.

by Kota M


It is helpful as a first step, but it does not produce learners who can develop architecture on Google Cloud.

by Juan J T M


There is very good material, but there should be a more thorough examination of the different tools and their code.

by Laurence M S


This course was extremely confusing. I will most likely need to go through it again.

by Mariia Z


Good materials, but poor quality of the labs

by Marco A d A C


I expected more detail and more depth.

by Y C


Could elaborate more on Dataflow.

by Hossain A


Got an overview of GCP pipeline

by Yogesh D


The course is at a very high level; students with no prior exposure to HDFS, Spark, and Apache Beam will have a hard time understanding any of the concepts. The labs are not productive enough, since you just follow instructions. Labs should be more challenging.

by Lourdes R


I think the examples in the labs could be more interesting if they used data sets closer to business reality.

Also, some tutorials contain wrong steps and references to old tools.

by Lisanul D


The Dataflow part is really bad, with no explanation in the lab exercises. Anyone could run them blindly and get through them. There is no way to verify whether your understanding of the lab was good.

by Marcos P


Some slides are missing from the resources, making it difficult to follow the videos and take notes. I would prefer to have material to read instead of following videos.

by Vinod K


The labs had many errors. I spent most of the time solving errors and getting help from the support team.