For this lab, we're going to explore and create an e-commerce analytics pipeline with a tool that's new to a lot of you called Cloud Dataprep. I've been a data analyst for 10 years, and I was extremely happy to see this tool come out, largely because it lets you do data preparation and build these pipelines without having to know Java or Python. So, let's dive right in.

The two tools we're going to use in this lab are BigQuery and Cloud Dataprep. Go ahead and open up your BigQuery instance if you don't have it open already. The goal is to create a cleansed version of the data: we're going to start with the all_sessions_raw data, clean it up a bit, add some new columns (that is, new fields), and ultimately create a repeatable, scheduled data pipeline that runs without you even touching it. So when new data comes into that table, the pipeline will automatically kick off and output to your own dataset.

Now, we obviously can't output to your own dataset if you don't have any datasets. So without further ado, go ahead and create a new dataset and call it ecommerce. Don't worry about populating it with any tables; that's the job of our Dataprep job. That's all you need to do in BigQuery for now. Make sure that dataset is there, or else Dataprep will be wondering where on Earth you wanted to output your data to.

We're going to do four things in this lab: explore a sample of our data, clean it, enrich it, and ultimately, at the end of the day, populate this ecommerce dataset back inside of BigQuery with a table of revenue data.

Now for Dataprep: if you open up the products and services navigation menu, Dataprep is the icon all the way at the bottom. Open that up, and the first time you do, you'll be accepting the terms for the partnership that is Cloud Dataprep, which actually uses the Trifacta product.
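As an aside: the lab has you create the ecommerce dataset through the BigQuery Web UI, but BigQuery's DDL can do the same thing if you prefer SQL. This is just a sketch; the dataset name ecommerce comes from the lab, and the description string is illustrative.

```sql
-- Create the empty ecommerce dataset that Dataprep will write into.
-- CREATE SCHEMA is BigQuery's DDL statement for creating a dataset;
-- IF NOT EXISTS makes it safe to re-run.
CREATE SCHEMA IF NOT EXISTS ecommerce
OPTIONS (
  description = 'Output dataset for the Dataprep e-commerce pipeline'
);
```

Either way, the dataset starts out empty; the Dataprep job will create and populate the output table later.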
So, Trifacta has been around for a while, and it's now part of Google Cloud Platform. Behind the Trifacta front end and UI, that is, the Cloud Dataprep transformer view you're going to get a lot of experience working with to explore a dataset, it actually kicks off a Cloud Dataflow job. Those jobs are typically written in Python or Java, and you'll get a lot of experience with that if you take our data engineering courses, where you can manually specify those pipelines and the transformations you want to run. The great thing about Cloud Dataprep is that if you're like me and don't want to worry about writing Python transformations, you can do it all directly within the Web UI, which is great.

Keep allowing these permissions. You'll have to do this each time you use Dataprep in your labs, because the labs create a new project every time; once you have your own GCP (Google Cloud Platform) account, you won't need to repeat it, because it's a one-time setup. Dataprep will ask for a staging bucket; the default is fine. Ultimately, you want to get to the homepage for Cloud Dataprep, which is Flows. A flow is a fancy name for the pipeline you're going to create. If you dismiss this tooltip, you'll see there's already one example flow in there; we're going to create our own brand-new one.

So, create a flow. We're going to call it Ecommerce Analytics Pipeline. It's going to build a revenue reporting table for apparel. Imagine you're the clothing manager for this segment of your e-commerce business, and you only care about the records that have transaction revenue associated with the apparel product category. That's the flow we're going to create here.
Once you've got that created, we actually need to ingest some data. So, click Add Datasets. While the datasets import, go over to BigQuery, and you'll notice you have your ecommerce dataset there. There's no raw data in it yet, so inside the lab you're provided with a SQL query that copies in just a subset of the data. I'm going to bring that into a new query inside of BigQuery and execute it as standard SQL. If the syntax is new to you, you'll learn how to create tables and views in the next course. The really cool part is that you can do it both within the Web UI and, as recently introduced in the last year, within SQL itself, and the syntax is actually not bad at all. Don't worry if you don't understand it yet; we'll cover it a lot more in the next course on ingesting data. For now, all you're going to do is run this query, which takes a subset, just one day's worth of session data, from the public dataset and dumps it into your ecommerce dataset.

Once that finishes executing, you'll have about 56,000 rows in your all_sessions_raw_dataprep table. Going back into Dataprep, choose a different source this time: pick BigQuery, click on ecommerce, and boom, you should see the new table available to bring into the flow, the pipeline. Clicking on it creates the Cloud Dataprep dataset. It starts to pull in a preview automatically, and you already get some interesting statistics: 32 columns and about 57,000 rows. We're going to import it and add the dataset to our flow. You'll notice the general theme here is exploration. Yes, you can explore the data through SQL inside of BigQuery, but it's often faster and easier to load a sample into Cloud Dataprep and take a look at it there first.
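The copy query the lab provides follows the standard CREATE TABLE ... AS SELECT shape in BigQuery. Here's a hedged sketch of what it looks like; the source project and table path and the date filter below are illustrative assumptions, so use the exact query from your lab instructions.

```sql
-- Copy one day's worth of raw session data into your ecommerce dataset.
-- The source table and date are illustrative; your lab provides the exact
-- query to run.
CREATE OR REPLACE TABLE ecommerce.all_sessions_raw_dataprep
OPTIONS (
  description = 'Raw session data subset to ingest into Cloud Dataprep'
) AS
SELECT *
FROM `data-to-insights.ecommerce.all_sessions_raw`
WHERE date = '20170801';  -- just one day of sessions, roughly 56,000 rows
```

CREATE OR REPLACE makes the statement safe to re-run if you need to rebuild the subset, and filtering to a single day keeps the sample small enough for quick exploration in Dataprep.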