So back with a little bit more details about the WHERE clause. This is more of a recap of what we went through in great detail before, but why can't we use is_school is equal to yes for the filter here? And if you remember, that's because the alias is not known at that filtering time. So that's where we're actually having to specify the entire field. Operatesschools170 code or cd is equal to yes. And be very careful that your data is in capital Y format not lower y or Y-E-S, or something like that as well. That's a very, very common trip-up as well. Great. So now, we have just schools in those top ten revenue results. Moving on from kind of simple formatting functions as you saw before. Let's talk about some aggregation functions that can do some cool mathematical operations like total revenue across all of 2015. So aggregations can perform calculations over entire sets of values. That is to say if you had 1.5 million charities in your tables and you just wanted to collapse or sum together or count or get a max in all of those rows, you can do that through these aggregation functions. So for here, we're invoking five of them. We're saying, sum the total revenue. But then average it and count the number of employers and then count the distinct number of employers, much like we did in that exploration lab and then give us the maximum amount of employees that one of these 2015 filers has and if we have that information here. So the total revenue for all US nonprofits for 2015 is that number featured there and the average revenue is there. And the number of nonprofits and the number of distinct nonprofits and the one who had the most employees, almost 800,000. Now if that average had too many points after the decimal, what you could do again is round and one of the interesting point here is you could actually nest functions inside of other functions. So SQL is going to go ahead and produce the average of that total revenue figure. And then after it's done that, it's going to look outside and say, okay, well, we're not done with our work yet. The user also wants us to round this to two decimal places. So round is another function that'll basically take two inputs here. And you can basically have your first input be total revenue and then the amount of decimal places you want to appear after it. So going backwards one slide, we have many, many places after decimal. And then afterwards, after that round, we have just those two as well. So again, this kind of merges between stylistic as we were talking about with the pitfalls for the format and also potentially necessary if you just wanted to reduce it to those two. But again, possible tradeoff with precision. If you are storing this and doing downstream calculations on that. So again, be careful with round as well. As we mentioned previously in the lab and that we worked with, you can investigate that uniqueness with count and then distinct on that field as well. So an absolutely key concept when we talk about aggregations is when you mix together fields that aren't aggregated say like the identifying number for the charity, the ein. And then the count of all the charities for 2015, you get a little bit of a mismatch. So what you need to do is aggregate together or group together anything that's not under a formal aggregation function like count, sum, min, max and then put those field values in what's called a GROUP BY clause. And GROUP BY will immediately and always follow the FROM statements. So here we have, give us the identifying number for the charities and the count of them from the table there and GROUP BY the ein number. And again here, the point of this entire query is actually to see for each of the identifying numbers, count the occurrences or the number of filings that are present in that 2015 table. So if you're thinking real hard, you could be wondering like I would normally just expect to see one filing per each unique organization's number. So this is almost like a data quality check. So you might say like, I don't normally want to group by something like ein. But for here, I'm going to check the data quality. But before we get into that, again, the key pitfall is don't forget to include that GROUP BY clause. When you start invoking things like sums and mins and maxes, immediately add that GROUP BY or the validator will let you know when you try to run that query. Interesting, so after we ran that query, here are the results. You see that there are eins or organizations that have seven different records or seven rows that are present in that final year 2015. Let's see if we can't get a little bit more insight into why that is. Why on Earth would you have seven filings for one particular year for 2015? Well, let's see how often that does occur. So excluding what we would consider the normal use case which is having just one filing, how do we actually write that out in SQL? So if you wanted to exclude an aggregation. So after filtering has already been done, so we basically want to say, all right, we've done this aggregation for the counts and we basically want to now filter the aggregation. Filtering an aggregation is done by using a special SQL clause called the HAVING clause. So whereas the WHERE clause filters out rows preaggregation as we mentioned before, this is immediately going to the dataset. And the query engine says, all right, cool. What can I lose from here immediately? That's done in that WHERE clause. After you've done the aggregations, after that work is done, that's where you could have that HAVING clause and why am I making such a big deal of this? It's because now we can use an alias in the HAVING, because the work has actually been done and that field is that recognized as the alias as well. So HAVING is very, very, very useful when you're filtering aggregations. So you want to say, all right, well, you showed me that in the previous slide. There are all of these different individual charities with these seven filings. How often is that the case? And here after we execute that and just looking at the paginated results there at the bottom, almost 18,000 instances or 18,000 different charities. Another way to get that instead of looking at the paginated results is tracking the actual queries input and output, and that's in the big query results panel. You can actually click over and click on Explanation. And in the third course, Achieving Advanced Insights. You're going to get a very deep dive into what this query plan actually does, because this will help you perform and optimize your queries. But if you're looking for just the amount of rows that came in, the amount of rows that came out, they're in red. That's exactly what happens. The processing and shuffling that occurs between each of these different stages is a very, very exciting topic in the architecture lecture on the third course. So a lot of you are still wondering, you still haven't told me why there are seven records. Please tell me why there are seven records. So we're going to zoom in on one particular ein here. And if we wanted to zoom in on one, how you actually do that is by filtering every single record that's returned just for that particular condition that you want to meet. And then here, you want to make sure that you're using the correct data type. So if ein is stored as an integer or if it's stored as a string, strings you need to have in single quotes. Integers, you do not want to put in quotes here. So keep that in mind. So here's the result. So we have seven paper filings for one ein for 2015 which is the tax period of 2014 since you file your taxes a year after, the actual tax period and we pull up another interesting field called the tax period that's associated with the filing. So in the 2015 calendar year, we have this ein 2008 to 2014 filings as well. So thinking about why that is to cut to the chase, the real kind of answer that's staring everybody in the face is human error or dirty data or an organization is submitting more than one tax period filing. Because maybe they knew they could get caught up to speed or maybe because there was some kind of refiling. If there's another flag in here that says, this was a corrected filing or something like that. There could be many, many cases. But the bottom line is unless you can tease that out with other columns in the data, you need to go to the subject matter expert who understands how this data was set up in the first place. But you want to be aware of anomalies like this. So when you start totaling things up like give me the sum of all revenue or the average of all revenue for across all of your different employers. If you have something that occurs seven times, that's going to skew your results. So just keep in mind for situations like this. Now if you wanted to handle that inside of SQL say, if you just wanted to pull the latest 2015 filing for that tax period. So you just wanted to pull that record number five there. How could you extract just the tax periods for 2014? And again, that's for the calendar year 2015. So if you just wanted 2014, you can invoke a date filtering function. So you basically want to say, all right, well, I want to treat this field as a date since it's not currently stored in very friendly year, year, year, year, month, month, day, day format. We need to invoke or use a parsing function that basically says, hey, treat this tax period in the middle as a string instead of an integer which it's currently stored as. That's where you're using that CAST function and then you're nesting that inside of a date parsing function. Basically saying, all right, now we have this string that I want you to interpret it as year, year, year, year, month, month. And that's the percentage sign Y, percentage sign lowercase n there as well. And then once you can actually parse that as a date, treat that as tax period. And again, we cannot use that tax period alias for filtering in the WHERE clause, unless you had separated this into two separate queries or as part of a subquery that you're going to learn a little bit later in later courses. We're going to repeat that exact same function in your filter, except here we're going to basically say, all right. Well, once you've returned that result of what that tax period is, the only ones that we want are the ones that are for 2014 for this particular calendar year table of 2015. And how you use that is you add a third function, which is the EXTRACT function for dates. So you can extract from that date field up a year. Now, why are we bothering to do all this date parsing? Why don't you use something just like a left four, and then just grab the first four characters? And the benefit is kind of that if we can properly treat this field as a date, then we can do normal operations like extracting the year or doing date difference. It just allows it to treat it as it should be, which is as a date field. So this kind of pre-processing. Again, you would want to do when you're ingesting the data or you want to do it before you store a final reporting table as well. So in future modules, we're going to teach you how to cleanup and transform this data both using SQL as you see here. But also a little bit easier if you don't have these functions immediately recalled to memory through a UI, a user interface called Cloud Dataprep. And again, for these functions, PARSE, DATE and EXTRACT, I don't even have a lot of these committed to memory. You can just search online for Google's standard SQL bit query and that'll take you to that one page guide.