So in this video we're going to go ahead and calculate on-base percentage and slugging percentage in our Jupyter Notebook. Just as a refresher, slugging percentage is defined as singles, plus two times doubles, plus three times triples, plus four times home runs, divided by at bats. This is typically thought of as a metric that measures player power in baseball. On-base percentage is defined as hits, plus walks, plus hit by pitches, divided by at bats, plus walks, plus hit by pitches, plus sacrifice flies. So this is really a measure of the proportion of times a player reaches base via a hit, a walk or a hit by pitch. It improved upon traditional metrics, such as batting average, by including walks in its calculation, and drawing walks was the key skill that the Oakland Athletics believed was being undervalued during the Moneyball era. So that's just a refresher on what these metrics are. Let's go ahead and calculate them directly in our Jupyter Notebook. So if we scroll down, so far we have a dataset that has, for each team-level data point, that team's home statistics and its away statistics. Keeping that in mind, we can start to calculate on-base percentage and slugging percentage for each team. If you recall from the last video, every variable in our dataset that starts with home and ends with underscore X represents the home statistics for our team data point, and everything that starts with visitor and ends with underscore Y represents the away statistics for our team data point. The other side of that is any statistic that starts with home and ends with underscore Y, which represents the home team when our team data point is the away team.
So that is our team's opponent, and anything that starts with visitor and ends with underscore X represents the visiting team when our team data point is the home team. So those two sets of variables represent our team data point's opponents. To start out, we are going to calculate the on-base percentage for our team data point. So we have, in this part of the code here, Teams2 OBP for, meaning on-base percentage for, and we follow the formula for on-base percentage: once again hits, plus walks, plus hit by pitches, divided by at bats, plus walks, plus hit by pitches, plus sacrifice flies. We can calculate this directly in our notebook. You can see here we have Teams2 home underscore H underscore X plus Teams2 visitor underscore H underscore Y. If we sum those two variables together, that gives us the total hits for our team data point in each given season. From there we follow the same type of syntax and sum together home and away walks and home and away hit by pitches for our team data point, and then we close the parentheses here. That finishes the numerator of the formula, and then we divide by at bats, walks, hit by pitches and sacrifice flies, both when our team data point is the home team and when it is the visiting team. So we can go ahead and run that piece of code, and next we calculate on-base percentage against. This is the on-base percentage against our team data point, so the data here just reflects our team data point's opponents. As before, we replicate the exact same formula as in on-base percentage for, but we simply switch around the X's and Y's. Again, remember, home underscore Y represents the home team when our team data point is the away team, and visitor underscore X represents the visiting team when our team data point is the home team.
So summing together those metrics, so hits, plus walks, plus hit by pitches in our numerator, divided by at bats, plus walks, plus hit by pitches, plus sacrifice flies in our denominator, we get the on-base percentage against our team data point for every season in our data. So we can go ahead and run that piece of code, and now we're ready to move on to slugging percentage. Remember, our formula for slugging percentage is singles, plus two times doubles, plus three times triples, plus four times home runs, divided by at bats. We don't explicitly have singles in our dataset, but we can easily calculate them by taking hits, minus doubles, minus triples, minus home runs, and replacing singles in the formula with that equivalent expression. So once again we start by calculating the slugging percentage for our team data point, and we begin by summing together hits. We can see here Teams2 home underscore H underscore X plus Teams2 visitor underscore H underscore Y, and from there we subtract out doubles, triples and home runs, both when our team data point is the home team and when it is the visiting team. That gives us our singles, and from there we just add two times home and away doubles, plus three times home and away triples, plus four times home and away home runs, and then we close our parentheses there. That gives us our numerator, and from there we divide by the total at bats, so home and away at bats for our team data point. Once again, for our team data point, home underscore X and visitor underscore Y represent the relevant statistics.
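As a rough sketch of the calculations described so far, the pandas code below computes on-base percentage for, on-base percentage against and slugging percentage for on one made-up team-season. Only the hits columns (home underscore H underscore X and visitor underscore H underscore Y) are spelled out in the video. Everything else here is an assumption for illustration: the lowercase spellings, the other stat abbreviations (bb, hbp, sf, ab, d, t, hr), the output names OBPFOR, OBPAGN and SLGFOR, the helper functions and the numbers themselves.

```python
import pandas as pd

# Hypothetical stat abbreviations; the video only spells out the hits ("h") columns.
STATS = ["h", "bb", "hbp", "sf", "ab", "d", "t", "hr"]


def total(df, stat, home_suffix, visitor_suffix):
    """Add a stat from its home_* column and its visitor_* column."""
    return df[f"home_{stat}{home_suffix}"] + df[f"visitor_{stat}{visitor_suffix}"]


def on_base_pct(df, home_suffix, visitor_suffix):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    h, bb, hbp, sf, ab = (
        total(df, s, home_suffix, visitor_suffix) for s in ["h", "bb", "hbp", "sf", "ab"]
    )
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_pct(df, home_suffix, visitor_suffix):
    """(1B + 2*2B + 3*3B + 4*HR) / AB, with singles = H - 2B - 3B - HR."""
    h, d, t, hr, ab = (
        total(df, s, home_suffix, visitor_suffix) for s in ["h", "d", "t", "hr", "ab"]
    )
    singles = h - d - t - hr
    return (singles + 2 * d + 3 * t + 4 * hr) / ab


# Made-up season totals for a single team-season, in the same order as STATS.
splits = {
    "home_{}_x": [700, 300, 30, 25, 2700, 150, 15, 100],     # our team at home
    "visitor_{}_y": [680, 280, 20, 25, 2800, 130, 15, 90],   # our team on the road
    "visitor_{}_x": [650, 250, 25, 20, 2750, 120, 10, 80],   # opponents visiting us
    "home_{}_y": [700, 300, 25, 30, 2650, 140, 20, 100],     # opponents hosting us
}
row = {
    pattern.format(stat): [value]
    for pattern, values in splits.items()
    for stat, value in zip(STATS, values)
}
Teams2 = pd.DataFrame(row)

# "For" uses home_*_x and visitor_*_y; "against" swaps the X's and Y's.
Teams2["OBPFOR"] = on_base_pct(Teams2, "_x", "_y")
Teams2["OBPAGN"] = on_base_pct(Teams2, "_y", "_x")
Teams2["SLGFOR"] = slugging_pct(Teams2, "_x", "_y")

print(Teams2[["OBPFOR", "OBPAGN", "SLGFOR"]])
```

Passing the suffixes as arguments is just one way to keep the for and against versions from drifting apart; the notebook in the video writes each sum out column by column instead.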
So we can go ahead and run that slugging percentage for code in the notebook, and finally, for slugging percentage against, just like with on-base percentage, we replicate the exact same formula as slugging percentage for. The only difference is that we switch around our X's and Y's to represent our team data point's opponents. So we can go ahead and run that piece, and then we create this Teams3 dataset with just the relevant information that we have, so year, team, wins, and then on-base percentage for, on-base percentage against, slugging percentage for and slugging percentage against. We can go ahead and run that to see what it looks like, and you can see the values for each team data point. So for example, for Anaheim in 1999 we have the number of wins, the on-base percentage for Anaheim, the on-base percentage against Anaheim, and then the slugging percentage for and the slugging percentage against, and we have that for every team and every season in our dataset. Now the final thing we need to do before we run our regression analysis is calculate the team winning percentage. Generally teams play 162 games in a season, but every once in a while there's a tiebreaker game, which usually counts as game 163, or there's a game that got postponed earlier in the season and never gets made up because it doesn't matter, so the team only plays 161 games. So to be 100% accurate, we want to calculate winning percentage by taking the number of wins and dividing by the total number of games; even though that's usually going to be 162, it might vary from time to time. To do that we can take a look at this code here. We're going to do this separately for home games and away games, and then we're going to combine our two datasets and calculate the total games that way.
So we see here we have this TeamsGH dataset, for teams games home, that we're going to create, and we're going to do a group by statement on our initial teams data. You can see that the variables we're grouping by are year and home, and home stands for the home team, so that's a team-level variable here. Then you can see this H win.count. What this tells our notebook to do is count the number of H win observations, so home win observations, in our dataset by year and home team. Now, one thing to distinguish is that we're not actually counting the number of home wins in this case. We're counting the number of home win observations, which is equivalent to the total number of home games. If we wanted to calculate the number of home wins, this would be a .sum, because we're using a dummy indicator variable for that, but since we have a .count, we're counting the number of home win observations, and that tells us the total number of games played at home for each team. From there, we're going to reset our index and then rename that column from home to team, as we're going to merge on our team variable in a couple of steps here. Then we're going to follow a similar process for away games. So here we have away games, so TeamsGA, and we're going to do a group by year and visitor, and we are counting the number of away win observations. So we're not counting the number of away wins in total, we're counting the number of away win observations, which is equivalent to the number of away games. From there, once again, we're going to reset our index and rename our variable from visitor to team, and then the last step is to do our merge. So we're going to create this dataset TeamsG, for teams games, and we're going to merge our home games and away games datasets on team and year.
So these are the identifying variables in each dataset on which we're going to link the two datasets in the merge. From there we can calculate the total number of games: you can see TeamsG games, and that's going to be equal to our H win, and remember, in this case H win represents our total number of home games played, plus A win, which is the total number of away games played. As the last step, we're just going to keep the relevant info, so that's year, team and games, and finally we're going to take a look at what the dataset looks like. So we can go ahead and run that. Now we can see here we have the number of games played for each team in a season. Normally it's 162, but you can see there are a few discrepancies. So there's a 163, because there was a tiebreaker game for Cincinnati in 1999, and Detroit only played 161 games in 1999. So now we have an accurate game count for each team, and we can go ahead and merge this game count onto our Teams3 data. We're going to do that on team and year, and you can see that games is now in that dataset. The final step is to calculate winning percentage, which is defined as wins divided by games in our Teams3 data. So we can run that. We now have an accurate winning percentage for each team in every season in our data. We have our on-base percentage for and on-base percentage against metrics, our slugging percentage for and slugging percentage against metrics, and our winning percentage for every team, so we have all the data that we need to go ahead and run our regressions, and we will get to that in our next video.
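The game-count and winning-percentage steps can be sketched roughly as follows, using a tiny made-up game log rather than the real data. The dataset names TeamsGH, TeamsGA, TeamsG and Teams3 follow the video; the column spellings hwin, awin, Games, W and wpc, and the five games themselves, are assumptions for illustration.

```python
import pandas as pd

# Made-up game log: one row per game, with 0/1 dummies for a home win and an away win.
Teams = pd.DataFrame({
    "year": [1999, 1999, 1999, 1999, 1999],
    "home": ["ANA", "ANA", "DET", "DET", "ANA"],
    "visitor": ["DET", "DET", "ANA", "ANA", "DET"],
    "hwin": [1, 0, 1, 1, 0],
    "awin": [0, 1, 0, 0, 1],
})

# .count() counts observations, i.e. games played at home; .sum() would count home wins.
TeamsGH = Teams.groupby(["year", "home"])["hwin"].count().reset_index()
TeamsGH = TeamsGH.rename(columns={"home": "team"})

# Same idea for away games: count the awin observations per year and visitor.
TeamsGA = Teams.groupby(["year", "visitor"])["awin"].count().reset_index()
TeamsGA = TeamsGA.rename(columns={"visitor": "team"})

# Link home and away counts on year and team, then add them to get total games.
TeamsG = pd.merge(TeamsGH, TeamsGA, on=["year", "team"])
TeamsG["Games"] = TeamsG["hwin"] + TeamsG["awin"]
TeamsG = TeamsG[["year", "team", "Games"]]

# Stand-in for the Teams3 data, holding only season wins (hypothetical column "W").
Teams3 = pd.DataFrame({"year": [1999, 1999], "team": ["ANA", "DET"], "W": [1, 4]})

# Merge the game count onto Teams3 and divide wins by games actually played.
Teams3 = pd.merge(Teams3, TeamsG, on=["year", "team"])
Teams3["wpc"] = Teams3["W"] / Teams3["Games"]

print(Teams3)
```

Dividing by the counted games rather than a fixed 162 is what keeps seasons such as the 163-game and 161-game cases mentioned above accurate.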