Let's talk about hypothesis tests and why you would want to use them. So far, we have focused on estimation, that is, describing a population as accurately as possible using your sample data. You estimate parameters that describe the entire population. But an estimate is merely descriptive, so we often switch from estimation to testing. Testing means that we claim something, and we use data to test this claim. This claim is called a hypothesis. In this video, I will show you the basics of hypothesis testing.

To do that, I would like to introduce you to Mary and Peter. Mary gives birth to a child, and her husband Peter is in doubt, because he works for the navy and hasn't seen Mary for more than 10 months. As you might know, the average pregnancy duration is about 40 weeks. The inevitable question is, would you be worried if you were Peter? The claim that we make is that the child is actually Peter's. This is our hypothesis in statistical terms. To judge whether we believe this hypothesis, we need some more information on pregnancy durations. We will test the hypothesis using data as our proof, data on pregnancy duration. If the child is Peter's, Mary must have been pregnant for at least 305 days. However, the data show that the average pregnancy lasts 282.5 days.

Okay, let's first look at two examples. Would you be suspicious if Mary had been pregnant for 285 days? Well, probably not, right? As 285 days is so close to the average, why would you be worried? Now, would you have been suspicious if Mary had been pregnant for 400 days? Probably you would have been. Because 400 days is just so long, it is improbable that Mary would be pregnant for 400 days, unless she's not human but an elephant or something. So we see that for 285 days or for 400 days, it's easy to make a decision. However, 305 days, which is the case that we are considering, is a gray area. It could be that Peter is the father, or it could be that he is not.
To answer this question we need some more evidence, some more data. Apart from the average pregnancy duration, we also need to look at the variation in pregnancy durations. Here is data on the pregnancy duration of women. It was collected by Kieler et al. Pregnancy duration is approximately a normally distributed variable, and Kieler et al. found a mean of 282.5 days. They also found that the variation in pregnancy durations, the standard deviation, is equal to 10.5 days.

The next step is to use this data to decide on our hypothesis. Is it likely that Peter is the father, given the collected evidence, that is, the data from Kieler et al., and given that Mary must have been pregnant for at least 305 days? For this we will calculate a so-called p-value. The p-value is a probability, that's why it's called a p-value. In this example, it's the probability that a pregnancy takes at least 305 days. If you calculate it, the p-value in this example equals 0.016, which is 1.6%. That means that Mary's pregnancy duration of 305 days is suspicious, because only 1.6% of all pregnancies take this long or longer.

So what does this example tell us about hypothesis testing? It is a procedure to arrive at a decision. We do this as objectively as possible, and it deals with uncertainty in a rational manner. More advanced statistical techniques can even be used to quantify the risk of a false decision. Other examples of claims that can be tested are whether two variables, say X and Y, are correlated, whether a variable is normally distributed, or whether two groups have equal variances.

The process of hypothesis testing looks like this. You always begin with a null hypothesis, which is the hypothesis that there is no effect or no relationship between the variables you are testing. And you state your alternative hypothesis, which is the hypothesis that there is an effect or a relationship between your variables. Next, you have to collect evidence, which will mostly be in the form of data. After you have collected this data, you can calculate your p-value.
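For the Mary and Peter example, that calculation can be checked with a few lines of Python. This is only a minimal sketch: it assumes, as stated above, that pregnancy duration is approximately normal with a mean of 282.5 days and a standard deviation of 10.5 days. The function name `upper_tail_p_value` and the constant names are illustrative choices, not part of any established procedure.

```python
from statistics import NormalDist

# Figures quoted from Kieler et al. in the example above
MEAN_DAYS = 282.5
SD_DAYS = 10.5
OBSERVED_DAYS = 305  # minimum duration if the child is Peter's


def upper_tail_p_value(observed, mean, sd):
    """Probability that a normal variable is at least `observed`."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(observed)


# How many standard deviations 305 days lies above the mean
z_score = (OBSERVED_DAYS - MEAN_DAYS) / SD_DAYS
p_value = upper_tail_p_value(OBSERVED_DAYS, MEAN_DAYS, SD_DAYS)

print(f"z = {z_score:.2f}")        # z = 2.14
print(f"p-value = {p_value:.3f}")  # p-value = 0.016
```

The sketch uses only the upper tail of the distribution, because the question is whether a pregnancy lasts at least 305 days, not whether it differs from the mean in either direction.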
The p-value allows you to decide whether or not to reject your null hypothesis. If the p-value is small enough, this means that you have found a significant effect: your data would be unlikely if H0 were true, so you reject this null hypothesis. "Small enough" is usually taken to mean that the p-value is smaller than 0.05, which is a commonly used threshold. If the p-value is larger than this threshold of 0.05, you do not reject your null hypothesis.

Let's take a look at an example. You would like to know whether IQ, which is how smart you are, and brain size, which is how big your head is, are related. The null hypothesis is that there is no relationship between these, and that being smart has nothing to do with the size of your brain. The alternative hypothesis is that the two are related, and then probably that the bigger your brain, the higher your IQ will be. We got the data and found a p-value of 0.791. This p-value is very high. It is certainly bigger than the 0.05 threshold. Therefore, we cannot reject the null hypothesis, and so we cannot conclude that brain size and IQ are related. Or you can say, we have no proof of a relationship between IQ and brain size.

You can think of hundreds of different hypotheses. Every hypothesis has its own testing procedure to calculate a p-value. In these videos, I will show you the most popular ones, including regression, ANOVA, and chi-square tests.

Summarizing, hypothesis testing is a procedure to arrive at a decision as objectively as possible, while dealing with uncertainty in a rational manner. You start with constructing your hypothesis, then you collect your data. And finally, you draw your conclusion.
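The final step, drawing the conclusion, can be sketched in Python as a simple comparison against the threshold. This is a minimal illustration, assuming the common 0.05 threshold mentioned earlier; the names `ALPHA` and `decide` are hypothetical. The two p-values are the ones from the examples in this video.

```python
ALPHA = 0.05  # commonly used threshold; other values are possible


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value."""
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"


# p-values taken from the two examples in this video
print("Mary and Peter:   ", decide(0.016))  # reject H0
print("IQ and brain size:", decide(0.791))  # do not reject H0
```

Note that "do not reject H0" is not the same as proving H0: it only says that the data give no evidence against it.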