[MUSIC] In this lecture we'll talk about Bayesian statistics. In the previous lecture, we talked about likelihoods and expressing relative evidence for one hypothesis compared to an alternative hypothesis. And we've used evidence in this sense based just on the data that we have at hand. But we'll see that very often you have some prior belief that you might want to combine with the evidence, and Bayesian statistics allows you to do this. Now, let's say that you flip a coin three times. The coin comes up heads every single time. Do you think that this is a fair coin or not? The data that you have at hand show that this coin came up heads three times. So if you were a newborn baby, you wouldn't have any prior beliefs, and all you would have is the data that you just observed. A newborn would say: it comes up heads every single time - this is just what this coin does. But you have the prior belief that most coins are fair. You might think, well, this happens. Three heads in a row - that's a perfectly likely observation. I'm not changing my prior belief that this is a fair coin yet. You would want to see more evidence. Combining your prior belief with the data is possible in this Bayesian sense. It's not possible when you just calculate p-values. Remember that a p-value expresses the probability of the data, or more extreme data, assuming that the null hypothesis is true. And some people say this is not actually what you want to know. What you want to know is the probability that the null hypothesis is true, given the data that you have collected. This is a posterior probability: given the data you've collected, and perhaps some prior beliefs that you have, what is now the probability that the null hypothesis or the alternative hypothesis is true? Bayesian statistics allows you to express this probability, and it works quite straightforwardly. You have some prior belief, you have some data, and you combine these into a posterior belief.
We can calculate posterior odds: the probability that the alternative hypothesis is true given the data, compared to the probability that the null hypothesis is true given the data. This is a very useful way to split apart the different aspects of the formula: the posterior odds equal the likelihood ratio times the prior odds. So here you see that the likelihood ratio that we discussed in a previous lecture is an essential part of the probability calculations in Bayesian statistics, but it's combined with this prior. So how can we do this? We have to come up with a prior distribution. In the case of binomial probabilities, a beta distribution is used. The beta prior is determined by two parameters, which are referred to as alpha and beta. Now this can get a little confusing, because we already talked about alpha and beta as the Type I and Type II error rates. Regrettably, statisticians are not very creative when it comes to thinking up Greek letters for their statistics, so there is some double use in the literature, and this is one of those examples. The beta prior is determined by alpha and beta, and these are just two numbers that have nothing to do with error rates whatsoever. Let's take a look at different versions of prior distributions. If we set the alpha and the beta of the beta distribution to 1, we get a perfectly flat line. This means that every value of theta, which is plotted on the horizontal axis, is equally likely. So we don't have any expectations, and we say anything that might happen is possible. Now in the case of a coin flip, you might not be convinced that this is what's most likely. In the case of flipping a coin, you can have a lot of different beliefs. One of them is the belief that this is a coin that always comes up heads. If you have a very strong belief that this is the case, then this is your prior.
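These prior shapes can be evaluated directly. A minimal sketch using SciPy (not part of the lecture materials) - the Beta(20, 20) prior below is my own illustrative choice for a "strong belief in a fair coin":

```python
# Sketch: evaluating beta prior densities with SciPy.
from scipy.stats import beta

# Beta(1, 1) is the flat, uninformative prior: every theta on [0, 1]
# is equally likely, so the density is 1 everywhere.
print(beta.pdf(0.2, 1, 1))  # 1.0
print(beta.pdf(0.8, 1, 1))  # 1.0

# A Beta(20, 20) prior (assumed here for illustration) encodes a strong
# belief that the coin is fair: the density peaks sharply at theta = 0.5.
print(beta.pdf(0.5, 20, 20) > beta.pdf(0.3, 20, 20))  # True
```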
You think that it will be heads almost always, although you are willing to give at least some probability to different values. You might also think that this is a perfectly fair coin - most coins are fair, after all. So you think that this coin should come up heads 50% of the time, and you have a very strong belief in this. This is illustrated in the graph where there's a very high peak at 0.5 on the theta axis. But you're also willing to allow for some variation. It might be that the coin is not perfectly fair. It might have been used, it might be an old coin, and it might come up heads a little bit more often or a little less often. Now let's combine our prior belief, whatever it is, with the data that we've observed. We have the beta prior distribution and we have the likelihood function, and we can quite easily combine these into a posterior distribution. The posterior is also a beta distribution with an alpha and a beta value. The alpha is determined simply by adding the alpha of the prior to the alpha of the likelihood function, minus 1. The same is true for the beta: the beta of the prior distribution added to the beta of the likelihood function, minus 1. Let's see how this works. In this case, we have a prior distribution that's uninformative - anything goes. So this is our prior before we collect some data. Now we collect some data, and we have a likelihood function. In this case, we flip the coin ten times, and six out of the ten times we observe heads. That's expressed by this likelihood function, and now we combine these two functions into a posterior distribution. You now see a black line that falls exactly on top of the likelihood function. This is what's meant by an uninformative prior: everything was equally likely beforehand, and it doesn't influence our judgments in any way. We just believe exactly what we have observed. So the prior does not influence the posterior distribution in any way.
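The updating rule just described can be written in a few lines. A minimal sketch, assuming SciPy; the helper name posterior_params is my own, not from the lecture:

```python
from scipy.stats import beta

def posterior_params(a_prior, b_prior, heads, tails):
    """Combine a Beta(a, b) prior with binomial coin-flip data.

    The lecture writes the likelihood as a beta curve with
    alpha = heads + 1 and beta = tails + 1, and adds the parameters
    minus 1; that is the same as simply adding the observed counts."""
    a_lik, b_lik = heads + 1, tails + 1
    return a_prior + a_lik - 1, b_prior + b_lik - 1

# Uninformative Beta(1, 1) prior, 6 heads out of 10 flips:
a_post, b_post = posterior_params(1, 1, heads=6, tails=4)
print(a_post, b_post)  # 7 5  -> the posterior is Beta(7, 5)

# With a flat prior the posterior peaks at the observed proportion 0.6,
# exactly on top of the likelihood function.
print((a_post - 1) / (a_post + b_post - 2))  # 0.6 (posterior mode)
```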
Now let's take another example, one where the prior does influence the posterior distribution. Here we have some belief that the coin is probably fair. We think that values around 0.5 are somewhat more likely, but we don't have a very strong conviction about this. We again observe the same data: we flip the coin ten times, and six out of ten times the coin lands heads. So this is exactly the same likelihood function that we had in the previous example, but we have a slightly different prior. Combining our prior belief with the observed data yields a posterior distribution that no longer falls exactly on top of the likelihood function. Indeed, we see that the posterior is pulled slightly towards the prior belief that we held. Now, using the prior distribution and the posterior distribution, we can calculate something that's known as a Bayes factor. The Bayes factor is the relative evidence for one model compared to another model. Compare this to likelihood ratios: in a likelihood ratio we only have one distribution, and we're comparing two different values of theta on the likelihood function. But here we'll compare the density of the prior distribution to the density of the posterior distribution. Let's take a look. We have data from 20 coin flips, and they came up heads 10 times. Let's take a look at two different priors: either a beta prior of (1, 1), which is completely uninformative - the uniform prior - or a beta prior of (4, 4), where we have some expectation that this is a fair coin. Now, in this graph we again see the prior distribution as the gray line that's uniformly distributed. We don't have very strong prior beliefs in this case; everything is equally likely. Then we've collected some data, and now we have a posterior distribution, which in this case falls exactly on top of the likelihood function.
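To make the shrinkage concrete, here is a small sketch with an assumed Beta(4, 4) prior (the lecture doesn't give the exact prior parameters for this plot) and the same data, 6 heads out of 10 flips:

```python
from scipy.stats import beta

# Assumed mildly informative prior Beta(4, 4), data: 6 heads, 4 tails.
# Adding the counts gives the posterior Beta(10, 8).
a_post, b_post = 4 + 6, 4 + 4
post_mean = beta.mean(a_post, b_post)

# The posterior mean sits between the prior mean (0.5) and the
# observed proportion (0.6): the prior pulls the estimate inward.
print(round(post_mean, 3))  # 0.556
```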
You can see that there is a relative difference in the height of both of these functions at specific points. Now, we have to pick one hypothesis that we're interested in. Let's say that we want to know whether this is a fair coin or not. So in this case, we would test the hypothesis that theta is 0.5. We can compare the prior distribution with the posterior distribution to see how much our belief has changed by collecting some data. The prior at 0.5 was much lower than the posterior is at 0.5, and the ratio of these two points is 3.70. Here is a slightly different version where we only have a different prior. We've observed exactly the same data, but the prior now is slightly more informed: we already expected that this coin would be fair. We collect the same amount of data, and we again see that for the theta of 0.5 - the hypothesis that this is a fair coin - our belief that the coin is fair has increased. But because we already had quite a strong prior, the data only increase our belief by a factor of 1.91. So after looking at the data, the hypothesis that theta is 0.5 has become either 1.91 or 3.70 times more likely, depending on the prior that we had. So we can see how Bayes factors tell you something about the increase in evidence, or in the belief that you have about a specific hypothesis, when you compare your prior and your posterior. Now we can also look at Bayesian statistics not from a hypothesis-testing viewpoint, where you test a hypothesis by comparing the prior and the posterior, but from an estimation viewpoint, where we just want to estimate which values we think are most likely. So this is not Bayesian hypothesis testing, but Bayesian estimation. Here we also use the posterior distribution, but instead of testing a specific hypothesis against the prior, we use the posterior just to estimate plausible values. Which values do you believe are most likely, based on your prior belief and the data you have observed?
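This ratio of posterior to prior density at a point hypothesis is known as the Savage-Dickey density ratio. A sketch using SciPy that reproduces both numbers from the lecture (the function name is my own):

```python
from scipy.stats import beta

def bayes_factor_at(theta, a_prior, b_prior, heads, tails):
    """Savage-Dickey style Bayes factor: the posterior density
    divided by the prior density at the point hypothesis theta."""
    a_post, b_post = a_prior + heads, b_prior + tails
    return beta.pdf(theta, a_post, b_post) / beta.pdf(theta, a_prior, b_prior)

# 10 heads in 20 flips, testing the fair-coin hypothesis theta = 0.5:
print(round(bayes_factor_at(0.5, 1, 1, 10, 10), 2))  # 3.7  (uniform prior)
print(round(bayes_factor_at(0.5, 4, 4, 10, 10), 2))  # 1.91 (Beta(4,4) prior)
```

The same data produce a smaller Bayes factor under the Beta(4, 4) prior because that prior already placed substantial density at 0.5, so there is less room for the data to increase the belief further.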
In this graph, we again see a prior distribution and a posterior distribution. And in this case, because we have an uninformative prior, the likelihood and the posterior fall exactly on top of each other. We can calculate the mean of the posterior distribution, and we can calculate what, in Bayesian statistics, is known as a 95% credible interval. A credible interval contains 95% of the values that you find most plausible. So this is an expression of your belief: which values do you believe are most likely, given your prior and the observed data? Now, in this case, because we have used an uninformative prior, you can see that the likelihood function is basically everything that matters. The posterior distribution is completely determined by the data. And this is a situation where the 95% credible interval in Bayesian statistics exactly matches the 95% confidence interval in frequentist statistics. But that's not always the case; this is only true when we use an uninformative prior. Let's see how this changes when we use a more informative prior. Now let's take a look at a slightly different situation - a rather extreme example, where we had a very strong prior belief that the coin would come up heads about two thirds of the time, so 0.666. You can see the grey distribution peaking at this point - this was our prior belief before we collected some data. But then we collect data, and we actually find a completely different pattern. We see that the coin came up heads about 40% of the time. And we collected quite a lot of data, so the blue likelihood function is completely different. We can also see that the black posterior distribution is now a little bit in between these two functions. So the data and the prior are merged into a posterior distribution. And the 95% credible interval now contains values that no longer match the 95% confidence interval.
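A 95% credible interval can be read directly off the posterior distribution. A sketch using the earlier 10-heads-in-20-flips example; the Beta(40, 20) prior and the 100-flip data in the second part are hypothetical numbers of my own, chosen so the prior mean is two thirds:

```python
from scipy.stats import beta

# Uniform Beta(1, 1) prior + 10 heads in 20 flips -> posterior Beta(11, 11).
# interval(0.95, ...) returns the central 95% of the posterior mass.
lo, hi = beta.interval(0.95, 11, 11)
print(round(lo, 3), round(hi, 3))  # roughly 0.30 and 0.70, centred on 0.5

# Hypothetical strong prior Beta(40, 20) (prior mean 2/3) combined with
# 40 heads in 100 flips -> posterior Beta(80, 80). The credible interval
# is pulled away from the data-only proportion of 0.40 towards the prior.
lo2, hi2 = beta.interval(0.95, 80, 80)
print(round((lo2 + hi2) / 2, 2))  # 0.5: in between prior mean and data
```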
A very strong prior has made it so that our beliefs about the most plausible values - the values that we feel most confident in - no longer exactly match the data, but are slightly influenced by the prior that we have. This also shows a strength of Bayesian statistics. We had a prior belief, which we quantified by putting a function on it, and then we observed some data. And you can see that our belief can be changed. In this case, our posterior belief is still sort of halfway in between. But if we collect enough data, we should see that the posterior belief becomes very, very close to the likelihood function. So this shows that if we collect enough data, we slowly update our belief; we can become convinced about something else. This is, of course, a very useful way to do science. And this is also the difference between science and religion: in religion, you would not have any option to update your belief, but in Bayesian statistics, we have a very logical framework where data can change our prior beliefs and lead to new posterior beliefs. So the 95% credible interval contains the values that you find most plausible. It's all about expressing and quantifying your belief in specific values. Now, in this lecture we've seen how you can use Bayesian statistics to quantify your prior beliefs, collect data, and then update your beliefs based on the data that you've observed. [MUSIC]