In this lecture, we are going to learn how to compute an estimate of the multivariate Gaussian model parameters from observed data. Recall that a Gaussian model has two parameters: the mean and the covariance matrix. We are going to use the same definition of likelihood we introduced for the univariate case. The likelihood is the probability of an observation given the model parameters, which are unknown. We are interested in obtaining the mean and covariance matrix that maximize the likelihood given a set of observations. This is the mathematical statement of our goal. I hope you remember that the likelihood function is a joint probability of all the data, which can be intractable in general. But as we did for the univariate Gaussian, if we assume the data points are independent, the joint likelihood can be expressed as the product of the individual likelihoods.

With this notation, how can we obtain the maximum likelihood estimate of the parameters, which are now a vector and a matrix? Again, we can derive the solution analytically, and the key ideas are the same as in the one-dimensional case. First, we are going to use the properties of the log function. We have seen that instead of maximizing the likelihood, we may find the parameters that maximize the log-likelihood, because the maximizer is the same. Also, remember that the log of a product equals the sum of the logs. Now we can rewrite the problem as finding the mu and sigma that maximize the sum of the log-likelihoods of the individual measurements.

The next step is to apply the specific form of the Gaussian PDF and take its log. As before, we can ignore the constant term C, because it does not affect the solution. Then, by negating what remains, we can turn the formula into a minimization problem. Finally, we solve the optimality condition for minimizing the cost function J by setting its derivatives with respect to mu and sigma to zero. This gives us the maximum likelihood estimates of the mean and covariance matrix. The full mathematical details can be found in the supplementary file; a compact sketch of the derivation and of the resulting computation appears at the end of this section. It might look a bit more complicated than the one-dimensional Gaussian case, because now we have a vector, a matrix, and a cost function, but essentially the principles we apply are the same. The final solution we get for computation is exactly the sample mean, now in vector form, and the sample covariance matrix. Now we have learned how to estimate the multivariate Gaussian parameters in the maximum likelihood sense.

Let's get back to our ball color example. A closer view of the graph shows the ball color distribution along the blue and red dimensions. Using the data and the formulas we obtained, we can compute the maximum likelihood estimates of the parameters as shown. From the contours in the plot, we can visually check that the red and blue channels are negatively correlated in our model. The examples we have created so far have a nice symmetric shape with a single peak. However, some targets may have oddly shaped distributions. In the next lecture, we will learn how to combine multiple Gaussians into a mixture model, which can express more diverse distributions.
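To make the derivation above concrete, here is a compact sketch of the steps the lecture describes. The notation is my own shorthand: x_1, ..., x_N denote the observed data vectors, and J is the cost function mentioned above.

```latex
% The i.i.d. assumption turns the joint likelihood into a product, and the
% log turns the product into a sum with the same maximizer:
\hat{\mu}, \hat{\Sigma}
  = \arg\max_{\mu,\Sigma} \prod_{i=1}^{N} p(x_i \mid \mu, \Sigma)
  = \arg\max_{\mu,\Sigma} \sum_{i=1}^{N} \log p(x_i \mid \mu, \Sigma)

% Plugging in the Gaussian PDF, dropping the constant C, and negating gives
% a minimization problem over the cost function J:
J(\mu, \Sigma) = \sum_{i=1}^{N}
  \left[ (x_i - \mu)^{\top} \Sigma^{-1} (x_i - \mu) + \log\det\Sigma \right]

% Setting the derivatives of J with respect to mu and Sigma to zero yields
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i,
\qquad
\hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N}
  (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top}
```

So the maximum likelihood estimates are exactly the sample mean and the sample covariance matrix, with a 1/N rather than 1/(N-1) divisor.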
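The closed-form result is also easy to check numerically. The following is a minimal NumPy sketch, not code from the lecture: the function name gaussian_mle is my own, and the toy data is synthetic, with a negative off-diagonal covariance chosen only to mimic the negatively correlated red/blue channels of the ball color example.

```python
import numpy as np

def gaussian_mle(X):
    """Maximum likelihood estimates for a multivariate Gaussian.

    X: (N, d) array with one observation per row, assumed i.i.d.
    Returns the sample mean (d,) and the sample covariance (d, d).
    """
    N = X.shape[0]
    mu_hat = X.mean(axis=0)                # sample mean vector
    centered = X - mu_hat                  # subtract the mean from every row
    sigma_hat = centered.T @ centered / N  # ML estimate: 1/N, not 1/(N-1)
    return mu_hat, sigma_hat

# Synthetic stand-in for the red/blue ball color data (values made up).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0.6, 0.3],
                            cov=[[0.02, -0.01], [-0.01, 0.02]],
                            size=500)

mu_hat, sigma_hat = gaussian_mle(X)
print(mu_hat)     # should be close to [0.6, 0.3]
print(sigma_hat)  # negative off-diagonals: channels negatively correlated
```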