The mathematical tools, or notation, or ideas that we need are, again, gradients and Hessians. Somehow we need to apply the gradient and the Hessian and do some analysis on them so that we may get our desired convexity. I hope gradients and Hessians are not too strange for you, because you should have seen them already if you have ever applied gradient descent or Newton's method. The gradient is simply a collection of first-order derivatives, and the Hessian a collection of second-order derivatives. Pretty much what we need to do is some analysis on these functions, or these quantities, and we need them to satisfy certain properties so that our function is convex. One thing to remind you of is that all our Hessians, at least in this course and in most practical problems, will be symmetric. You can see that these two entries are second-order derivatives taken with respect to x_1 and then x_2, in the two different orders, and the order does not matter as long as the second-order derivatives are continuous. I hope this makes sense to you. This is just a numerical example to refresh your memory about how to do the gradient and Hessian calculations. For this particular function, the first-order derivatives can be collected to obtain the gradient, which is a vector. Then for the Hessian, you collect all the second-order derivatives, and you get this particular matrix. The gradient and Hessian are themselves functions of the variables, so they generally take different values at different points. You may specify a specific point and ask: what is the gradient at that point? What is the Hessian at that point? Then all you need to do is plug those values into the formulas. If we are talking about the point (3, 2, 1), then this entry would be six, this would be one, and this would be 2 plus 3, so the gradient would be (6, 1, 5).
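Since the function on the slide is not reproduced in the transcript, here is a minimal sketch with a hypothetical function f(x1, x2, x3) = x1^2 + x1*x2 + x3^2 (my own example, not the one from the lecture), showing how the gradient and Hessian at a point can be approximated with central differences:

```python
import numpy as np

def f(x):
    # Hypothetical example function (the one on the slide is not shown here):
    # f(x1, x2, x3) = x1^2 + x1*x2 + x3^2
    return x[0]**2 + x[0]*x[1] + x[2]**2

def gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian at x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

x0 = np.array([3.0, 2.0, 1.0])
print(gradient(f, x0))  # analytic gradient is (2*x1 + x2, x1, 2*x3) = (8, 3, 2)
print(hessian(f, x0))   # analytic Hessian is [[2,1,0],[1,0,0],[0,0,2]], symmetric
```

Notice the computed Hessian comes out symmetric, exactly as the continuity argument above predicts.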
For the Hessian, pretty much everything is the same; it's just that here you need to plug in one. The corner value would be six, and that is the Hessian at the point (3, 2, 1). Now we are ready to talk about our convexity condition. First, let's recall the following theorem for single-variable functions: f is convex if its second-order derivative is non-negative; a point is an interior local min only if the first-order derivative there is zero; and finally, if f is convex, then a point is a global min if and only if the first-order derivative there is zero. Now we have an analogous theorem for multivariate functions. f is convex if the Hessian matrix, which collects the second-order derivatives, is positive semi-definite, which is something we will explain on the next page. Given that, the next statement is that x-bar is an interior local min only if the first-order condition is satisfied, that is, only if the gradient is zero, which is again an extension from one dimension to n dimensions. Finally, if f is convex, then x-star is a global min if and only if the gradient is zero, again the first-order condition. The only thing that needs some illustration is the idea of positive semi-definiteness, so let's take a look at that. Positive semi-definiteness is a property of matrices. The definition is here: a symmetric matrix A is called positive semi-definite if for any vector x in R^n, x transpose A x is non-negative, for every possible x you may find. Pretty much this is a kind of extension of the second-order derivative being non-negative: if A is a scalar a, then x transpose A x is simply a times x squared, so the condition says a is non-negative. But anyway, let's take a look at these examples.
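As a quick sanity check of the definition, one can sample many vectors x and evaluate x transpose A x; this is only a sketch (sampling can suggest, but never prove, positive semi-definiteness over all of R^n, which is why the criteria below are needed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # the symmetric example matrix from the lecture

# Sample many random x and record the sign of x^T A x.
vals = [x @ A @ x for x in rng.standard_normal((10_000, 2))]
print(min(vals) >= 0)  # no negative value found among the samples
```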
The example says that if you have A = (2, 1; 1, 2), and you take any possible x = (x_1, x_2), then x transpose A x is (x_1, x_2) multiplied by (2, 1; 1, 2) multiplied by (x_1, x_2). If you do some arithmetic, you get 2x_1^2 + 2x_1x_2 + 2x_2^2, which may be rearranged into a sum of squared terms, x_1^2 + x_2^2 + (x_1 + x_2)^2. That is always non-negative, so this A is positive semi-definite. That's one success story. On the other hand, if A in the second example is (1, 2; 2, 1), then after you do this x transpose A x, you get something different: if you plug in one and negative one, you get a negative number. So it does not satisfy the definition, and it is not positive semi-definite. The thing is that we now want an effective way to determine whether a matrix is positive semi-definite. The definition may not be easy to use directly, so we need some properties that let us show the Hessian of a function is positive semi-definite, and therefore that the function is convex. Let's see this very useful proposition. The proposition says that, for a symmetric matrix A, the following three things are equivalent: A is positive semi-definite; A's eigenvalues are all non-negative; A's principal minors are all non-negative. Since the three are equivalent, if you are able to show that A's eigenvalues are all non-negative, or that A's principal minors are all non-negative, then you will have shown that the matrix is positive semi-definite. Let's do a quick review of linear algebra. An eigenvalue of A is usually called lambda; if A is n by n, then you have n eigenvalues lambda_1, lambda_2, and so on. Lambda is an eigenvalue with eigenvector x if they satisfy the equation Ax = lambda x. Let me assume you have learned this in your linear algebra course.
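The two example matrices, together with the eigenvalue criterion from the proposition, can be sketched with NumPy (`eigvalsh` computes the eigenvalues of a symmetric matrix, returned in ascending order):

```python
import numpy as np

A_psd = np.array([[2.0, 1.0], [1.0, 2.0]])  # the positive semi-definite example
A_not = np.array([[1.0, 2.0], [2.0, 1.0]])  # the counterexample

x = np.array([1.0, -1.0])
print(x @ A_psd @ x)  # 2.0: here x^T A x = x1^2 + x2^2 + (x1+x2)^2 >= 0
print(x @ A_not @ x)  # -2.0: one negative value is enough to rule out PSD

# The eigenvalue criterion agrees with the definition:
print(np.linalg.eigvalsh(A_psd))  # [1., 3.], all non-negative -> PSD
print(np.linalg.eigvalsh(A_not))  # [-1., 3.], a negative eigenvalue -> not PSD
```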
We will have some examples later, but here I will just stop. If you are able to find a vector and a value satisfying this equation, then that is an eigenvalue-eigenvector pair. Then we have this concept called principal minors. What's that? Well, a minor in linear algebra means the following: when you have a matrix, you take a sub-matrix, and the determinant of that sub-matrix is called a minor. A principal minor is still the determinant of a sub-matrix, but for principal minors the sub-matrix's diagonal must be part of A's diagonal. Understand? Suppose I have a three-by-three matrix, call it A. Then A's diagonal is here. Suppose I take a sub-matrix, say this blue one; because the blue sub-matrix's diagonal is part of A's diagonal, the determinant of this blue sub-matrix is called a principal minor. It's one of the principal minors. Of course we may have other sub-matrices, say this one. This is also a two-by-two sub-matrix, even though its entries are not contiguous, but you can collect them yourself. If you choose this two-by-two sub-matrix, or that two-by-two sub-matrix, they still have their own diagonals, but those diagonals are not part of A's diagonal. In that case, they are not considered principal minors. Basically, for this particular example, at level 1 you have three principal minors: this entry, this entry, or this entry. At level 2, you either choose these two, these two, or the first and the last, so you also have three principal minors. Lastly, at level 3 you have one principal minor. That's the concept of principal minors. If you have a matrix, then of course the number of principal minors is finite, so you may compute all of them to see whether they are all non-negative. This may be time-consuming in some cases, so we actually have a sufficient condition.
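The level-by-level count above (three, three, and one for a three-by-three matrix) can be sketched by enumerating index subsets: a principal minor keeps the same subset of rows and columns, which is exactly what keeps its diagonal on A's diagonal. The test matrix here is my own made-up example, not one from the lecture:

```python
import numpy as np
from itertools import combinations

def principal_minors(A):
    """All principal minors: determinants of sub-matrices obtained by
    keeping the SAME index subset for rows and columns."""
    n = A.shape[0]
    minors = []
    for k in range(1, n + 1):           # "level" k = size of the sub-matrix
        for idx in combinations(range(n), k):
            sub = A[np.ix_(idx, idx)]   # rows and columns share the index set
            minors.append(np.linalg.det(sub))
    return minors

A = np.array([[2.0, 1.0, 0.0],          # hypothetical symmetric test matrix
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
m = principal_minors(A)
print(len(m))                  # 3 + 3 + 1 = 7 principal minors for a 3x3 matrix
print(all(v >= 0 for v in m))  # all non-negative -> this A is PSD
```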
What's that? Well, you only need to look at leading principal minors, so let's see what those are. For leading principal minors, we focus on the top-left choices. While in this particular case we have seven principal minors in total, we only have three leading principal minors: the top-left one-by-one entry, the top-left two-by-two sub-matrix, and the whole matrix. The leading principal minors' sub-matrices sit on the diagonal, packed together at the top-left corner. Those are the leading principal minors. If we are doing calculations by hand for matrices that are, well, three-by-three or four-by-four, leading principal minors are typically easier to compute. In many cases we look at the leading principal minors first, because if all of them are strictly positive, then all your principal minors will be positive, and positive of course implies non-negative. One thing to remind you of: if your leading principal minors are merely non-negative, that is not enough; for this shortcut you need all the leading principal minors to be strictly positive. Given these principles, later we will have some examples where we go through these steps: we have a function f, we will first find its Hessian, then either find the eigenvalues or find the principal minors and calculate their values, and then we will determine over what region the Hessian is positive semi-definite. Once we have that, we can conclude that over that region the function is convex. We will see some examples.
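The steps above can be sketched end to end. The function f(x1, x2) = x1^3 + x2^2 below is a hypothetical example, not from the lecture; its Hessian is diag(6*x1, 2), so the strict leading-principal-minor test succeeds exactly where x1 > 0:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the top-left k-by-k sub-matrices, k = 1..n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

# Hypothetical function f(x1, x2) = x1^3 + x2^2 (my own example);
# its Hessian is [[6*x1, 0], [0, 2]].
def hessian(x):
    return np.array([[6.0 * x[0], 0.0],
                     [0.0, 2.0]])

for point in [(1.0, 0.0), (-1.0, 0.0)]:
    lpm = leading_principal_minors(hessian(point))
    print(point, lpm, all(v > 0 for v in lpm))
# At (1, 0) the leading minors are [6, 12], strictly positive, so the
# Hessian is positive definite there; at (-1, 0) they are [-6, -12] and
# the test fails. On the boundary x1 = 0 the strict test is inconclusive,
# and one falls back to checking all principal minors for non-negativity.
```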