Previously, we learned about how optimal policies achieve the goal of reinforcement learning: to obtain as much reward as possible in the long run. In this video, we'll describe the related notion of an optimal value function and introduce an associated set of Bellman equations. By the end of this video, you will be able to derive the Bellman optimality equation for the state-value function, derive the Bellman optimality equation for the action-value function, and understand how the Bellman optimality equations relate to the previously introduced Bellman equations.

Recall that for two policies Pi_1 and Pi_2, Pi_1 is considered as good as or better than Pi_2 if and only if the value under Pi_1 is greater than or equal to the value under Pi_2 for all states. An optimal policy is one which is as good as or better than every other policy. The value function for an optimal policy thus has the greatest value possible in every state. We can express this mathematically by writing that v Pi star of s is equal to the maximum value over all policies. This holds for every state in our state space.

Taking a maximum over policies might not be intuitive, so let's take a moment to break down what it means. Imagine we were to consider every possible policy and compute each of their values for the state s. The value of an optimal policy is defined to be the largest of all the computed values. We could repeat this for every state, and the value of an optimal policy would always be the largest. All optimal policies have this same optimal state-value function, which we denote by v star. Optimal policies also share the same optimal action-value function, which is again the maximum possible for every state-action pair. We denote this shared action-value function by q star.

Recall the Bellman equation for the state-value function. This equation holds for the value function of any policy, including an optimal policy. By substituting the optimal policy Pi star into this Bellman equation, we get the Bellman equation for v star. So far, we haven't done anything special. This is simply the Bellman equation we introduced previously, for the specific case of an optimal policy. However, because this is an optimal policy, we can rewrite the equation in a special form which doesn't reference the policy itself. Remember, there always exists an optimal deterministic policy: one that selects an optimal action in every state. Such a deterministic optimal policy assigns probability 1 to an action that achieves the highest value and probability 0 to all other actions. We can express this another way by replacing the sum over Pi star with a max over a. Notice that Pi star no longer appears in the equation. We have derived a relationship that applies directly to v star itself. We call this special form the Bellman optimality equation for v star.

We can make the same replacement in the Bellman equation for the action-value function. Here, the optimal policy appears in the inner sum. Once again, we replace the sum over Pi star with a max over a. This gives us the Bellman optimality equation for q star.

In an earlier lecture, we discussed how the Bellman equations form a linear system of equations that can be solved by standard methods. The Bellman optimality equation gives us a similar system of equations for the optimal value function. One natural question is: can we solve this system in a similar way to find the optimal state-value function? Unfortunately, the answer is no.
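To see why, it helps to write out the equations we just described. This is a summary in the notation used for the Bellman equations earlier in the course, where p denotes the one-step dynamics function and gamma the discount factor:

$$v_*(s) \doteq \max_{\pi} v_\pi(s), \qquad q_*(s,a) \doteq \max_{\pi} q_\pi(s,a)$$

$$v_*(s) = \max_{a} \sum_{s',\, r} p(s', r \mid s, a)\big[\, r + \gamma\, v_*(s') \,\big]$$

$$q_*(s,a) = \sum_{s',\, r} p(s', r \mid s, a)\Big[\, r + \gamma \max_{a'} q_*(s', a') \,\Big]$$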
Taking the maximum over actions is not a linear operation, so standard techniques from linear algebra for solving linear systems won't apply. In this course, we will not form and solve systems of equations in the usual way. Instead, we will use other techniques based on the Bellman equations to compute value functions and policies.

You might be wondering why we can't simply use Pi star in the ordinary Bellman equation to get a system of linear equations for v star. The answer is simple: we don't know Pi star. If we did, then we would have already achieved the fundamental goal of reinforcement learning. If we can manage to solve the Bellman optimality equation for v star, we can use the result to obtain Pi star fairly easily (a small sketch of this idea appears below). We will see how this can be done in the next video.

In this video, we introduced optimal value functions and derived the associated Bellman optimality equations. The Bellman optimality equations relate the value of a state, or state-action pair, to its possible successors under any optimal policy. See you next time, where we'll learn how to find the optimal policy from the optimal value function.
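As a rough preview of that idea, here is a minimal sketch, not the course's code, of how an optimal policy could be read off from v star once it is known: for each state, pick an action that maximizes the right-hand side of the Bellman optimality equation. The function name greedy_policy and the representation of the dynamics (a dictionary p mapping a state-action pair to a list of (next state, reward, probability) triples) are assumptions made for illustration.

```python
def greedy_policy(states, actions, p, v_star, gamma):
    """Return a deterministic policy that is greedy with respect to v_star.

    Assumed representation: p[(s, a)] is a list of (next_state, reward,
    probability) triples describing the MDP dynamics; gamma is the
    discount factor; v_star is a dict mapping each state to its optimal value.
    """
    pi_star = {}
    for s in states:
        # Evaluate the right-hand side of the Bellman optimality equation
        # for each action, then choose an action achieving the maximum.
        def backup(a):
            return sum(prob * (r + gamma * v_star[s_next])
                       for (s_next, r, prob) in p[(s, a)])
        pi_star[s] = max(actions, key=backup)
    return pi_star
```

On a small MDP with v_star computed beforehand, this would return a dictionary mapping each state to a greedy, and hence optimal, action.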