[MUSIC] So how does this work? So each parameter is going to be updated with a new value derived from its old value, taking a step against the gradient, downhill on the cost surface. And so in these plots, remember this is the coordinate pair, theta zero, theta one; that's the space we're in. And at iteration one, the point is here. And at iteration two, the point is here. And at iteration three, and so on. Okay, so in order to get iteration (i + 1), we're going to take the value of the parameter at iteration i and add this offset term. Okay, so what's going on with this offset term? So this is the learning rate, which is just the step size: how far are we gonna go in that direction? And this is usually a small value, something like 0.001. But it really is application specific; it's a bit silly to even state what it should be. Note, however, that in this example I did scale everything down, so 0.001 on scaled data may not be an unreasonable thing to use as a default. And so, this operator is the partial derivative with respect to theta one. And if you remember a little bit of college calculus, the vector of these partial derivatives is the gradient. And so we're just doing one of them: the partial derivative with respect to one particular variable. And then J is our cost function that I haven't shown yet. And J will be a function of these parameters as well as a function of our data set. We're gonna take these parameters plus all of the data set and compute the cost. So blowing up J: this is our model at iteration i. Remember, we're trying to compute the new value of the parameter at iteration (i + 1). Apply that model to a data instance x, subtract the true value, the correct value y, and that's our error; square that, and then add it up over all K data points in the data set. So we're going to run over all of our data points, check the difference, square it, add it all up. That'll be the value.
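The update described above can be sketched in code. This is a minimal sketch, not the lecture's own implementation: the slide isn't shown here, so the variable names, the tiny scaled data set, and the learning rate of 0.001 are my own assumptions, chosen to match what's said in the transcript.

```python
import numpy as np

# Hypothetical data set of K points on a roughly linear trend (made up for
# illustration); the model is a line: f(x) = theta0 + theta1 * x.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # scaled inputs (assumption)
y = np.array([0.1, 0.35, 0.55, 0.8, 1.05])  # true values (assumption)

def cost(theta0, theta1, x, y):
    """Squared-error cost J: run over all K data points, take the difference
    between the model's prediction and the true value, square it, add it up,
    and divide by two (the constant factor the lecture mentions)."""
    errors = theta0 + theta1 * x - y
    return 0.5 * np.sum(errors ** 2)

# One gradient-descent update for each parameter:
#   theta_new = theta_old - alpha * dJ/dtheta
alpha = 0.001                 # learning rate / step size
theta0, theta1 = 0.0, 0.0     # starting point in (theta0, theta1) space
grad0 = np.sum(theta0 + theta1 * x - y)        # dJ/dtheta0
grad1 = np.sum((theta0 + theta1 * x - y) * x)  # dJ/dtheta1
theta0 -= alpha * grad0
theta1 -= alpha * grad1
```

With a small enough step size, each such update moves the pair (theta0, theta1) to a point with lower cost, which is exactly the iteration-by-iteration motion shown in the plots.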
And actually we divide by two here, but that's mostly just to make the algebra work out, so you'll see that a lot. Dividing by a constant factor doesn't change the direction of the gradient. Okay? And so now we just need to differentiate this with respect to the parameter that we're updating. In this case theta zero. [MUSIC]
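Carrying that differentiation out: by the chain rule, the one-half cancels the two from the power rule, leaving just the sum of the errors, so dJ/dtheta0 = sum over k of (theta0 + theta1 * x_k - y_k). A quick sketch with made-up data (my own, not from the lecture) checks this analytic derivative against a finite-difference estimate:

```python
import numpy as np

# Same hypothetical setup: a line model and a small scaled data set.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y = np.array([0.1, 0.35, 0.55, 0.8, 1.05])
theta0, theta1 = 0.3, 0.7  # arbitrary point at which to evaluate the derivative

def J(t0, t1):
    """Squared-error cost, with the constant one-half factor."""
    return 0.5 * np.sum((t0 + t1 * x - y) ** 2)

# Analytic partial derivative with respect to theta0:
analytic = np.sum(theta0 + theta1 * x - y)

# Central finite difference as an independent check:
eps = 1e-6
numeric = (J(theta0 + eps, theta1) - J(theta0 - eps, theta1)) / (2 * eps)
```

The two values agree to within floating-point error, confirming that the one-half really does nothing but tidy up the derivative.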