So, here you can see what I call the error function, which tells me how much error I had, and I sum it right here.

So, this is doing the summation.

I chose to do the summation outside, in the loop, instead of building it into the function, and I count how many times I call that.
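
The setup described here might look something like this sketch. The names and the toy data are my own assumptions, not the original code; the point is the signed per-sample error, the summation done in the caller's loop rather than inside the function, and the call counter.

```python
error_calls = 0  # instrumentation: how many times the error function is called

def error(theta, x, y):
    """Signed error of the linear hypothesis h(x) = theta[0] + theta[1]*x."""
    global error_calls
    error_calls += 1
    return (theta[0] + theta[1] * x) - y

# The summation happens out here, in the loop, one term per training sample.
theta = [1.0, 1.0]
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # toy (x, y) samples
total = sum(error(theta, x, y) for x, y in data)
```

Keeping the sum outside the function is what makes the call counter meaningful: one call per sample, per iteration.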

Then the cost function gets called down here, where I pass the Theta values and the current errors, and it returns the new Theta array.

Updating the Theta values once counts as one iteration, and there's some more instrumentation counting how many times that happened.
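
The update step described here can be sketched as follows. The function name, signature, and the two-parameter hypothesis are my assumptions; the update rule itself is the standard gradient-descent step theta_j := theta_j - alpha * (1/m) * sum(error_i * x_ij), returning a new Theta array and bumping an iteration counter.

```python
update_calls = 0  # instrumentation: one call per iteration

def update_thetas(theta, errors, xs, alpha):
    """Take current thetas and per-sample errors; return a NEW theta array."""
    global update_calls
    update_calls += 1
    m = len(errors)
    grad0 = sum(errors) / m                              # d(cost)/d(theta[0])
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # d(cost)/d(theta[1])
    return [theta[0] - alpha * grad0, theta[1] - alpha * grad1]

# One iteration on toy values: thetas start at zero, errors were precomputed.
new_theta = update_thetas([0.0, 0.0], [-2.0, -4.0], [1.0, 2.0], 0.1)
```

Returning a fresh array rather than mutating in place keeps the old and new Theta values distinct, which matters for the convergence check later.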

I was curious to see, given different datasets, how many times these functions would get called.

Then this is all the code to check for convergence and to see if my Thetas are running away because my Alpha value, my learning rate, was way too high, and I used that to keep turning it down.

So, I set it to one, then 0.1, 0.01, 0.001, and 0.0001, until I got down to something where the Thetas weren't oscillating out of control, up and down by these huge amounts.
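
One way the runaway check and the turn-the-alpha-down loop could be written is sketched below. The threshold, the function names, and the stand-in trial run are all illustrative assumptions; the idea is simply that a blown-up Theta signals the learning rate is too high, so we step down through the same ladder of candidate alphas.

```python
def diverging(theta, limit=1e6):
    """Flag Thetas that have blown up past a sanity limit."""
    return any(abs(t) > limit for t in theta)

def pick_alpha(trial, candidates=(1.0, 0.1, 0.01, 0.001, 0.0001)):
    """Return the first learning rate whose trial run stays bounded."""
    for alpha in candidates:
        if not diverging(trial(alpha)):
            return alpha
    return candidates[-1]

# Toy stand-in for a short gradient-descent run: pretend the final Theta
# magnitude scales with alpha, so large alphas look like runaways.
alpha = pick_alpha(lambda a: [a * 5e7])
```

In the lecture this tuning was done by hand between runs; automating it as above is just one convenient way to express the same procedure.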

So, that's the routine.

So, I initialized my Thetas to all ones, I initialized a new Theta array to all zeros, I initialized the cost and some of these counters, and then I called the batch gradient descent function, which does its thing: it iterates through, cranks out the Theta values, and prints out some information about them.
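
Pulling the pieces together, the driver described here might look like this self-contained sketch: Thetas initialized to ones, a new-Theta array to zeros, and a batch loop that computes all m errors before each update. Names, the toy dataset, and the fixed iteration count are my assumptions.

```python
def batch_gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    theta = [1.0, 1.0]       # Thetas initialized to all ones
    new_theta = [0.0, 0.0]   # new Theta array initialized to all zeros
    m = len(xs)
    for _ in range(iterations):
        # Batch: compute the error for every training sample first.
        errors = [(theta[0] + theta[1] * x) - y for x, y in zip(xs, ys)]
        new_theta[0] = theta[0] - alpha * sum(errors) / m
        new_theta[1] = theta[1] - alpha * sum(e * x for e, x in zip(errors, xs)) / m
        theta = new_theta[:]
    return theta

# Toy run: fit y = 2x + 1, so theta should approach [1, 2].
theta = batch_gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Note that every iteration touches all m samples before the Thetas move; that is what distinguishes this from the stochastic version discussed below.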

So, those are the Theta values shown in the slides, along with how many times the error and cost functions were called.

I could spend time on plotting the results; I needed a subroutine to plot all the values for me. But I'm not going to go through all of that, because I want to get down to this stochastic gradient descent.

It's very, very similar in structure, except the outer loop is "for i in range" of the m training samples. So, we just step through each of the 16 training examples one time, and then we iterate on the Theta values in exactly the same way we did before in the batch gradient descent.
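
The stochastic variant described here can be sketched as a single pass over the training set, with the Thetas updated from each sample's error individually instead of from the full-batch sum. As before, the names and toy data are assumptions, not the original code.

```python
def stochastic_pass(theta, xs, ys, alpha):
    """One pass over the m training samples, updating theta per sample."""
    theta = theta[:]
    for x, y in zip(xs, ys):  # outer loop: for i in range(m), one sample each
        err = (theta[0] + theta[1] * x) - y
        # Same update rule as batch, but driven by this one sample's error.
        theta[0] -= alpha * err
        theta[1] -= alpha * err * x
    return theta

# Toy run: repeated passes fitting y = 2x + 1, starting from all-ones Thetas.
theta = [1.0, 1.0]
for _ in range(2000):
    theta = stochastic_pass(theta, [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], 0.05)
```

The structural difference from the batch sketch is only where the update sits: inside the per-sample loop here, versus after the full error summation there.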