06_Overfitting and Regularization
Carpe Tu Black Whistle


A model can fit the training dataset perfectly and still fail to generalize to test examples.


Underfitting, or high bias, is when the form of our hypothesis function h maps poorly to the trend of the data.

It is usually caused by using too few features.


Overfitting, or high variance, is caused by a hypothesis function that fits the available data but doesn’t generalize well to predict new data.

Two main options to address the issue of overfitting

Reduce the number of features

  • Manually select which features to keep.
  • Use a model selection algorithm.


Regularization

  • Keep all the features, but reduce the magnitude of the parameters $\theta_j$.
  • Regularization works well when we have a lot of slightly useful features.


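Regularization can “shrink” some of the parameters $\theta_j$ in the hypothesis function by adding a penalty on their size to the cost function:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Here $\lambda$, or lambda, is the regularization parameter.
It determines how much the costs of our $\theta$ parameters are inflated.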

Note that using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting. If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting.
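The effect of the extra summation can be checked numerically. The following is a minimal Python sketch, assuming the usual squared-error cost with the penalty $\lambda\sum_{j=1}^{n}\theta_j^2$ added inside the $\frac{1}{2m}$ factor and the bias $\theta_0$ left unpenalized. The function name `regularized_cost` and the toy data are illustrative and do not come from the course material.

```python
def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost; theta[0] (the bias) is not penalized."""
    m = len(y)
    # Sum of squared prediction errors over all m examples.
    squared_error = 0.0
    for x_i, y_i in zip(X, y):
        prediction = sum(t * x for t, x in zip(theta, x_i))
        squared_error += (prediction - y_i) ** 2
    # The penalty runs over theta[1:], skipping the bias term theta[0].
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_error + penalty) / (2 * m)


# Hypothetical toy data: each row starts with x_0 = 1 for the bias term.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta = [0.0, 1.0]  # fits the toy data exactly

cost_plain = regularized_cost(theta, X, y, lam=0.0)
cost_penalized = regularized_cost(theta, X, y, lam=3.0)
```

Because these parameters fit the toy data exactly, the unregularized cost is 0, and with lambda = 3 the penalty alone raises the cost to 0.5.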

Regularized Linear Regression

Note: [8:43] It is said that X is non-invertible if m ≤ n. The correct statement should be that X is non-invertible if m < n, and may be non-invertible if m = n.

Gradient Descent



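We separate $\theta_0$ from the rest of the parameters because the bias term is not penalized:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \quad j \in \{1, 2, \dots, n\}$$

The term $\frac{\lambda}{m}\theta_j$ performs the regularization. With some manipulation, the update rule can also be represented as:

$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

The first term, $1 - \alpha\frac{\lambda}{m}$, will always be less than 1.
Intuitively, you can see it as reducing the value of $\theta_j$ by some amount on every update. Notice that the second term is now exactly the same as it was before.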
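The shrinking behaviour can be seen in a single update. This is a minimal Python sketch of one regularized gradient descent step for linear regression, assuming the update $\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$ for $j \ge 1$ and an unpenalized $\theta_0$. The name `gradient_step` and the toy data are illustrative assumptions.

```python
def gradient_step(theta, X, y, alpha, lam):
    """One simultaneous update of all parameters for regularized linear regression."""
    m = len(y)
    # Prediction error h(x_i) - y_i for every example, using the current theta.
    errors = [sum(t * x for t, x in zip(theta, x_i)) - y_i
              for x_i, y_i in zip(X, y)]
    new_theta = []
    for j, theta_j in enumerate(theta):
        gradient = sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
        # theta_0 is not penalized, so its shrink factor is exactly 1.
        shrink = 1.0 if j == 0 else 1.0 - alpha * lam / m
        new_theta.append(theta_j * shrink - alpha * gradient)
    return new_theta


# Hypothetical toy data: each row starts with x_0 = 1 for the bias term.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta = [0.0, 1.0]  # already fits the data, so the gradient term is zero

updated = gradient_step(theta, X, y, alpha=0.1, lam=3.0)
```

Since the gradient term vanishes here, the step only multiplies $\theta_1$ by the factor $1 - \alpha\frac{\lambda}{m} = 0.9$ and leaves $\theta_0$ unchanged.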

Normal Equation

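To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:

$$\theta = \left(X^TX + \lambda\cdot L\right)^{-1}X^Ty$$

where $L$ is the $(n+1)\times(n+1)$ matrix $\mathrm{diag}(0, 1, 1, \dots, 1)$: the identity matrix with its top-left entry set to 0, so that $\theta_0$ is not penalized.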



Recall that if m < n, then $X^TX$ is non-invertible. However, when we add the term $\lambda\cdot L$, then $X^TX + \lambda\cdot L$ becomes invertible, as long as the parameter $\lambda$ is greater than 0.
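The regularized normal equation can be tried on small data. The sketch below is a minimal pure-Python illustration, assuming $\theta = (X^TX + \lambda\cdot L)^{-1}X^Ty$ with $L = \mathrm{diag}(0, 1, \dots, 1)$. It solves the linear system with a small Gauss-Jordan routine instead of forming the inverse; the helper names `solve` and `normal_equation` and both toy data sets are assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def normal_equation(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y with L = diag(0, 1, ..., 1)."""
    cols = len(X[0])
    # A = X^T X
    A = [[sum(row[i] * row[j] for row in X) for j in range(cols)]
         for i in range(cols)]
    # Add lam on the diagonal, except for the bias entry A[0][0].
    for j in range(1, cols):
        A[j][j] += lam
    # b = X^T y
    b = [sum(row[i] * y_k for row, y_k in zip(X, y)) for i in range(cols)]
    return solve(A, b)


# Hypothetical toy data: each row starts with x_0 = 1 for the bias term.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta_plain = normal_equation(X, y, lam=0.0)
theta_ridge = normal_equation(X, y, lam=2.0)

# Fewer examples (m = 2) than features (n = 3): solvable once lam > 0.
X_wide = [[1.0, 1.0, 2.0, 3.0], [1.0, 2.0, 1.0, 0.0]]
y_wide = [1.0, 2.0]
theta_wide = normal_equation(X_wide, y_wide, lam=1.0)
```

On the three-example set, lambda = 0 recovers the exact fit (0, 1), while lambda = 2 shrinks the slope to 0.5 and moves the intercept to 1. The second data set has fewer examples than features, yet the system is still solvable with lambda = 1.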

Regularized Logistic Regression


Cost Function

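Cost function for logistic regression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Cost function for logistic regression with regularization:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$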

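The second sum, $\sum_{j=1}^{n}\theta_j^2$, explicitly excludes the bias term, $\theta_0$. That is, the $\theta$ vector is indexed from 0 to n (holding n+1 values, $\theta_0$ through $\theta_n$), and this sum skips $\theta_0$ by running from 1 to n. Thus, when running gradient descent, we should repeatedly update the two following equations:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \quad j \in \{1, 2, \dots, n\}$$

These have the same form as the updates for linear regression, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}$.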
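As a rough illustration of how the penalty skips the bias term, here is a minimal Python sketch of the regularized logistic regression cost and one gradient descent step. It assumes the sigmoid hypothesis, the penalty $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$, and a gradient that adds $\frac{\lambda}{m}\theta_j$ only for $j \ge 1$. The names `sigmoid`, `logistic_cost`, and `logistic_step` and the toy data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, x_i)))
        total += -y_i * math.log(h) - (1 - y_i) * math.log(1 - h)
    # theta[1:] skips the bias term theta[0].
    penalty = lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    return total / m + penalty


def logistic_step(theta, X, y, alpha, lam):
    """One simultaneous gradient descent update of all parameters."""
    m = len(y)
    errors = [sigmoid(sum(t * x for t, x in zip(theta, x_i))) - y_i
              for x_i, y_i in zip(X, y)]
    new_theta = []
    for j, theta_j in enumerate(theta):
        gradient = sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
        if j > 0:
            # The regularization term is added for every parameter except theta_0.
            gradient += lam / m * theta_j
        new_theta.append(theta_j - alpha * gradient)
    return new_theta


# Hypothetical toy data: each row starts with x_0 = 1 for the bias term.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

cost_at_zero = logistic_cost([0.0, 0.0], X, y, lam=4.0)
penalty_only = (logistic_cost([0.0, 2.0], X, y, lam=4.0)
                - logistic_cost([0.0, 2.0], X, y, lam=0.0))
stepped = logistic_step([0.0, 0.0], X, y, alpha=0.1, lam=4.0)
```

With all parameters at zero the penalty vanishes and the cost is log 2 for any lambda. For theta = (0, 2) and lambda = 4, the two costs differ by exactly the penalty, (4 / 8) * 2^2 = 2.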
