10_Advice for Applying Machine Learning
Carpe Tu Black Whistle

Evaluating a Learning Algorithm

When a hypothesis makes unacceptably large errors in its predictions, some troubleshooting options are:

  • Getting more training examples
  • Trying smaller sets of features
  • Trying additional features
  • Trying polynomial features
  • Increasing or decreasing λ

test set

To tackle the problem of overfitting and to evaluate a hypothesis,
divide the data into two parts: a training set (70%) and a test set (30%).
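A minimal sketch of this split (assuming the data lives in NumPy arrays; the 70/30 ratio and the idea of splitting are from the text, the toy dataset is made up):

```python
import numpy as np

# Hypothetical toy dataset: 10 examples with 2 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = rng.normal(size=10)

# Shuffle the indices, then split 70% / 30%.
m = X.shape[0]
perm = rng.permutation(m)
split = int(0.7 * m)
train_idx, test_idx = perm[:split], perm[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```

Shuffling before splitting matters when the data is ordered (e.g. sorted by label).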

training procedure

  1. Learn Θ and minimize J_train(Θ) using the training set
  2. Compute the test set error

test set error

  1. For linear regression: J_test(Θ) = 1/(2·m_test) · Σ (h_Θ(x_test^(i)) − y_test^(i))²
  2. For classification ~ misclassification error (aka 0/1 misclassification error):

     err(h_Θ(x), y) = 1 if h_Θ(x) ≥ 0.5 and y = 0, or if h_Θ(x) < 0.5 and y = 1; otherwise 0

     Test error = 1/m_test · Σ err(h_Θ(x_test^(i)), y_test^(i))
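The two error measures above can be sketched as follows (the helper names and toy values are made up for illustration):

```python
import numpy as np

def j_test_linear(theta, X_test, y_test):
    """Squared-error test cost: 1/(2*m_test) * sum((h(x) - y)^2)."""
    m_test = X_test.shape[0]
    h = X_test @ theta
    return np.sum((h - y_test) ** 2) / (2 * m_test)

def misclassification_error(h, y_test):
    """0/1 misclassification error: the fraction of examples where the
    thresholded prediction (h >= 0.5 -> 1, else 0) disagrees with y."""
    preds = (h >= 0.5).astype(int)
    return np.mean(preds != y_test)

# Toy check: h matches y exactly, so the linear test cost is 0.
theta = np.array([1.0, 2.0])
X_test = np.array([[1.0, 0.0], [1.0, 1.0]])
y_test = np.array([1.0, 3.0])
print(j_test_linear(theta, X_test, y_test))  # 0.0
```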

Model Selection

Given many models with different polynomial degrees, we want a systematic approach to identify the “best” function, i.e. to choose the degree d for the hypothesis.

Break Down the dataset

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%

Procedure

  1. Optimize the parameters Θ using the training set for each polynomial degree.
  2. Find the polynomial degree d with the least error using the cross validation set.
  3. Estimate the generalization error using the test set with J_test(Θ^(d)), where d is the degree of the polynomial with the lowest cross validation error.
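The three steps can be sketched with NumPy's polynomial fitting (a sketch under assumptions: synthetic 1-D data from a quadratic, candidate degrees 1–6):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D dataset: quadratic signal plus noise.
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(scale=0.3, size=100)

# 60% / 20% / 20% split into training / cross validation / test sets.
idx = rng.permutation(100)
tr, cv, te = idx[:60], idx[60:80], idx[80:]

def half_mse(p, xs, ys):
    """Squared-error cost 1/(2m) * sum((h(x) - y)^2) for polynomial p."""
    return np.mean((np.polyval(p, xs) - ys) ** 2) / 2

# 1) fit each candidate degree on the training set,
# 2) pick the degree with the least cross validation error,
# 3) estimate generalization error on the held-out test set.
fits = {d: np.polyfit(x[tr], y[tr], d) for d in range(1, 7)}
cv_err = {d: half_mse(p, x[cv], y[cv]) for d, p in fits.items()}
best_d = min(cv_err, key=cv_err.get)
test_err = half_mse(fits[best_d], x[te], y[te])
print(best_d, test_err)
```

Because d was chosen on the cross validation set, test_err is a fairer estimate of generalization error than cv_err[best_d].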

Bias and Variance

This section covers the relationship between the degree of the polynomial and underfitting or overfitting of the hypothesis.

  • distinguish whether bias or variance is the problem
  • High bias is underfitting and high variance is overfitting.

(image: training error and cross validation error as functions of the polynomial degree d)

High bias (underfitting): both J_train(Θ) and J_cv(Θ) will be high. Also, J_cv(Θ) ≈ J_train(Θ).
High variance (overfitting): J_train(Θ) will be low and J_cv(Θ) will be much greater than J_train(Θ).

Regularization and Bias/Variance

(image: training error and cross validation error as functions of the regularization parameter λ)

As λ increases, our fit becomes more rigid.

  1. Create a list of lambdas (e.g. λ ∈ {0, 0.01, 0.02, …, 5.12, 10.24})
  2. Create a set of models with different degrees or any other variants
  3. Iterate through the λs and, for each λ, go through all the models to learn some Θ
  4. Compute the cross validation error using the learned Θ (computed with λ) on J_cv(Θ) without regularization, i.e. with λ = 0
  5. Select the best combo that produces the lowest error on the cross validation set
  6. Using the best combo Θ and λ, apply it on J_test(Θ) to see whether it generalizes well

Note: the detail of Step 4 is significant. The cross validation error there is computed without the regularization term.
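A sketch of the λ-selection loop for regularized linear regression (the closed-form ridge solve and the synthetic data are assumptions; Step 4's unregularized cross validation error is from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: linear signal plus noise, with a leading bias column.
X = np.column_stack([np.ones(60), rng.normal(size=(60, 3))])
theta_true = np.array([0.5, 1.0, -2.0, 0.0])
y = X @ theta_true + rng.normal(scale=0.1, size=60)

X_train, y_train = X[:40], y[:40]
X_cv, y_cv = X[40:], y[40:]

def fit_ridge(X, y, lam):
    """Regularized normal equation; the bias term is not penalized."""
    reg = lam * np.eye(X.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def j_unregularized(theta, X, y):
    """Step 4: cross validation error WITHOUT the regularization term."""
    return np.sum((X @ theta - y) ** 2) / (2 * X.shape[0])

# A subset of the doubling sequence of lambdas from the text.
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 1.28, 5.12, 10.24]
thetas = {lam: fit_ridge(X_train, y_train, lam) for lam in lambdas}
best_lam = min(lambdas, key=lambda lam: j_unregularized(thetas[lam], X_cv, y_cv))
print(best_lam)
```

Note how the candidate Θs are fit with regularization, but compared on the cross validation set without it.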

Learning Curves

The x-axis is m, the size of the training set.
The y-axis is the error J(Θ) of the cross validation set or the training set.

High Bias

(image: learning curves for high bias)

Low training size:

causes J_train(Θ) to be low and J_cv(Θ) to be high.

Large training size:

causes both J_train(Θ) and J_cv(Θ) to be high, with J_train(Θ) ≈ J_cv(Θ).

If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

High Variance

(image: learning curves for high variance)

Low training size:

J_train(Θ) will be low and J_cv(Θ) will be high.

Large training size:

J_train(Θ) increases with training set size and J_cv(Θ) continues to decrease without leveling off.
Also, J_train(Θ) < J_cv(Θ), but the difference between the two curves remains significant.

If a learning algorithm is suffering from high variance, getting more training data is likely to help.
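One way to trace learning curves numerically (a sketch; the degree-8 polynomial and synthetic cubic data are assumptions chosen to exhibit high variance):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-D data: cubic signal plus noise.
x = rng.uniform(-1, 1, size=120)
y = x ** 3 + rng.normal(scale=0.2, size=120)
x_cv, y_cv = x[100:], y[100:]          # fixed cross validation set

def half_mse(p, xs, ys):
    return np.mean((np.polyval(p, xs) - ys) ** 2) / 2

# Train a high-variance model on growing subsets and record both errors.
train_err, cv_err = [], []
for m in (15, 30, 60, 100):
    p = np.polyfit(x[:m], y[:m], 8)    # degree 8: many parameters
    train_err.append(half_mse(p, x[:m], y[:m]))
    cv_err.append(half_mse(p, x_cv, y_cv))
print(train_err)
print(cv_err)
```

Plotting train_err and cv_err against m should reproduce the qualitative shape described above for a high-variance model.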

Review

Troubleshooting

As follows:

  • Getting more training examples: Fixes high variance
  • Trying smaller sets of features: Fixes high variance
  • Adding features: Fixes high bias
  • Adding polynomial features: Fixes high bias
  • Decreasing λ: Fixes high bias
  • Increasing λ: Fixes high variance.

Neural Networks

Fewer Parameters

  • Prone to underfitting
  • Computationally cheap
  • High bias and low variance

More Parameters

  • Prone to overfitting
  • Computationally expensive
  • Low bias and high variance