Evaluating a Learning Algorithm
When a model makes unacceptably large errors in its predictions, the usual troubleshooting options are:
- Getting more training examples
- Trying smaller sets of features
- Trying additional features
- Trying polynomial features
- Increasing or decreasing λ
Test set
A hypothesis may have a low training error and still fail to generalize (overfitting). To evaluate it:
Divide the data into two parts: a training set and a test set (typically 70% and 30%).
Training procedure
- Learn $\Theta$ and minimize $J_{train}(\Theta)$ using the training set
- Compute the test set error $J_{test}(\Theta)$
Test set error
- For linear regression: $J_{test}(\Theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\Theta(x^{(i)}_{test}) - y^{(i)}_{test} \right)^2$
- For classification ~ Misclassification error (aka 0/1 misclassification error): $err(h_\Theta(x), y) = 1$ if $h_\Theta(x) \geq 0.5$ and $y = 0$, or $h_\Theta(x) < 0.5$ and $y = 1$; otherwise $0$. The test error is $\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err(h_\Theta(x^{(i)}_{test}), y^{(i)}_{test})$.
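As a rough illustration, here is a minimal NumPy sketch of the split and both error measures; `theta` stands for whatever parameter vector was learned on the training set, and a sigmoid hypothesis is assumed for the classification case:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_test_split(X, y, test_fraction=0.3):
    """Randomly split the data into a training set and a test set (70% / 30%)."""
    idx = rng.permutation(len(y))
    split = int(len(y) * (1 - test_fraction))
    return X[idx[:split]], y[idx[:split]], X[idx[split:]], y[idx[split:]]

def linreg_test_error(theta, X_test, y_test):
    """Average squared error J_test(Theta) for linear regression."""
    return np.sum((X_test @ theta - y_test) ** 2) / (2 * len(y_test))

def classification_test_error(theta, X_test, y_test):
    """0/1 misclassification error: fraction of misclassified test examples."""
    h = 1 / (1 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis
    return np.mean((h >= 0.5).astype(int) != y_test)
```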
Model Selection
Given many models with different polynomial degrees, we want a systematic way to identify the “best” one.
In order to choose the degree of the hypothesis, break down the dataset into three parts:
- Training set: 60%
- Cross validation set: 20%
- Test set: 20%
Procedure
- Optimize the parameters $\Theta$ using the training set for each polynomial degree.
- Find the polynomial degree d with the least error $J_{cv}(\Theta)$ using the cross validation set.
- Estimate the generalization error using the test set with $J_{test}(\Theta^{(d)})$, where d is the degree of the polynomial with the lowest cross validation error.
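A sketch of this procedure for one-dimensional data, using `np.polyfit` as a stand-in for "optimize the parameters for each degree"; the split variables `x_train`, `x_cv`, `x_test` are placeholders for the three-way split above:

```python
import numpy as np

def half_mse(coeffs, x, y):
    """Squared error of a fitted polynomial on a given set."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2) / 2

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """Fit each degree on the training set; pick the degree with the lowest CV error."""
    fits = {d: np.polyfit(x_train, y_train, d) for d in range(1, max_degree + 1)}
    best_d = min(fits, key=lambda d: half_mse(fits[d], x_cv, y_cv))
    return best_d, fits[best_d]

# The generalization error is then estimated once, on the test set:
# best_d, theta_d = select_degree(x_train, y_train, x_cv, y_cv)
# test_error = half_mse(theta_d, x_test, y_test)
```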
Bias and Variance
This section looks at the relationship between the degree d of the polynomial and whether the hypothesis underfits or overfits.
- We need to distinguish whether bias or variance is the problem contributing to bad predictions.
- High bias is underfitting and high variance is overfitting.
High bias (underfitting): both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ will be high, with $J_{CV}(\Theta) \approx J_{train}(\Theta)$.
High variance (overfitting): $J_{train}(\Theta)$ will be low and $J_{CV}(\Theta)$ will be much greater than $J_{train}(\Theta)$.
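A toy helper that encodes these two patterns; the `target_error` threshold is an assumption for illustration, not part of the course material:

```python
def diagnose(j_train, j_cv, target_error):
    """Rough diagnosis from the training and cross validation errors.
    `target_error` is a hypothetical error level we would be satisfied with."""
    if j_train > target_error:
        # Both errors are high and close to each other.
        return "high bias (underfitting)"
    if j_cv > target_error:
        # Training error is low but the CV error is much larger.
        return "high variance (overfitting)"
    return "neither: both errors are acceptably low"
```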
Regularization and Bias/Variance
As λ increases, the fit becomes more rigid (higher bias); as λ approaches 0, we tend to overfit the data (higher variance). In order to choose the model and the regularization term λ:
1. Create a list of lambdas (e.g. λ ∈ {0, 0.01, 0.02, 0.04, …, 5.12, 10.24}).
2. Create a set of models with different degrees or any other variants.
3. Iterate through the λs and for each λ go through all the models to learn some Θ.
4. Compute the cross validation error using the learned Θ (computed with λ) on $J_{CV}(\Theta)$ without regularization, i.e. with λ = 0.
5. Select the best combo that produces the lowest error on the cross validation set.
6. Using the best combo Θ and λ, apply it on $J_{test}(\Theta)$ to see if it generalizes well to the problem.
Note: the detail of step 4 is quite significant: the cross validation error is evaluated without the regularization term, i.e. with λ = 0.
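A sketch of this loop for regularized linear regression via the normal equation; `fit_regularized` and `cost` are illustrative helpers, not a library API:

```python
import numpy as np

def fit_regularized(X, y, lam):
    """Regularized linear regression via the normal equation.
    The bias column (assumed to be the first column of ones in X) is not regularized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def cost(theta, X, y):
    """Unregularized squared-error cost: this is what step 4 evaluates on the CV set."""
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Step 1: list of lambdas, each double the previous one: 0, 0.01, 0.02, ..., 10.24
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]

# Steps 3-5 (X_train, y_train, X_cv, y_cv stand for the three-way split):
# best_lam, best_theta = min(
#     ((lam, fit_regularized(X_train, y_train, lam)) for lam in lambdas),
#     key=lambda pair: cost(pair[1], X_cv, y_cv),
# )
# Step 6: cost(best_theta, X_test, y_test) estimates the generalization error.
```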
Learning Curves
The x-axis is m, the size of the training set.
The y-axis is the error, i.e. $J_{train}(\Theta)$ and $J_{CV}(\Theta)$.
High Bias
- Low training set size: causes $J_{train}(\Theta)$ to be low and $J_{CV}(\Theta)$ to be high.
- Large training set size: causes both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ to be high, with $J_{train}(\Theta) \approx J_{CV}(\Theta)$.

If an algorithm is suffering from high bias, getting more training data will not (by itself) help much.
High Variance
- Low training set size: $J_{train}(\Theta)$ will be low and $J_{CV}(\Theta)$ will be high.
- Large training set size: $J_{train}(\Theta)$ increases with the training set size and $J_{CV}(\Theta)$ continues to decrease without leveling off. Also, $J_{train}(\Theta) < J_{CV}(\Theta)$, but the difference between them remains significant.

If an algorithm is suffering from high variance, getting more training data is likely to help.
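A sketch of how such a curve can be computed: train on the first i examples only, but always evaluate the cross validation error on the full CV set. Here `fit` and `cost` are placeholders for whatever model and error measure are in use:

```python
import numpy as np

def learning_curve(fit, cost, X_train, y_train, X_cv, y_cv):
    """J_train and J_cv as functions of the training set size m.
    `fit(X, y)` returns learned parameters; `cost(theta, X, y)` is the
    unregularized error."""
    train_errors, cv_errors = [], []
    for i in range(1, len(y_train) + 1):
        theta = fit(X_train[:i], y_train[:i])            # train on the first i examples
        train_errors.append(cost(theta, X_train[:i], y_train[:i]))
        cv_errors.append(cost(theta, X_cv, y_cv))        # always the full CV set
    return np.array(train_errors), np.array(cv_errors)

# High bias: both curves flatten out at a high error and converge.
# High variance: a persistent gap between a low J_train and a higher J_cv.
```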
Review
Troubleshooting
Revisiting the options from the beginning, each one addresses a specific problem:
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing λ: Fixes high bias
- Increasing λ: Fixes high variance
Neural Networks
Fewer Parameters
- Prone to underfitting
- Computationally cheap
- High bias, low variance
More Parameters
- Prone to overfitting
- Computationally expensive
- Low bias, high variance (use regularization, i.e. increase λ, to address the overfitting)
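For a feel of what "fewer vs. more parameters" means, a quick parameter count for two hypothetical fully connected architectures (400 inputs and 10 outputs, loosely echoing the course's digit-recognition example):

```python
def parameter_count(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))

small = parameter_count([400, 25, 10])        # one small hidden layer
large = parameter_count([400, 100, 100, 10])  # two large hidden layers
print(small, large)                           # 10285 vs. 51210 parameters
```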