16_Learning with Large Scale Dataset
Carpe Tu Black Whistle

Stochastic Gradient Descent

When we have a very large training set, gradient descent becomes a computationally very expensive procedure, because every single update of θ has to sum over all m training examples.

Algorithm


  • Randomly shuffle the dataset.
    Repeat {
        for i := 1, ..., m {
            θ_j := θ_j - α (h_θ(x^(i)) - y^(i)) x_j^(i)    (for every j = 0, ..., n)
        }
    }
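
As a concrete illustration, here is a minimal NumPy sketch of this update rule for linear regression; the function name sgd and the synthetic data are illustrative, not from the lecture.

import numpy as np

def sgd(X, y, alpha=0.01, epochs=10):
    """Stochastic gradient descent for linear regression.
    X is (m, n+1) with a leading column of ones; y has shape (m,)."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        # Randomly shuffle the dataset before each pass.
        for i in rng.permutation(m):
            # Update theta using the single example (x^(i), y^(i)).
            error = X[i] @ theta - y[i]
            theta -= alpha * error * X[i]
    return theta

# Usage on synthetic data: should recover roughly [3, 2].
X = np.column_stack([np.ones(1000), np.random.randn(1000)])
y = 3 + 2 * X[:, 1] + 0.1 * np.random.randn(1000)
print(sgd(X, y))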

Tips

The SGD algorithm is much faster than batch gradient descent when we have a large-scale dataset, because it starts making progress after a single example instead of scanning all m of them.

The learning rate α is typically held constant. It can be slowly decreased over time if we want θ to converge, e.g. α = const1 / (iterationNumber + const2).
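
For instance, a sketch of that decay schedule (const1 and const2 are hand-picked tuning constants; the defaults below are just placeholders):

def learning_rate(iteration, const1=1.0, const2=50.0):
    # alpha shrinks as the iteration number grows, so the SGD updates
    # take smaller and smaller steps and theta can settle near a minimum.
    return const1 / (iteration + const2)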

Mini-Batch Gradient Descent

  • Batch gradient descent: Use all m examples in each iteration
  • Stochastic gradient descent: Use 1 example in each iteration
  • Mini-Batch gradient descent: Use b examples in each iteration

Algorithm

Say b = 10, m = 1000.
Repeat {
    for i = 1, 11, 21, 31, ..., 991 {
        θ_j := θ_j - α (1/10) Σ_{k=i}^{i+9} (h_θ(x^(k)) - y^(k)) x_j^(k)    (for every j = 0, ..., n)
    }
}
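
A vectorized NumPy sketch of this mini-batch update (the function name minibatch_gd and the default batch size are illustrative):

import numpy as np

def minibatch_gd(X, y, alpha=0.01, b=10, epochs=10):
    """Mini-batch gradient descent for linear regression."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]
            # One vectorized update over the next b examples.
            errors = X[idx] @ theta - y[idx]            # shape (b,)
            theta -= alpha * (X[idx].T @ errors) / len(idx)
    return theta

The vectorized sum over the b examples is the reason mini-batch can outperform pure SGD: it uses the same optimized linear-algebra routines that make batch gradient descent efficient, while still updating θ far more often than a full batch pass.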

Tips

Mini-batch gradient descent is a method that lies between batch gradient descent and stochastic gradient descent.

Checking for convergence

Plot cost(θ, (x^(i), y^(i))), averaged over the last 1000 (say) examples processed by the algorithm, computing each cost just before θ is updated with that example.
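A sketch of how this running average could be tracked inside the SGD loop (the window size of 1000 matches the text; everything else is illustrative):

import numpy as np

def sgd_with_monitor(X, y, alpha=0.01, epochs=5, window=1000):
    """SGD for linear regression that records cost(θ, (x^(i), y^(i)))
    right before each update and averages it over the last `window` examples."""
    m, n = X.shape
    theta = np.zeros(n)
    costs, averaged = [], []
    for _ in range(epochs):
        for i in np.random.permutation(m):
            error = X[i] @ theta - y[i]
            costs.append(0.5 * error ** 2)        # cost before the update
            theta -= alpha * error * X[i]
            if len(costs) % window == 0:
                averaged.append(np.mean(costs[-window:]))
    return theta, averaged    # plot `averaged`; it should trend downward (noisily) if SGD converges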

Advanced Topics

Online Learning

Repeat forever {
    Get (x, y) corresponding to the current user.
    Update θ using (x, y):
        θ_j := θ_j - α (h_θ(x) - y) x_j    (for every j = 0, ..., n)
}

  1. get an example
  2. learn the example
  3. discard the example
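
A sketch of such an online learner using logistic regression; get_user_example() is a hypothetical stream of (x, y) pairs and is not part of the lecture.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_learn(get_user_example, n_features, alpha=0.05):
    """Online logistic regression: learn from each example, then discard it."""
    theta = np.zeros(n_features)
    while True:                             # repeat forever
        x, y = get_user_example()           # 1. get an example from the user stream
        error = sigmoid(x @ theta) - y      # 2. learn from it (one SGD-style update)
        theta -= alpha * error * x
        # 3. the example is discarded; only theta is kept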

Product search (learning to search)

User searches for "Android phone 1080p camera"
Have 100 phones in stores. Will return 10 results.
x = features of the phone, e.g. how many words in the user's query match the name of the phone, how many words match the description of the phone, etc.

y = 1 if the user clicks on the link, y = 0 otherwise.
Learn p(y = 1 | x; θ), the predicted click-through rate (CTR).
Use this to show the user the 10 phones they're most likely to click on.

Other examples: choosing special offers to show the user; customized selection of news articles; product recommendation; etc.

Map Reduce


Many learning algorithms can be expressed as computing sums of functions over the training set.

E.g. for advanced optimization with logistic regression, we need:

    J_train(θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

    ∂/∂θ_j J_train(θ) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) - y^(i)) x_j^(i)

Both are sums over the m examples, so the training set can be split across (say) 4 machines: machine k computes the partial sum temp_j^(k) over its quarter of the data, and a central server combines them:

    θ_j := θ_j - α (1/m) (temp_j^(1) + temp_j^(2) + temp_j^(3) + temp_j^(4))
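
A sketch of that map/reduce split using Python's multiprocessing; the 4-way split and the helper names (partial_gradient, distributed_gradient) are illustrative, not part of the lecture.

import numpy as np
from multiprocessing import Pool

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_gradient(args):
    """Map step: one machine computes the gradient sum over its shard."""
    X_shard, y_shard, theta = args
    errors = sigmoid(X_shard @ theta) - y_shard
    return X_shard.T @ errors          # unnormalized sum over the shard's examples

def distributed_gradient(X, y, theta, n_machines=4):
    """Reduce step: add the partial sums and divide by m."""
    shards = list(zip(np.array_split(X, n_machines),
                      np.array_split(y, n_machines),
                      [theta] * n_machines))
    with Pool(n_machines) as pool:
        partials = pool.map(partial_gradient, shards)
    return sum(partials) / len(y)

if __name__ == "__main__":
    X = np.column_stack([np.ones(400), np.random.randn(400)])
    y = (np.random.rand(400) > 0.5).astype(float)
    print(distributed_gradient(X, y, np.zeros(2)))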