Stochastic Gradient Descent
When we have a very large training set, (batch) gradient descent becomes a computationally very expensive procedure.
Algorithm
- Randomly shuffle the dataset.
- Repeat {
    for $i := 1, \dots, m$ {
        $\theta_j := \theta_j - \alpha \, (h_\theta(x^{(i)}) - y^{(i)}) \, x_j^{(i)}$  (for every $j = 0, \dots, n$)
    }
  }
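As a concrete sketch of the loop above, here is a minimal NumPy implementation for linear regression (the function name, toy data, and hyperparameters are made up for illustration):

```python
import numpy as np

def sgd(X, y, alpha=0.01, epochs=10):
    """Stochastic gradient descent for linear regression (illustrative sketch)."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        # 1. Randomly shuffle the dataset.
        for i in rng.permutation(m):
            # 2. Update theta using this single example.
            error = X[i] @ theta - y[i]       # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]     # updates every theta_j at once
    return theta

# Tiny usage example; the first column of ones is the intercept feature x_0.
X = np.c_[np.ones(5), np.arange(5.0)]
y = 2.0 + 3.0 * np.arange(5.0)
print(sgd(X, y, alpha=0.05, epochs=200))      # approaches [2, 3]
```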
Tips
The SGD algorithm is much faster than batch gradient descent when we have a large-scale dataset, because each update uses only one example instead of all m.
Learning rate: $\alpha$ is typically held constant, but it can be slowly decreased over time (e.g. $\alpha = \frac{\text{const1}}{\text{iterationNumber} + \text{const2}}$) if we want $\theta$ to converge rather than oscillate around the minimum.
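A minimal sketch of such a decay schedule (the constants are made-up values that would have to be tuned per problem):

```python
# Hypothetical decay schedule: alpha shrinks as training progresses, so the
# parameter vector theta settles down instead of oscillating around the minimum.
const1, const2 = 1.0, 50.0

def learning_rate(iteration_number):
    return const1 / (iteration_number + const2)

print(learning_rate(0), learning_rate(1000))   # 0.02 vs roughly 0.00095
```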
Mini-Batch Gradient Descent
- Batch gradient descent: Use all m examples in each iteration
- Stochastic gradient descent: Use 1 example in each iteration
- Mini-Batch gradient descent: Use b examples in each iteration
Algorithm
Say $b = 10$, $m = 1000$.
Repeat {
    for $i := 1, 11, 21, 31, \dots, 991$ {
        $\theta_j := \theta_j - \alpha \, \frac{1}{10} \sum_{k=i}^{i+9} (h_\theta(x^{(k)}) - y^{(k)}) \, x_j^{(k)}$  (for every $j = 0, \dots, n$)
    }
}
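As a sketch, the linear-regression example from earlier can be rewritten with mini-batches; the point is that each update is a single vectorized matrix product over b examples, which a good linear algebra library can execute efficiently (names and defaults are illustrative):

```python
import numpy as np

def mini_batch_gd(X, y, b=10, alpha=0.01, epochs=10):
    """Mini-batch gradient descent for linear regression (illustrative sketch)."""
    m, n = X.shape
    theta = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(m)
        # Walk through the shuffled data b examples at a time (i = 1, 1+b, 1+2b, ...).
        for start in range(0, m, b):
            batch = order[start:start + b]
            errors = X[batch] @ theta - y[batch]                  # shape (b,)
            theta -= alpha * (X[batch].T @ errors) / len(batch)   # averaged gradient
    return theta
```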
Tips
Mini-batch gradient descent sits between batch gradient descent and stochastic gradient descent.
Checking for convergence
Plot $\text{cost}(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}(h_\theta(x^{(i)}) - y^{(i)})^2$, averaged over the last 1000 (or so) examples processed, while the algorithm is learning.
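One way to produce that plot, assuming the per-example costs were appended to a list during training (the helper name and window size are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_convergence(costs, window=1000):
    """Plot the cost averaged over each successive block of `window` examples."""
    costs = np.asarray(costs)
    blocks = range(window, len(costs) + 1, window)
    averaged = [costs[i - window:i].mean() for i in blocks]
    plt.plot(averaged)
    plt.xlabel(f"number of examples processed (x{window})")
    plt.ylabel(f"cost averaged over last {window} examples")
    plt.show()
```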
Advanced Topics
Online Learning
Repeat forever {
- get an example
- learn from the example
- discard the example
}
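A sketch of that loop for logistic regression (the click-stream data below is invented; on a real website each (x, y) pair would come from one user interaction and be thrown away after the update):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(theta, x, y, alpha=0.1):
    """One online logistic-regression update from a single (x, y) pair."""
    error = sigmoid(x @ theta) - y        # h_theta(x) - y
    return theta - alpha * error * x      # update every theta_j, then drop the example

theta = np.zeros(3)
stream = [(np.array([1.0, 0.2, 0.9]), 1), (np.array([1.0, 0.8, 0.1]), 0)]
for x, y in stream:                       # "repeat forever" over incoming users
    theta = online_update(theta, x, y)
```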
Product search (learning to search)
User searches for "Android phone 1080p camera"
Have 100 phones in stores. Will return 10 results.
x = features of the phone: how many words in the user's query match the name of the phone, how many words match its description, etc.
Learn $p(y = 1 \mid x; \theta)$, where $y = 1$ means the user clicks on the phone's link.
Use this to show the user the 10 phones they're most likely to click on.
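A sketch of the ranking step, assuming a logistic-regression model is already trained and `phone_features` is a (100, n) matrix holding one feature vector x per phone (all names here are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_phones(theta, phone_features, k=10):
    """Return the indices and scores of the k phones most likely to be clicked."""
    scores = sigmoid(phone_features @ theta)   # predicted p(y=1 | x; theta) per phone
    best = np.argsort(scores)[::-1][:k]        # indices of the k highest scores
    return best, scores[best]
```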
Other examples: choosing special offers to show the user; customized selection of news articles; product recommendation.
MapReduce
Many learning algorithms can be expressed as computing sums of functions over the training set.
E.g. for advanced optimization with logistic regression, we need the gradient
$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)},$$
and the sum over the m examples can be split across several machines, with a central server combining the partial sums.
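A toy sketch of that split, simulating the machines by slicing the arrays; in a real MapReduce deployment each partial sum would be computed on a different machine and sent to a central server (function names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_gradient(theta, X_part, y_part):
    """'Map' step: gradient sum over one machine's slice of the data."""
    errors = sigmoid(X_part @ theta) - y_part
    return X_part.T @ errors

def mapreduce_gradient(theta, X, y, machines=4):
    """'Reduce' step: add the partial sums and divide by m."""
    parts = zip(np.array_split(X, machines), np.array_split(y, machines))
    partial_sums = [partial_gradient(theta, Xp, yp) for Xp, yp in parts]
    return sum(partial_sums) / len(y)
```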