02_Hung-yi Lee_Pokemon Classification & strategy
Carpe Tu Black Whistle

Pokemon vs. Digimon


Function with unknown Parameters

$|\mathcal{H}|$: number of candidate functions, i.e. candidate thresholds $h \in \mathcal{H}$ (model “complexity”)

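Loss of a function (given data)

  • Given a dataset $\mathcal{D} = \{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$
  • Loss of a threshold $h$ given data set $\mathcal{D}$:
    $$
    L(h, \mathcal{D}) = \frac{1}{N} \sum_{n=1}^{N} l(h, x^n, \hat{y}^n)
    $$
    where $l(h, x^n, \hat{y}^n)$ is the loss on one example, e.g. 1 if threshold $h$ misclassifies $x^n$ and 0 otherwise (error rate).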
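
A minimal sketch of this loss in Python. The names `classify`, `example_loss`, and `loss` are hypothetical. It assumes each image has already been reduced to its edge count e(x), that the rule predicts Digimon when e(x) ≥ h (the direction is an assumption for illustration), and that l is the 0-1 error.

```python
from typing import List, Tuple

# One example: (e_x, label). e_x is the (assumed) edge count e(x) of the image.
Example = Tuple[int, str]


def classify(e_x: int, h: int) -> str:
    """Threshold rule f_h; the direction (many edges -> Digimon) is an assumption."""
    return "Digimon" if e_x >= h else "Pokemon"


def example_loss(h: int, e_x: int, y_hat: str) -> int:
    """0-1 loss l(h, x^n, y_hat^n): 1 if the example is misclassified, else 0."""
    return int(classify(e_x, h) != y_hat)


def loss(h: int, data: List[Example]) -> float:
    """L(h, D): average of the per-example losses over the dataset D."""
    return sum(example_loss(h, e_x, y) for e_x, y in data) / len(data)


# Made-up toy dataset with N = 4
data = [(120, "Pokemon"), (300, "Digimon"), (250, "Pokemon"), (410, "Digimon")]
print(loss(200, data))  # → 0.25 (only the example with e_x = 250 is misclassified)
```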


Training Examples

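  • If we could collect all Pokemons and Digimons in the universe, $\mathcal{D}_{all}$, we could find the best threshold:
    $$
    h^{all} = \arg\min_h L(h, \mathcal{D}_{all}) \qquad \text{(ideal)}
    $$
  • In practice we only collect some examples $\mathcal{D}_{train}$ from $\mathcal{D}_{all}$, sampled i.i.d.:
    $$
    \mathcal{D}_{train} = \{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}, \qquad (x^n, \hat{y}^n) \sim \mathcal{D}_{all}
    $$
    $$
    h^{train} = \arg\min_h L(h, \mathcal{D}_{train})
    $$

We hope $L(h^{train}, \mathcal{D}_{all})$ and $L(h^{all}, \mathcal{D}_{all})$ are close.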
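
To make $h^{all}$ and $h^{train}$ concrete, the sketch below builds a small synthetic "universe" (entirely made-up numbers standing in for $\mathcal{D}_{all}$), samples $\mathcal{D}_{train}$ from it i.i.d., and finds each threshold by brute-force search over a hypothetical candidate set. It assumes the rule "Digimon if e(x) ≥ h" and the 0-1 loss, both for illustration only.

```python
import random
from typing import List, Tuple

Example = Tuple[int, str]  # (edge count e(x), label)


def loss(h: int, data: List[Example]) -> float:
    # 0-1 error rate of the (assumed) rule "Digimon if e(x) >= h"
    wrong = sum(("Digimon" if e >= h else "Pokemon") != y for e, y in data)
    return wrong / len(data)


def best_threshold(candidates: range, data: List[Example]) -> int:
    # arg min_h L(h, data) by exhaustive search over the candidate set
    return min(candidates, key=lambda h: loss(h, data))


rng = random.Random(0)
# Synthetic stand-in for D_all: Digimon edge counts tend to be higher (made-up numbers)
d_all = [(int(rng.gauss(3000, 800)), "Pokemon") for _ in range(2000)] + \
        [(int(rng.gauss(5000, 800)), "Digimon") for _ in range(2000)]
H = range(0, 10001, 50)  # hypothetical candidate thresholds

d_train = rng.choices(d_all, k=50)  # i.i.d. sample, with replacement
h_all = best_threshold(H, d_all)
h_train = best_threshold(H, d_train)

print("h_all   =", h_all, " L(h_all, D_all)   =", loss(h_all, d_all))
print("h_train =", h_train, " L(h_train, D_all) =", loss(h_train, d_all))
print("L(h_train, D_train) =", loss(h_train, d_train))
```

Because `h_all` minimizes the loss over the candidates on `d_all`, `L(h_train, D_all)` can never be below `L(h_all, D_all)`; `L(h_train, D_train)`, on the other hand, may even come out lower than `L(h_all, D_all)`.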

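Note: $L(h^{train}, \mathcal{D}_{train})$ can be smaller than $L(h^{all}, \mathcal{D}_{all})$, because $h^{train}$ is chosen to fit the particular examples in $\mathcal{D}_{train}$. What matters is $L(h^{train}, \mathcal{D}_{all})$, which can never be smaller than $L(h^{all}, \mathcal{D}_{all})$.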


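Model-agnostic

  1. No assumption about the data distribution, other than that the examples are sampled i.i.d.
  2. Any loss function can be used (Hoeffding’s inequality below assumes its range is [0, 1])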


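Hoeffding’s Inequality: call $\mathcal{D}_{train}$ bad due to $h$ if $|L(h, \mathcal{D}_{train}) - L(h, \mathcal{D}_{all})| > \varepsilon$. Then

$$
P(\mathcal{D}_{train} \text{ is bad due to } h) \le 2\exp(-2N\varepsilon^2)
$$

  • The range of loss $L$ is [0, 1]
  • $N$ is the number of examples in $\mathcal{D}_{train}$

$\mathcal{D}_{train}$ is bad if it is bad due to at least one $h \in \mathcal{H}$, so by the union bound:

$$
P(\mathcal{D}_{train} \text{ is bad}) \le \sum_{h \in \mathcal{H}} P(\mathcal{D}_{train} \text{ is bad due to } h) \le |\mathcal{H}| \cdot 2\exp(-2N\varepsilon^2)
$$

To make $P(\mathcal{D}_{train} \text{ is bad})$ smaller: larger $N$ and smaller $|\mathcal{H}|$.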
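
The bound $P(\mathcal{D}_{train} \text{ is bad}) \le |\mathcal{H}| \cdot 2\exp(-2N\varepsilon^2)$ can be evaluated numerically. This sketch (function names are hypothetical) computes it and, by solving $|\mathcal{H}| \cdot 2\exp(-2N\varepsilon^2) \le P$ for $N$, the smallest $N$ that guarantees a target failure probability $P$. The bound can exceed 1, in which case it says nothing.

```python
import math


def bad_probability_bound(h_size: int, n: int, eps: float) -> float:
    """Union bound + Hoeffding: P(D_train is bad) <= |H| * 2 * exp(-2 N eps^2)."""
    return h_size * 2 * math.exp(-2 * n * eps ** 2)


def min_examples(h_size: int, eps: float, p: float) -> int:
    """Smallest N with |H| * 2 * exp(-2 N eps^2) <= p, i.e. N >= ln(2|H|/p) / (2 eps^2)."""
    return math.ceil(math.log(2 * h_size / p) / (2 * eps ** 2))


# Example: |H| = 10000 candidate thresholds, eps = 0.1, target P = 0.1
n = min_examples(10000, 0.1, 0.1)
print(n, bad_probability_bound(10000, n, 0.1))
```

For $|\mathcal{H}| = 10000$, $\varepsilon = 0.1$, $P = 0.1$ this gives $N = 611$. Because $|\mathcal{H}|$ appears only inside the logarithm, doubling the number of candidates adds a fixed amount to $N$ rather than doubling it.
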
What if the parameters are continuous?

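  1. Everything that happens in a computer is discrete: a parameter stored with finite precision can take only finitely many values, so $|\mathcal{H}|$ is finite, though huge.
  2. VC-dimension (not covered in this course)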
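
To see why "everything is discrete" still yields a usable (if loose) bound, suppose a model has k parameters, each stored as a 32-bit float (an assumption for illustration). Then $|\mathcal{H}| \le 2^{32k}$, so $\ln|\mathcal{H}| \le 32k \ln 2$ and the $N$ required by the bound grows only linearly in k. A minimal sketch, working in log space so the huge $|\mathcal{H}|$ is never formed:

```python
import math


def min_examples_for_params(k: int, bits: int, eps: float, p: float) -> int:
    """N >= ln(2|H|/p) / (2 eps^2) with |H| = 2**(bits * k), computed in log space."""
    log_h = bits * k * math.log(2)  # ln|H| without building the huge integer
    return math.ceil((math.log(2) + log_h - math.log(p)) / (2 * eps ** 2))


for k in (1, 10, 100):
    print(k, min_examples_for_params(k, 32, 0.1, 0.1))
```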

Tradeoff of Model Complexity

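
A short derivation, written as a sketch rather than copied from the slides, shows where the tradeoff comes from. Setting the bound $|\mathcal{H}| \cdot 2\exp(-2N\varepsilon^2)$ equal to a target probability $P$ and solving for $\varepsilon$ gives the first line. With probability at least $1 - P$, every $h \in \mathcal{H}$ then satisfies $|L(h, \mathcal{D}_{train}) - L(h, \mathcal{D}_{all})| \le \varepsilon$, and since $h^{train}$ minimizes the training loss:

$$
\varepsilon = \sqrt{\frac{\ln(2|\mathcal{H}|/P)}{2N}}
$$

$$
\begin{aligned}
L(h^{train}, \mathcal{D}_{all}) &\le L(h^{train}, \mathcal{D}_{train}) + \varepsilon \\
&\le L(h^{all}, \mathcal{D}_{train}) + \varepsilon \\
&\le L(h^{all}, \mathcal{D}_{all}) + 2\varepsilon
\end{aligned}
$$

A larger $|\mathcal{H}|$ can lower the ideal loss $L(h^{all}, \mathcal{D}_{all})$ but widens the gap $2\varepsilon$. A smaller $|\mathcal{H}|$ keeps the gap small but may leave $L(h^{all}, \mathcal{D}_{all})$ large.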

Strategy

Framework of ML

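Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots, (x^N, \hat{y}^N)\}$

Testing data: $\{x^{N+1}, x^{N+2}, \dots, x^{N+M}\}$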

pipeline

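
A minimal sketch of the framework. It assumes the course's usual three steps: (1) a function with unknown parameters θ, (2) a loss L(θ) on the training data, and (3) optimization θ* = arg min L(θ). For illustration only, it uses a one-feature linear model y = w·x + b fitted by plain gradient descent on mean squared error. The data and learning rate are made up.

```python
from typing import List, Tuple


def train(data: List[Tuple[float, float]], lr: float = 0.01,
          steps: int = 5000) -> Tuple[float, float]:
    """Steps 1-3: model y = w*x + b, MSE loss, gradient descent -> theta* = (w, b)."""
    w, b = 0.0, 0.0  # Step 1: unknown parameters theta = (w, b)
    n = len(data)
    for _ in range(steps):
        # Step 2: L(theta) = (1/N) * sum (w*x + b - y_hat)^2; its gradients:
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step 3: move theta against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Training data {(x^n, y_hat^n)}: made-up points on y = 2x + 1
train_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(train_data)

# Testing data {x^(N+1), ..., x^(N+M)}: inputs only; predict with y = f_theta*(x)
test_x = [4.0, 5.0]
print([w * x + b for x in test_x])
```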

General Guide

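
The guide can be written as a small decision function. This is a sketch. The branch labels follow the commonly presented version of the course's guide: model bias or an optimization issue when the training loss is large, and overfitting or mismatch when only the testing loss is large. The cutoff `large` is a hypothetical, task-dependent parameter.

```python
def diagnose(train_loss: float, test_loss: float, large: float) -> str:
    """Rough triage following the general guide; `large` is a task-dependent cutoff."""
    if train_loss > large:
        # The model cannot even fit the training data
        return "model bias or optimization issue (compare with shallower/simpler models)"
    if test_loss > large:
        # Fits the training data but not the testing data
        return "overfitting or mismatch"
    return "ok"


print(diagnose(train_loss=0.05, test_loss=0.40, large=0.2))
```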

Split your training data into a training set and a validation set for model selection.
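
A minimal sketch of that split. It assumes hypothetical `fit` and `evaluate` callables supplied by the reader. It holds out a fraction of the training data, fits each candidate on the rest, and keeps the candidate with the lowest validation loss.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # one training example
M = TypeVar("M")  # one trained model


def split(data: Sequence[T], val_fraction: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the data, then split it into (training set, validation set)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction))
    return items[n_val:], items[:n_val]


def select_model(candidates: List[Callable[[List[T]], M]],
                 evaluate: Callable[[M, List[T]], float],
                 data: Sequence[T], val_fraction: float = 0.2) -> M:
    """Fit every candidate on the training set; keep the lowest validation loss."""
    train_set, val_set = split(data, val_fraction)
    models = [fit(train_set) for fit in candidates]
    return min(models, key=lambda model: evaluate(model, val_set))


# Usage with two hypothetical candidates on made-up data y = 2x
def fit_zero(train_set):
    return lambda x: 0.0  # always predicts 0


def fit_slope(train_set):
    s = sum(y for _, y in train_set) / sum(x for x, _ in train_set)
    return lambda x: s * x  # y = s * x, with s estimated from the training set


def mse(model, val_set):
    return sum((model(x) - y) ** 2 for x, y in val_set) / len(val_set)


data = [(x, 2 * x) for x in range(1, 21)]
best = select_model([fit_zero, fit_slope], mse, data)
print(best(10))  # → 20.0
```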

Optimization issue

  • Gain insights from comparison

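In the Residual Network (ResNet) paper, a 56-layer network has higher error than a 20-layer network on both the training data and the testing data.
This is not overfitting, since the 56-layer network is also worse on the training data; it means the 56-layer network was not optimized well.
A 56-layer network can always do at least as well as a 20-layer one on the training data: its first 20 layers can copy the 20-layer network, and the remaining 36 layers can simply compute the identity.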
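
The "deeper can copy shallower" argument can be checked with a toy construction. This is a sketch in pure Python with made-up weights. It takes a small ReLU network, inserts extra hidden layers whose weights are the identity matrix and whose biases are zero, and confirms that the outputs are unchanged. This works because ReLU outputs are already non-negative, so ReLU(I·h + 0) = h.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Layer = Tuple[Matrix, Vector]  # (weights, bias)


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def linear(layer: Layer, v: Vector) -> Vector:
    w, b = layer
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(w, b)]


def forward(hidden: List[Layer], output: Layer, x: Vector) -> Vector:
    """ReLU hidden layers followed by a linear output layer."""
    h = x
    for layer in hidden:
        h = relu(linear(layer, h))
    return linear(output, h)


def identity_layer(n: int) -> Layer:
    """Identity weights and zero bias: ReLU(I h + 0) = h whenever h >= 0."""
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return eye, [0.0] * n


# "Shallow" network with made-up weights: one hidden ReLU layer of width 3
shallow_hidden = [([[1.0, -2.0], [0.5, 1.5], [-1.0, 1.0]], [0.1, -0.2, 0.0])]
output_layer = ([[1.0, -1.0, 2.0]], [0.3])

# "Deep" network: the same layers plus 3 identity layers before the output
deep_hidden = shallow_hidden + [identity_layer(3) for _ in range(3)]

for x in ([1.0, 2.0], [-0.5, 0.3], [3.0, -1.0]):
    print(forward(shallow_hidden, output_layer, x), forward(deep_hidden, output_layer, x))
```

In practice the deeper network is not guaranteed to *find* this solution by gradient descent, which is exactly the optimization issue described above.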

  • Start from shallower networks (or other models which are easier to optimize).

  • If deeper networks do not obtain smaller loss on training data, then there is an optimization issue.

  • Solution: more powerful optimization techniques.

Overfitting

  • Small loss on training data, large loss on testing data
  1. Data augmentation

Augmentation has to make sense for the task; for example, images are usually not flipped upside down.
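
A minimal sketch of a sensible augmentation. It assumes images are stored as nested lists of pixel values, a simplification since real pipelines use an image library. A left-right flip keeps a Pokemon a Pokemon, whereas an upside-down flip usually produces images the model never sees at test time.

```python
from typing import List, Tuple

Image = List[List[int]]      # rows of pixel values
Example = Tuple[Image, str]  # (image, label)


def flip_horizontal(img: Image) -> Image:
    """Mirror every row left-to-right: usually a reasonable augmentation."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Turn the image upside down: usually NOT a sensible augmentation."""
    return img[::-1]


def augment(dataset: List[Example]) -> List[Example]:
    """Original examples plus a horizontally flipped copy of each, labels unchanged."""
    return dataset + [(flip_horizontal(img), label) for img, label in dataset]


data = [([[1, 2, 3], [4, 5, 6]], "Pokemon")]
print(augment(data))
# → [([[1, 2, 3], [4, 5, 6]], 'Pokemon'), ([[3, 2, 1], [6, 5, 4]], 'Pokemon')]
```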

  2. Constrained model
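
The notes do not say which constraint to use. One common option, assumed here for illustration, is to add a penalty λ·w² to the training loss so that the optimizer prefers smaller parameters. Below is a minimal sketch for a one-feature linear model with made-up data and λ values. The learned slope shrinks toward 0 as λ grows.

```python
from typing import List, Tuple


def fit_ridge(data: List[Tuple[float, float]], lam: float,
              lr: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Minimize (1/N) sum (w*x + b - y)^2 + lam * w^2 by gradient descent (b unpenalized)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n + 2 * lam * w
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up points on y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
for lam in (0.0, 1.0, 10.0):
    print(lam, fit_ridge(data, lam))
```

For this data the minimizer has the closed form w = 2.5 / (1.25 + λ), so the slope drops from 2 at λ = 0 toward 0 as λ increases.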

Bias-Complexity Trade-off


Cross Validation


Using the public testing set too much to select models makes the model prone to overfitting the public testing set, so this is not recommended.

N-fold Cross Validation

Suitable for small models, since every candidate has to be trained N times.
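
A minimal sketch of N-fold cross validation. It assumes hypothetical `fit` and `evaluate` callables and data that is already shuffled. It splits the training data into N folds, trains on N - 1 folds, validates on the remaining one, and rotates. Candidates are then compared by their average validation loss.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # one training example
M = TypeVar("M")  # one trained model


def n_fold_score(fit: Callable[[List[T]], M],
                 evaluate: Callable[[M, List[T]], float],
                 data: Sequence[T], n_folds: int) -> float:
    """Average validation loss of one candidate over n_folds rotations."""
    items = list(data)  # assumed to be shuffled already
    folds = [items[i::n_folds] for i in range(n_folds)]  # interleaved, near-equal folds
    losses = []
    for k in range(n_folds):
        val_set = folds[k]
        train_set = [ex for j, fold in enumerate(folds) if j != k for ex in fold]
        losses.append(evaluate(fit(train_set), val_set))
    return sum(losses) / n_folds


def select_by_cv(candidates, evaluate, data, n_folds=3):
    """Return the candidate (a fit function) with the lowest average validation loss."""
    return min(candidates, key=lambda fit: n_fold_score(fit, evaluate, data, n_folds))


# Usage with two hypothetical candidates on made-up data y = 2x
def fit_zero(train_set):
    return lambda x: 0.0


def fit_slope(train_set):
    s = sum(y for _, y in train_set) / sum(x for x, _ in train_set)
    return lambda x: s * x


def mse(model, val_set):
    return sum((model(x) - y) ** 2 for x, y in val_set) / len(val_set)


data = [(x, 2 * x) for x in range(1, 31)]
chosen = select_by_cv([fit_zero, fit_slope], mse, data)
final_model = chosen(data)  # retrain the chosen candidate on all the training data
print(chosen.__name__, final_model(10))
```

Retraining the selected candidate on the full training set after cross validation is a common choice, not the only one.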

mismatch

  • Your training and testing data have different distributions.
  • Most HWs do not have this problem, except HW11
