02_Hung-yi Lee_Pokemon Classification & strategy
Pokemon vs. Digimon
Function with Unknown Parameters
Loss of a Function (given data)
- Given a dataset $\mathcal{D}$
- Loss of a threshold $h$ given the dataset: the error rate $L(h, \mathcal{D}) = \frac{1}{N}\sum_{n=1}^{N} I\{f_h(x^n) \ne \hat{y}^n\}$
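As a minimal sketch of this loss, consider classifying by a single scalar feature with a threshold. The feature values, labels, and the threshold below are made up for illustration:

```python
# Sketch: 0-1 loss (error rate) of a threshold classifier on a 1-D feature.
# The (feature, label) pairs below are hypothetical.

def classify(x, h):
    # Predict class 1 (e.g., Digimon) if the feature exceeds threshold h, else 0.
    return 1 if x > h else 0

def loss(h, data):
    # L(h, D): fraction of examples the threshold h misclassifies.
    errors = sum(1 for x, y in data if classify(x, h) != y)
    return errors / len(data)

data = [(2, 0), (3, 0), (5, 1), (7, 1), (4, 0)]
print(loss(3, data))  # → 0.2 (one of five examples misclassified)
```

Finding the best threshold then just means evaluating `loss(h, data)` over candidate values of `h` and keeping the minimizer.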
Training Examples
- If we could collect all Pokemons and Digimons in the universe, we could find the best threshold:
$$
h^{all} = \arg\min_{h} L(h, \mathcal{D}_{all}) \qquad \text{(ideal)}
$$
- In practice we only collect some examples sampled from $\mathcal{D}_{all}$:
$$
h^{train} = \arg\min_{h} L(h, \mathcal{D}_{train})
$$
- We hope $h^{train}$ is close to $h^{all}$, i.e. $L(h^{train}, \mathcal{D}_{all}) - L(h^{all}, \mathcal{D}_{all}) \le \delta$
Note: this analysis is model-agnostic
- It makes no assumption about the data distribution
- Any loss function can be used
Hoeffding’s Inequality:
$$
P(\mathcal{D}_{train} \text{ is bad due to a specific } h) \le 2\exp(-2N\varepsilon^2)
$$
- The range of the loss is $[0, 1]$; $N$ is the number of examples in $\mathcal{D}_{train}$
- To make $P$ smaller: larger $N$ (and a smaller hypothesis set $\mathcal{H}$)
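The bound can be checked empirically. A sketch, assuming a fixed hypothesis whose per-example 0/1 loss is a coin flip with a made-up true error rate `p`:

```python
import math
import random

# Sketch: empirically compare P(|L_train - L_all| > eps) with Hoeffding's bound
# for a single fixed hypothesis. p, N, eps, trials are illustrative choices.
random.seed(0)
p, N, eps, trials = 0.3, 100, 0.1, 10000

bad = 0
for _ in range(trials):
    # Empirical loss on a sampled training set of N examples (0/1 loss per example).
    L_train = sum(random.random() < p for _ in range(N)) / N
    if abs(L_train - p) > eps:   # this D_train is "bad" for our h
        bad += 1

print(bad / trials)                    # empirical probability of a bad D_train
print(2 * math.exp(-2 * N * eps**2))   # Hoeffding bound: 2 exp(-2 N eps^2)
```

The empirical probability should come out well below the bound, and increasing `N` shrinks both.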
What if the parameters are continuous?
- Everything that happens in a computer is discrete.
- VC-dimension (not covered in this course)
Tradeoff of Model Complexity
- A smaller (simpler) hypothesis set $\mathcal{H}$ makes $L(h^{train}, \mathcal{D}_{all})$ closer to $L(h^{all}, \mathcal{D}_{all})$, but it also increases $L(h^{all}, \mathcal{D}_{all})$ itself.
Strategy
Framework of ML
Training data: $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^N, \hat{y}^N)\}$
Testing data: $\{x^{N+1}, x^{N+2}, \ldots, x^{N+M}\}$
Pipeline: write a function with unknown parameters → define loss from training data → optimization
General Guide
Split your training data into a training set and a validation set for model selection.
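A minimal sketch of such a split; the 80/20 ratio and the fixed seed are common choices, not prescribed by the lecture:

```python
import random

# Sketch: randomly split training data into a training set and a validation set.
def train_val_split(data, val_ratio=0.2, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)   # shuffle to avoid ordering bias
    n_val = int(len(data) * val_ratio)
    return data[n_val:], data[:n_val]   # (training set, validation set)

examples = list(range(10))
train, val = train_val_split(examples)
print(len(train), len(val))  # → 8 2
```

Models are then compared by their loss on the validation set, and the public testing set is left out of the selection loop.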
Optimization issue
- Gaining insights from comparison
These results come from the Residual Networks paper.
This is not overfitting; it means the 56-layer network's optimization was not done well. A 56-layer network can certainly match what a 20-layer network achieves (e.g., by making the extra layers identity mappings), so its training loss should be no worse.
Start from shallower networks (or other models which are easier to optimize).
If deeper networks do not obtain smaller loss on training data, then there is an optimization issue.
Solution: more powerful optimization technology.
Overfitting
- Small loss on training data, large loss on testing data
- Data augmentation
  - Augmentation must make sense; for example, images are usually not flipped upside-down.
- Constrained model
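As a sketch of a "sensible" augmentation, a horizontal (left-right) flip is usually plausible for natural images, while a vertical flip is not. Representing an image as a list of rows:

```python
# Sketch: horizontal-flip augmentation (mirror left-right, never upside-down).
def hflip(image):
    # image: list of rows of pixel values; reverse each row to mirror left-right
    return [row[::-1] for row in image]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))  # → [[3, 2, 1], [6, 5, 4]]
```

Each flipped image is added to the training set as an extra example with the same label.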
Bias-Complexity Trade-off
Cross Validation
Relying too heavily on the public testing set to select models makes it easy to overfit the public testing set, so this is not recommended.
N-fold Cross Validation
Suitable for small models.
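A sketch of N-fold cross validation; the `evaluate` function here is a hypothetical stand-in for "train on the training folds, measure loss on the held-out fold":

```python
# Sketch: N-fold cross validation for model selection.
def n_fold_cross_validation(data, n, evaluate):
    """Average the loss of evaluate(train_folds, val_fold) over n folds."""
    fold_size = len(data) // n
    losses = []
    for i in range(n):
        val = data[i * fold_size:(i + 1) * fold_size]            # held-out fold
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        losses.append(evaluate(train, val))
    return sum(losses) / n

# Toy evaluate: "loss" is how far the validation mean is from the training mean.
def evaluate(train, val):
    return abs(sum(val) / len(val) - sum(train) / len(train))

data = [1, 2, 3, 4, 5, 6]
print(n_fold_cross_validation(data, 3, evaluate))  # → 2.0
```

The model with the lowest average validation loss across the folds is selected, then typically retrained on the full training data.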
Mismatch
- Your training and testing data have different distributions.
- Most HWs do not have this problem, except HW11
- Post title: 02_Hung-yi Lee_Pokemon Classification & strategy
- Create time: 2022-03-25 10:10:22
- Post link: Machine-Learning/02-hung-yi-lee-pokemon-classification-strategy/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless otherwise stated.