Preface
I have already worked through Andrew Ng's CS229, but that course is fairly dated (2016). Its assignments are beautifully designed, yet they are written in Matlab, while the mainstream language for research and development today is Python, so it is still some distance from real-world algorithm deployment. CS229 (2016) was mostly practice writing toy models, but walking through the machine learning theory once was still worthwhile. This course serves as reinforcement and consolidation.
This course focuses on Deep Learning
Course website: ML 2022 Spring
课程Github:https://github.com/virginiakm1988/ML2022-Spring
The repository contains the code and slides for the 15 homeworks of Machine Learning, instructed by Hung-yi Lee.
Prof. Lee has uploaded the recordings of all 15 lectures, and posts new content weekly on YouTube. I will study from the 2022 course recordings. Let's go.
Intro of the Course
HW1: COVID-19 Case Prediction
HW2: Phoneme Classification
HW3: Image Classification
HW4: Speaker Classification
HW5: Machine Translation
HW6: Anime Face Generation
Lecture 7: Self-supervised Learning
Pretraining on the large numbers of unlabeled images crawled from search engines yields better downstream training results.
Pre-trained Model (a.k.a. Foundation Model) vs. Downstream Tasks
Lecture 6: GAN
Lecture 12: Reinforcement Learning (RL)
Lecture 8: Anomaly Detection
Lecture 9: Explainable AI
Lecture 10: Model Attack
Lecture 11: Domain Adaptation
Lecture 13: Network Compression
Lecture 14: Life-long Learning
Lecture 15: Meta Learning
Few-shot learning is usually achieved by meta-learning.
Machine Learning
Andrew Ng: a kind of implicit programming, in which the machine is trained rather than explicitly programmed.
Hung-yi: Machine Learning ≈ looking for a function.
Different types of Functions
Regression
The func outputs a scalar.
Classification
Given options(classes), the func outputs the correct one.
Structured Learning
create something with structure(image, document)
Pipeline
Function with Unknown Parameters
Define Loss from Training Data
Loss is a func of parameters
Example: predicting the channel's view count (using data from the days before the prediction date).
Prediction error $e$: a function of the difference between the label $\hat{y}$ and the predicted result $y$.

- If $e = |y - \hat{y}|$, $L$ is the mean absolute error (MAE).
- If $e = (y - \hat{y})^2$, $L$ is the mean squared error (MSE).
- Total loss: $L = \frac{1}{N}\sum_{n} e_n$
Optimization
The only optimization method this course covers: Gradient Descent.
- (Randomly) pick an initial value $w^0$
- Compute the gradient $\left.\frac{\partial L}{\partial w}\right|_{w=w^0}$
- Update $w$ iteratively: $w^1 \leftarrow w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w^0}$

The learning rate $\eta$ is a hyperparameter: a parameter set by a human rather than learned from the data.
Gradient Descent has a known issue: it can converge to a local minimum rather than the global one.
Modifications to the model usually come from an understanding of the problem itself (domain knowledge).
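To make the three steps concrete, here is a minimal sketch in plain numpy, assuming a linear model $y = b + wx_1$, an MSE loss, and a fixed learning rate; the data and hyperparameters are made up for illustration.

```python
import numpy as np

# Hypothetical data: yesterday's views x -> today's views y (in thousands)
x = np.array([4.8, 4.9, 5.1, 5.3, 5.0, 5.4, 5.6])
y = np.array([4.9, 5.1, 5.2, 5.0, 5.3, 5.5, 5.7])

# Step 1: function with unknown parameters, y = b + w * x1
w, b = 0.0, 0.0          # initial values (could also be random)
eta = 0.01               # learning rate, a hyperparameter

for step in range(1000):
    y_hat = b + w * x                      # model prediction
    # Step 2: loss = mean squared error over the training examples
    loss = np.mean((y - y_hat) ** 2)
    # Step 3: gradient descent update
    grad_w = np.mean(2 * (y_hat - y) * x)  # dL/dw
    grad_b = np.mean(2 * (y_hat - y))      # dL/db
    w -= eta * grad_w
    b -= eta * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.4f}")
```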
Neural Network
Linear models have a severe limitation, known as model bias, so we need more sophisticated models.
Piecewise Linear Curves
Any continuous curve can be approximated by a piecewise linear curve, given sufficiently many pieces.
Sigmoid
The blue function in the slide is called a Hard Sigmoid.
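The lecture's smooth approximation replaces each Hard Sigmoid with a (soft) sigmoid; summing several of them plus a constant reproduces any piecewise linear curve:

$$y = c\,\operatorname{sigmoid}(b + wx_1) = \frac{c}{1 + e^{-(b + wx_1)}} \quad\Longrightarrow\quad y = b + \sum_i c_i\,\operatorname{sigmoid}(b_i + w_i x_1)$$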
Vectorization
Unknown parameters: the entries of $W$, $\boldsymbol{b}$, $\boldsymbol{c}^T$ and $b$ are all flattened and concatenated into one long vector $\theta$, so the model becomes $y = b + \boldsymbol{c}^{T}\sigma(\boldsymbol{b} + W\boldsymbol{x})$.
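A minimal numpy sketch of the vectorized forward pass $y = b + \boldsymbol{c}^T\sigma(\boldsymbol{b} + W\boldsymbol{x})$; the sizes (3 features, 16 sigmoids) are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 input features, 16 sigmoid units
rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # input feature vector
W  = rng.normal(size=(16, 3))  # weights, one row per sigmoid
b1 = rng.normal(size=16)       # per-sigmoid biases (the vector b)
c  = rng.normal(size=16)       # per-sigmoid output weights (the vector c)
b0 = 0.5                       # the scalar bias b

y = b0 + c @ sigmoid(W @ x + b1)   # y = b + c^T sigma(b + W x)
print(y)
```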
Updated ML pipeline
- Function with unknown parameters: $y = b + \boldsymbol{c}^{T}\sigma(\boldsymbol{b} + W\boldsymbol{x})$
- Loss function
  - Loss is a function of the parameters: $L(\theta)$
  - Loss means how good a set of parameter values is.
- Optimization
  - (Randomly) pick initial values $\theta^0$
  - Compute the gradient $\boldsymbol{g} = \nabla L(\theta^0)$, update $\theta^1 \leftarrow \theta^0 - \eta\,\boldsymbol{g}$
  - Compute the gradient $\boldsymbol{g} = \nabla L(\theta^1)$, update $\theta^2 \leftarrow \theta^1 - \eta\,\boldsymbol{g}$
  - Repeat; in practice each gradient is computed on one batch of the training data.
1 epoch = see all the batches once.
1 update = update the parameters once.
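A minimal PyTorch sketch of the bookkeeping, assuming a placeholder linear model and random data: with 1,000 examples and a batch size of 100, one epoch performs 10 updates.

```python
import torch

# Placeholder data: 1000 examples, 8 features each
X = torch.randn(1000, 8)
y = torch.randn(1000, 1)
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

B = 100  # batch size -> 10 updates per epoch
for epoch in range(3):                      # 1 epoch = see all the batches once
    perm = torch.randperm(len(X))           # reshuffle each epoch
    for i in range(0, len(X), B):
        idx = perm[i:i + B]
        loss = loss_fn(model(X[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()                          # 1 update = change the parameters once
```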
Rectified Linear Unit
The Rectified Linear Unit (ReLU) computes $y = c\,\max(0,\, b + wx_1)$.
Two suitably chosen ReLUs can be combined into one Hard Sigmoid.
Sigmoid and ReLU are called activation functions.
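A quick numeric check of the two-ReLU claim, with made-up parameters: the difference of two shifted ReLUs is flat, then a ramp, then flat again, i.e. a Hard Sigmoid.

```python
import numpy as np

relu = lambda z: np.maximum(0.0, z)

# A Hard Sigmoid rising from 0 to 1 over x in [0, 1], built from two ReLUs
x = np.linspace(-2, 3, 11)
hard_sigmoid = relu(x) - relu(x - 1)   # slope 1 on [0, 1], flat elsewhere
print(np.round(hard_sigmoid, 2))       # 0 ... ramp ... 1
```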
MAE of the view-count prediction (the model is trained on the 2017-2020 data; 2021 is unseen):

| | linear | 10 ReLU | 100 ReLU |
| --- | --- | --- | --- |
| 2017-2020 | 0.32k | 0.32k | 0.28k |
| 2021 | 0.46k | 0.45k | 0.43k |
Multiple Layers
- Loss for multiple hidden layers
- 100 ReLU for each layer
- input features are the no. of views in the past 56 days
| | 1 layer | 2 layers | 3 layers |
| --- | --- | --- | --- |
| 2017-2020 | 0.28k | 0.18k | 0.14k |
| 2021 | 0.43k | 0.39k | 0.38k |
Deep Learning
Deep Learning can replace feature engineering.
Fully Connected Feedforward Network
Given a network structure, we define a function set.
When writing out the equations of a neural network, we usually express them as matrix operations, which makes GPU acceleration straightforward.
Hidden layers can be seen as a feature extractor that replaces manual feature engineering.
When building a neural network for multi-class classification, the output layer is usually a Softmax.
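A minimal PyTorch sketch of such a fully connected feedforward classifier; the layer sizes are placeholders, and note that in practice the softmax is usually folded into the loss (`nn.CrossEntropyLoss`) rather than placed inside the model.

```python
import torch
import torch.nn as nn

# Placeholder sizes: 784 input features, 10 classes
model = nn.Sequential(
    nn.Linear(784, 64),  # hidden layer 1: feature extraction
    nn.ReLU(),
    nn.Linear(64, 64),   # hidden layer 2
    nn.ReLU(),
    nn.Linear(64, 10),   # output layer: one logit per class
)

x = torch.randn(32, 784)                 # a batch of 32 examples
probs = torch.softmax(model(x), dim=-1)  # softmax turns logits into class probabilities
print(probs.sum(dim=-1))                 # each row sums to 1
```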
Selecting the no. of layers
Q: How many layers? How many neurons for each layer?
Q: Can the structure be automatically determined?
- E.g. Evolutionary Artificial Neural Networks
Universality Theorem
Deep is better?
Any continuous function $f: \mathbb{R}^N \to \mathbb{R}^M$ can be realized by a network with one hidden layer (given enough hidden neurons).
Backpropagation
I already worked through the backpropagation derivation back in CS229; see this post:
https://carp2i.github.io/2022/01/10/ML08/
- Forward pass: compute $\partial z/\partial w$ for every parameter $w$, where $z$ is the input to an activation function; this partial derivative is simply the value of the input connected to $w$.
- Backward pass: compute $\partial C/\partial z$ for all activation-function inputs $z$, propagating from the output layer backwards.
- Combining the two: $\dfrac{\partial C}{\partial w} = \dfrac{\partial z}{\partial w}\cdot\dfrac{\partial C}{\partial z}$.
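A small numpy sketch of both passes for a 3-4-1 network with sigmoid activations and squared-error cost $C$; the data and sizes are made up.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Made-up data and parameters: 3 inputs -> 4 hidden -> 1 output
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

# Forward pass: keep the z's (activation inputs) and a's (activations)
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y_hat = sigmoid(z2)
C = 0.5 * (y_hat - y) ** 2                # squared-error cost

# Backward pass: dC/dz, starting from the output layer
dz2 = (y_hat - y) * y_hat * (1 - y_hat)   # dC/dz2
dz1 = (W2.T @ dz2) * a1 * (1 - a1)        # dC/dz1

# dC/dw = (dz/dw) * (dC/dz); dz/dw is the incoming activation
dW2 = np.outer(dz2, a1)                   # dC/dW2  (dC/db2 is just dz2)
dW1 = np.outer(dz1, x)                    # dC/dW1  (dC/db1 is just dz1)
print(dW1.shape, dW2.shape)
```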
Regression
Estimating the CP of a pokemon
CP: the Combat Power
Step 1: Model
Linear model: $y = b + w \cdot x_{cp}$
Step 2: Goodness of Function
Training data: 10 Pokemons
Loss func
The loss function is a function of functions.
Input: a func, output: how bad it is
$$L(f) = L(w, b) = \sum_{n=1}^{10}\left(\hat{y}^n - \left(b + w \cdot x_{cp}^n\right)\right)^2$$

- Sum over the 10 training examples
- $b + w \cdot x_{cp}^n$ is $y$ estimated by the input function
Step 3: Best Function

$$f^* = \arg\min_f L(f), \qquad w^*, b^* = \arg\min_{w,b} L(w, b)$$
What we really care about is the error on new data (testing data).
Selecting another Model

If the function set performs badly even on the training data, you should go back to Step 1 and redesign the model, e.g. by adding higher-order terms such as $y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2$. A more complex model always achieves lower training error, but beyond some complexity the testing error grows again: overfitting.
Redesign
If you were Professor Oak, you would have plenty of domain knowledge to guide the redesign; without it, we can only throw all the available features into the model.
Regularization
Q: Why are smooth functions preferred?

A: If some noise corrupts the input $x$ at test time, a smoother function is less affected by it and gives more stable results.

Training error: the larger $\lambda$ is, the less the training error is taken into account (the function is forced to be smoother, so the training error grows).

We prefer smooth functions, but they shouldn't be too smooth.
Select the $\lambda$ that gives the lowest error on the testing data.
When applying regularization, the bias term should not be included: it only shifts the function and does not affect its smoothness.
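The regularized loss from the lecture: the $\lambda$ term pushes the weights $w_i$ toward small values (smoother functions), and the bias $b$ is left out of the penalty:

$$L = \sum_n \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2 + \lambda \sum_i (w_i)^2$$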
Classification
Pokemon Classification
- Total: sum of all stats that come after this, a general guide to how strong a pokemon is
- HP: hit points, or health, defines how much damage a pokemon can withstand before fainting
- Attack: the base modifier for normal attacks(eg. Scratch, Punch)
- Defense: the base damage resistance against normal attacks
- SP Atk: special attack, the base modifier for special attacks(e.g. fire blast, bubble beam)
- SP Def: the base damage resistance against special attacks
- Speed: determines which pokemon attacks first each round
How to do Classification
- Training data for Classification
Classification as Regression?
Take binary classification as an example.
Training: Class 1 means the target is 1; Class 2 means the target is -1
Testing: output closer to 1 → class 1; closer to -1 → class 2.
Using regression for binary classification penalizes results that are "too correct" (outputs far beyond the target, e.g. much larger than 1), dragging the decision boundary toward those examples.
Ideal Alternatives
Function (Model): $f(x) = \begin{cases} \text{class 1} & g(x) > 0 \\ \text{class 2} & \text{otherwise} \end{cases}$

Loss function: $L(f) = \sum_n \delta\left(f(x^n) \neq \hat{y}^n\right)$, the number of times $f$ gets incorrect results on the training data.

Find the best function: $f^* = \arg\min_f L(f)$. This loss is not differentiable, so gradient descent cannot be used directly.

- Example solutions: perceptron, SVM (the classic way)
Gaussian Distribution
The attribute values of Pokemons are assumed to follow a normal (Gaussian) distribution.
Input: a vector $x$; output: the probability density of sampling $x$.
The shape of the function is determined by the mean $\mu$ and the covariance matrix $\Sigma$:

$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
Assume the points are sampled from a Gaussian distribution
Find the Gaussian distribution behind them
Maximum Likelihood
The Gaussian with any mean $\mu$ and covariance $\Sigma$ could have generated these points, just with different likelihoods. The likelihood of a Gaussian with mean $\mu$ and covariance $\Sigma$ generating the 79 samples is

$$L(\mu, \Sigma) = \prod_{n=1}^{79} f_{\mu,\Sigma}(x^n)$$

We assume the samples $x^1, \dots, x^{79}$ are drawn independently from this Gaussian, and pick the parameters that maximize the likelihood:

$$\mu^*, \Sigma^* = \arg\max_{\mu,\Sigma} L(\mu, \Sigma)$$
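The maximization has the familiar closed-form solution, the sample mean and sample covariance:

$$\mu^* = \frac{1}{79}\sum_{n=1}^{79} x^n, \qquad \Sigma^* = \frac{1}{79}\sum_{n=1}^{79} (x^n - \mu^*)(x^n - \mu^*)^T$$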
After doing all this, we find that even using all 7 features, the prediction results are still poor.
Modifying Model
In practice you rarely see each class-conditional Gaussian with its own mean and its own covariance. The covariance matrix has a number of parameters quadratic in the number of features, so larger feature vectors give the model many parameters and make it easier to overfit. The common modification is to let the classes share a single covariance matrix while keeping separate means.
- Maximum likelihood: find the $\mu^1$, $\mu^2$, $\Sigma$ maximizing the likelihood $L(\mu^1, \mu^2, \Sigma)$. The solution keeps the per-class sample means, and the shared covariance is the weighted average $\Sigma = \frac{N_1}{N_1+N_2}\Sigma^1 + \frac{N_2}{N_1+N_2}\Sigma^2$.
With the shared covariance matrix, the decision boundary becomes linear, which is why this model is also regarded as a linear model.
Recall the three steps: model, goodness of function, find the best function.
If you assume all the dimensions are independent, then you are using a Naive Bayes Classifier.
Posterior Probability
After a series of algebraic manipulations, the posterior simplifies to:

$$P(C_1 \mid x) = \frac{1}{1 + e^{-z}} = \sigma(z), \qquad z = \boldsymbol{w} \cdot x + b$$

In the generative model, we estimate $N_1$, $N_2$, $\mu^1$, $\mu^2$, $\Sigma$, and from them compute $\boldsymbol{w}$ and $b$.
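For reference, carrying the simplification through (as in the lecture) yields the explicit forms:

$$\boldsymbol{w}^T = (\mu^1 - \mu^2)^T \Sigma^{-1}, \qquad b = -\frac{1}{2}(\mu^1)^T \Sigma^{-1} \mu^1 + \frac{1}{2}(\mu^2)^T \Sigma^{-1} \mu^2 + \ln\frac{N_1}{N_2}$$

This raises the question: if all we need in the end are $\boldsymbol{w}$ and $b$, why not find them directly? That is the motivation for logistic regression.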