01_Hung-yi Lee_Machine Learning
Carpe Tu Black Whistle

Preface

I previously worked through Andrew Ng's CS229, but that course is fairly old by now (2016). Its assignments are very carefully designed, but they are written in Matlab, while the mainstream language for research and development today is Python, so the course is still some distance from real-world algorithm deployment. CS229 (2016) was mostly practice writing toy models, but going through the machine learning theory once was still very worthwhile. This course serves as reinforcement and consolidation.

This course focuses on Deep Learning

Course website: ML 2022 spring
课程Github:https://github.com/virginiakm1988/ML2022-Spring
The repository contains code and slides of 15 homeworks for Machine Learning instructed by Hung-yi Lee.
Prof. Lee has already uploaded recordings of all 15 lectures, and posts some new content weekly on YouTube. The notes here follow the 2022 course recordings. Let's get to it.

Intro of the course

HW1: COVID-19 Case Prediction
HW2: Phoneme Classification
HW3: Image Classification
HW4: Speaker Classification
HW5: Machine Translation
HW6: Anime Face Generation


Lecture 7: Self-supervised Learning

Pre-train on the many unlabeled images crawled from search engines to get better training results on downstream tasks.

Pre-trained Model (a.k.a. Foundation Model) vs. Downstream Tasks


Lecture 6: GAN


Lecture 12: Reinforcement Learning (RL)

Lecture 8: Anomaly Detection

Lecture 9: Explainable AI


Lecture 10: Model Attack

Lecture 11: Domain Adaptation

Lecture 13: Network Compression


Lecture 14: Life-long Learning

Lecture 15: Meta Learning


Few-shot learning is usually achieved by meta-learning.

Machine Learning

Andrew Ng: a form of implicit programming achieved by training the machine.
Hung-yi Lee: Machine Learning ≈ Looking for a Function.

Different types of Functions

Regression

The func outputs a scalar.

Classification

Given options (classes), the function outputs the correct one.

Structured Learning

Create something with structure (image, document).

Pipeline

Function with Unknown Parameters


$$y = b + w x_1$$

$y$: no. of views on 2/26; $x_1$: no. of views on 2/25.
$w$ and $b$ are unknown parameters (learned from data): the weight and the bias.
$\mathbf{x}$: the vector of features.

Define Loss from Training Data

Loss is a function of the parameters; it measures how good a set of parameter values is.


Example: predicting a channel's view count (using the view counts of the days before the prediction date as input).
The error $e$ measures the difference between the label $\hat{y}$ and the prediction $y$, e.g.

MAE: $e = |y - \hat{y}|$
MSE: $e = (y - \hat{y})^2$

Loss: $L = \frac{1}{N}\sum_n e_n$

If $y$ and $\hat{y}$ are both probability distributions → Cross-entropy.

Optimization

The only optimization method covered in this course: Gradient Descent.

  • (Randomly) pick an initial value $w^0$
  • Compute $\left.\frac{\partial L}{\partial w}\right|_{w = w^0}$

    The learning rate $\eta$ is a hyperparameter: a parameter set by a human rather than learned from data.
  • Update $w$ iteratively: $w^1 \leftarrow w^0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w = w^0}$

One problem with Gradient Descent is that it can converge to a local minimum.
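A minimal sketch of this update rule in Python; the quadratic toy loss, learning rate, and iteration count are my illustrative assumptions, not from the lecture:

```python
import numpy as np

# Gradient descent on a toy loss L(w) = (w - 3)^2, minimized at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)        # dL/dw

eta = 0.1                         # learning rate (a hyperparameter)
w = np.random.randn()             # (randomly) pick an initial value w^0

for step in range(100):
    w = w - eta * grad(w)         # w^{t+1} <- w^t - eta * dL/dw

print(f"w = {w:.4f}")             # approaches 3 (this toy loss has no local minima)
```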


Modifications to the model usually come from an understanding of the problem, i.e. domain knowledge.

Neural Network

Linear models have a severe limitation, known as model bias, so we need more sophisticated models.

Piecewise Linear Curves


A continuous curve can be approximated by a piecewise linear curve, given sufficiently many pieces.


Sigmoid


The blue function above is called a Hard Sigmoid.
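The smooth curve used in the lecture to approximate it is the sigmoid function:

$$y = c \cdot \operatorname{sigmoid}(b + w x_1) = \frac{c}{1 + e^{-(b + w x_1)}}$$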

Vectorization


$$y = b + \mathbf{c}^{\top}\,\sigma(\mathbf{b} + W\mathbf{x})$$

$\mathbf{x}$: the feature vector.
Unknown parameters, collected into one long vector $\theta$: $W$, $\mathbf{b}$, $\mathbf{c}$, $b$.
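A small numpy sketch of this vectorized form; the shapes (3 features, 4 sigmoid units) and random parameter values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))    # weight matrix: 4 sigmoid units, 3 features
b_vec = rng.normal(size=4)     # per-unit biases
c = rng.normal(size=4)         # output weights
b = 0.5                        # output bias

x = np.array([1.0, 2.0, 3.0])  # a feature vector
y = b + c @ sigmoid(W @ x + b_vec)   # y = b + c^T sigma(b_vec + W x)
```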

Updated ML pipeline

  1. Function with unknown parameters
  2. Loss function
  • Loss is a function of the parameters
  • Loss measures how good a set of parameter values is.


  3. Optimization: $\theta^* = \arg\min_{\theta} L(\theta)$
  • (Randomly) pick initial values $\theta^0$
  • Compute the gradient $g = \nabla L(\theta^0)$; update $\theta^1 \leftarrow \theta^0 - \eta g$
  • Compute the gradient $g = \nabla L(\theta^1)$; update $\theta^2 \leftarrow \theta^1 - \eta g$
  • Compute the gradient $g = \nabla L(\theta^2)$; update $\theta^3 \leftarrow \theta^2 - \eta g$

In practice the data is split into batches: $L^1$ is the loss computed on the 1st batch; we use its gradient to update the parameters, then move on to the next batch.

1 epoch = seeing all the batches once.
1 update = updating the parameters once.
E.g. with $N = 10{,}000$ examples and batch size $B = 10$, one epoch contains 1,000 updates.
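A sketch of the epoch/batch/update bookkeeping in plain numpy; the linear model, MSE gradients, and all the numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, B, eta = 10_000, 10, 0.01           # examples, batch size, learning rate
x = rng.normal(size=N)
y_hat = 3.0 * x + 1.0                  # labels from an assumed true function
w, b = 0.0, 0.0

for epoch in range(5):                 # 1 epoch = see all the batches once
    perm = rng.permutation(N)          # shuffle the examples each epoch
    for i in range(0, N, B):           # each iteration = 1 update
        idx = perm[i:i + B]
        err = (w * x[idx] + b) - y_hat[idx]
        w -= eta * 2 * np.mean(err * x[idx])   # gradient of MSE w.r.t. w
        b -= eta * 2 * np.mean(err)            # gradient of MSE w.r.t. b
# each epoch performs N / B = 1,000 updates
```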

Rectified Linear Unit

Rectified Linear Unit (ReLU): $c \cdot \max(0,\, b + w x_1)$

Two suitably chosen ReLUs can be combined to form a Hard Sigmoid.
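One way to see this (my own formulation, not from the slides): a ramp rising from $0$ to $c$ is the difference of two ReLUs,

$$\min(\max(0, z),\, c) = \max(0, z) - \max(0, z - c).$$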


Sigmoid and ReLU are called activation functions.

Loss with different activation functions:

             linear   10 ReLU   100 ReLU
2017-2020    0.32k    0.32k     0.28k
2021         0.46k    0.45k     0.43k

Multiple Layer


  • Loss for multiple hidden layers
    • 100 ReLU for each layer
    • input features are the no. of views in the past 56 days
             1 layer   2 layers   3 layers
2017-2020    0.28k     0.18k      0.14k
2021         0.43k     0.39k      0.38k

Deep Learning

Deep Learning can replace feature engineering.

Fully Connect Feedforward Network

Given a network structure, we define a function set.


When writing out a neural network, we usually express it in matrix-operation form, which makes it easy to accelerate on GPUs.

Hidden layers can be seen as a feature extractor that replaces feature engineering.

When building a neural network, the output layer is usually a Softmax, so the network implements a multi-class classifier.
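A sketch of a fully connected feedforward pass in matrix form, ending in a softmax output layer; the layer sizes and random parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
sizes = [8, 16, 16, 3]               # input dim, two hidden layers, 3 classes
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=m) for m in sizes[1:]]

a = rng.normal(size=sizes[0])        # an input feature vector x
for W, b in zip(Ws[:-1], bs[:-1]):
    a = sigmoid(W @ a + b)           # hidden layer: a' = sigma(W a + b)
y = softmax(Ws[-1] @ a + bs[-1])     # output layer: class probabilities
```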

Selecting the no. of layers

  • Q: How many layers? How many neurons for each layer?

  • Q: Can the structure be automatically determined?

    • E.g. Evolutionary Artificial Neural Networks

Universality Theorem

Is deep better?
Any continuous function $f$ can be realized by a network with one hidden layer (given enough hidden neurons).

Backpropagation

I already worked through the BP derivation back in CS229; see this post:
https://carp2i.github.io/2022/01/10/ML08/

Forward pass

Compute $\partial z / \partial w$ for all parameters $w$.

Backward pass

Compute $\partial C / \partial z$ for all activation function inputs $z$. Combining the two passes gives $\frac{\partial C}{\partial w} = \frac{\partial z}{\partial w} \cdot \frac{\partial C}{\partial z}$.

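A minimal numeric sketch of the two passes on a toy two-layer network with scalar weights; the squared-error cost and all values are my illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: x -> z1 = w1*x + b1 -> a1 = sigmoid(z1) -> z2 = w2*a1 + b2 -> y
x, y_hat = 0.5, 1.0
w1, b1, w2, b2 = 0.3, 0.1, -0.4, 0.2

# Forward pass: compute activations (and note dz/dw at each layer).
z1 = w1 * x + b1;  a1 = sigmoid(z1)      # dz1/dw1 = x
z2 = w2 * a1 + b2; y = sigmoid(z2)       # dz2/dw2 = a1
C = (y - y_hat) ** 2                     # squared-error cost

# Backward pass: compute dC/dz from the output back toward the input.
dC_dz2 = 2 * (y - y_hat) * y * (1 - y)   # chain through the output sigmoid
dC_dz1 = dC_dz2 * w2 * a1 * (1 - a1)     # chain back through layer 2

# Combine the passes: dC/dw = (dz/dw) * (dC/dz)
dC_dw2, dC_db2 = a1 * dC_dz2, dC_dz2
dC_dw1, dC_db1 = x * dC_dz1, dC_dz1
```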

Regression

Estimating the CP of a pokemon

CP: the Combat Power


Step1 Model


Linear model: $y = b + w \cdot x_{cp}$

Step2 Goodness of Function


Training data: 10 Pokémons

Loss function: $L(f) = L(w, b) = \sum_{n=1}^{10}\left(\hat{y}^n - (b + w \cdot x_{cp}^n)\right)^2$
The loss function is a function of functions:
input a function, output how bad it is.

  1. Sum over the examples
  2. Compare the estimated $y$ (based on the input function) with the label $\hat{y}^n$

Step3 Best Function

$$w^*, b^* = \arg\min_{w,b} L(w, b)$$

What we really care about is the error on new data (testing data).

Selecting another Model

The redesigned model: $y = b + w_1 \cdot x_{cp} + w_2 \cdot (x_{cp})^2$.

Best function: fit the new model on the training data.

Testing: evaluate it on the testing data.

If the initial function set performs badly, you should go back to Step 1 and redesign the model.

Redesign


If you were Professor Oak you would have plenty of domain knowledge; since we don't, we can only put all the features into the model.

Regularization
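The regularized loss adds a penalty term on the weights to the original error:

$$L = \sum_n \left(\hat{y}^n - \left(b + \sum_i w_i x_i\right)\right)^2 + \lambda \sum_i (w_i)^2$$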


Q: Why are smooth functions preferred?
A: If some noise corrupts the input $x_i$ at testing time, a smoother function is less affected.

The larger $\lambda$ is, the less weight the training error gets.
We prefer smooth functions, but not too smooth.

Select the $\lambda$ that obtains the best function.
When doing regularization, the bias term is not included in the penalty: penalizing the bias does not affect smoothness.

Classification

Pokemon Classification


  • Total: sum of all the stats that come after this; a general guide to how strong a Pokémon is
  • HP: hit points, or health; defines how much damage a Pokémon can withstand before fainting
  • Attack: the base modifier for normal attacks (e.g. Scratch, Punch)
  • Defense: the base damage resistance against normal attacks
  • SP Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)
  • SP Def: the base damage resistance against special attacks
  • Speed: determines which Pokémon attacks first each round

How to do Classification

  • Training data for Classification
    Classification as Regression?
    Take binary classification as an example.
    Training: Class 1 means the target is 1; Class 2 means the target is −1.
    Testing: closer to 1 → class 1; closer to −1 → class 2.

Doing binary classification with regression penalizes the examples that are "too correct".

Ideal Alternatives

  • Function (model): $f(x) = \begin{cases} \text{class 1}, & g(x) > 0 \\ \text{class 2}, & \text{otherwise} \end{cases}$

  • Loss function: $L(f) = \sum_n \delta\big(f(x^n) \neq \hat{y}^n\big)$

    the number of times $f$ gets incorrect results on the training data.

  • Find the best function: $f^* = \arg\min_f L(f)$

    • Example: perceptron, SVM (the classic approaches)

Gaussian Distribution

Pokémon feature values are generally assumed to follow a normal (Gaussian) distribution.


$$f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)$$

Input: vector $x$; output: the probability density of sampling $x$.
The shape of the function is determined by the mean $\mu$ and the covariance matrix $\Sigma$.

Assume the points are sampled from a Gaussian distribution.
Find the Gaussian distribution behind them → probabilities for new points.


Maximum Likelihood


A Gaussian with any mean $\mu$ and covariance matrix $\Sigma$ could generate these points, but with different likelihoods.

Likelihood of a Gaussian with mean $\mu$ and covariance matrix $\Sigma$:

$$L(\mu, \Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2) \cdots f_{\mu,\Sigma}(x^{79})$$

We have 79 samples $x^1, x^2, \dots, x^{79}$.
We assume they are generated from the Gaussian $(\mu^*, \Sigma^*)$ with the maximum likelihood:

$$\mu^*, \Sigma^* = \arg\max_{\mu,\Sigma} L(\mu, \Sigma)$$

The solutions are the sample mean and sample covariance: $\mu^* = \frac{1}{79}\sum_{n=1}^{79} x^n$ and $\Sigma^* = \frac{1}{79}\sum_{n=1}^{79} (x^n - \mu^*)(x^n - \mu^*)^{\top}$.
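A numpy sketch of this maximum-likelihood fit; the synthetic 2-D data is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[3.0, -1.0], scale=1.5, size=(79, 2))  # 79 samples, 2 features

# Closed-form maximum-likelihood estimates for a Gaussian:
mu = X.mean(axis=0)                  # sample mean
diff = X - mu
Sigma = diff.T @ diff / len(X)       # sample covariance (1/N, not 1/(N-1))

# Density of a new point under the fitted Gaussian.
def gaussian_pdf(x, mu, Sigma):
    D = len(mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    z = x - mu
    return np.exp(-0.5 * z @ np.linalg.solve(Sigma, z)) / norm

print(gaussian_pdf(np.array([3.0, -1.0]), mu, Sigma))
```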


After doing this, we find that even using all 7 features, the prediction results are still poor.

Modifying Model

In practice, you don't often see each function getting its own mean and covariance.
The number of parameters in a covariance matrix is quadratic in the number of features, so with high-dimensional feature vectors the model has many parameters and overfits more easily.

  • Maximum likelihood:
    Find $\mu^1$, $\mu^2$, and a shared $\Sigma$ maximizing the likelihood $L(\mu^1, \mu^2, \Sigma)$

$\mu^1$ and $\mu^2$ are the same as before (the sample means of each class); the two classes share the same $\Sigma$.

With a shared covariance matrix, the decision boundary becomes linear, so the model is regarded as a linear model.

Recall

Recall the three steps: define a function set (model), define the goodness of a function (loss), and find the best function.

If you assume all the dimensions are independent, then you are using a Naive Bayes Classifier.

Posterior Probability


Starting from Bayes' rule,

$$P(C_1 \mid x) = \frac{P(x \mid C_1)\, P(C_1)}{P(x \mid C_1)\, P(C_1) + P(x \mid C_2)\, P(C_2)}$$

a series of manipulations simplifies it to:

$$P(C_1 \mid x) = \sigma(z), \qquad z = w \cdot x + b$$

In a generative model, we estimate $N_1$, $N_2$, $\mu^1$, $\mu^2$, $\Sigma$, and then compute $w$ and $b$.
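For reference, the simplified expressions that the derivation yields (the standard shared-covariance form, with $N_1$, $N_2$ the example counts of the two classes):

$$z = (\mu^1 - \mu^2)^{\top} \Sigma^{-1} x - \frac{1}{2} (\mu^1)^{\top} \Sigma^{-1} \mu^1 + \frac{1}{2} (\mu^2)^{\top} \Sigma^{-1} \mu^2 + \ln\frac{N_1}{N_2}$$

so $w^{\top} = (\mu^1 - \mu^2)^{\top} \Sigma^{-1}$ and $b$ is the remaining scalar term.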