05_Hung-yi Lee_Convolutional Neural Networks(CNN)
CNN
A commonly used neural network architecture, applied mainly to image recognition.
Image Classification
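Image preprocessing
- The images an image recognition system has to handle usually come in different sizes. The usual approach is to resize every input image to one uniform size before processing, so that all the images to be classified have the same size.
- Output and label: one-hot vector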
- Criterion: Cross-entropy
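As a minimal sketch of the one-hot label and cross-entropy criterion listed above, the snippet below uses plain Python. The helper names `one_hot`, `softmax`, and `cross_entropy` are illustrative assumptions, not part of the lecture or of any library.

```python
import math

def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

target = one_hot(2, num_classes=3)
# Equal scores give a uniform prediction, so the loss equals log(3).
loss = cross_entropy([0.0, 0.0, 0.0], target)
```

With a one-hot target, only the predicted probability of the correct class contributes to the loss.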
Image data layout
Image data is generally represented as (high
Observation
- We don't need to look at the full picture.
We just need to identify some critical patterns. Some patterns are much smaller than the whole image.
- Some patterns appear in different regions.
- Subsampling the pixels will not change the object.
Simplification
Receptive field
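The receptive field is the concept that defines the region a neuron perceives. Each neuron only looks at the content of its own receptive field.
- Some carefully designed neural networks cover only some channels (a special case, used only when a pattern appears in just some of the channels).
- The classic receptive field arrangement covers all channels.
- Once all channels are covered, the receptive field is described by just two parameters, height and width, which together determine the kernel size.
- A receptive field is usually watched by a set of neurons.
The parameters defined here are essentially the arguments passed to a conv() function.
- Kernel size and stride affect the dimensions of the generated "image".
- Padding determines how positions beyond the image border are filled (for example with zeros). It changes the values computed near the border and, because it allows extra receptive field positions, the output size as well.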
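How kernel size, stride, and padding together determine the output size can be sketched with the standard formula `floor((size + 2 * padding - kernel_size) / stride) + 1`. The function name `conv_output_size` is a hypothetical helper for illustration only.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one axis of a convolution.

    size:        input length along the axis (height or width)
    kernel_size: receptive field length along the same axis
    stride:      step between neighbouring receptive fields
    padding:     number of padded pixels added on each side
    """
    return (size + 2 * padding - kernel_size) // stride + 1

no_padding = conv_output_size(6, kernel_size=3)               # 4
same_size = conv_output_size(6, kernel_size=3, padding=1)     # 6
strided = conv_output_size(6, kernel_size=3, stride=2)        # 2
```

The same formula applies independently to height and width.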
parameter sharing
Neurons that detect the same pattern at different positions use the same parameters.
Two neurons with the same receptive field do not share parameters (otherwise they would produce identical outputs).
- Each receptive field has a set of neurons (e.g., 64 neurons)
- Each receptive field has the neurons with the same set of parameters
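To get a feel for how much parameter sharing saves, the sketch below counts weights and biases for one layer with and without sharing. The 100 x 100 x 3 input and 3 x 3 kernel are assumed numbers chosen for illustration; only the 64 neurons per receptive field comes from the notes above. The function names are hypothetical.

```python
def params_per_neuron(kernel_h, kernel_w, channels):
    """One weight per input value in the receptive field, plus one bias."""
    return kernel_h * kernel_w * channels + 1

def layer_params(height, width, channels, kernel, neurons_per_field, shared):
    """Parameter count of one layer with stride 1 and no padding."""
    positions = (height - kernel + 1) * (width - kernel + 1)
    per_field = neurons_per_field * params_per_neuron(kernel, kernel, channels)
    # With sharing, every receptive field reuses the same set of parameters.
    return per_field if shared else positions * per_field

shared = layer_params(100, 100, 3, kernel=3, neurons_per_field=64, shared=True)
unshared = layer_params(100, 100, 3, kernel=3, neurons_per_field=64, shared=False)
```

Under these assumptions the shared layer has 1,792 parameters, while the unshared one has 17,210,368.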
Pooling
Pooling deals with perceiving images at different scales by subsampling them.
Max Pooling
During subsampling, pick the maximum value in each window.
Mean Pooling
Pick the mean of the values in each window.
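Both pooling variants can be sketched with one plain-Python helper that slides a non-overlapping window over a feature map and reduces each window with `max` or a mean. The names `pool2d` and `mean` and the sample values are assumptions made for this example.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

fmap = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]
max_pooled = pool2d(fmap, 2, max)    # [[6, 2], [7, 9]]
mean_pooled = pool2d(fmap, 2, mean)  # [[3.75, 1.25], [2.5, 6.0]]
```

Pooling has no learnable parameters; it only shrinks the 4 x 4 map to 2 x 2 here.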
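Convolutional Layers + Pooling
In practice, a convolutional layer is often followed by a pooling layer. What pooling does is make the image smaller.
In paper designs from the last couple of years, many models have dropped pooling (there is enough compute to go without it).
Instead, a large number of fully convolutional networks have recently appeared in image recognition papers.
Common CNNs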
Benefit of Convolutional Layer
- Some patterns are much smaller than the whole image.
- The same patterns appear in different regions.
The filter view
- The values in the filters are unknown parameters.
- The output of the filters is called a feature map.
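How a filter produces a feature map can be sketched in plain Python for a single-channel image. The filter values are fixed by hand here purely for illustration (in training they would be learned), and `conv2d` is a hypothetical helper, not a library function.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a single-channel image (no padding)."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The same kernel values are reused at every position.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# A 4 x 4 image with a diagonal line, and a 2 x 2 diagonal detector.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
kernel = [[1, -1], [-1, 1]]
feature_map = conv2d(image, kernel)
# feature_map == [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
```

The largest responses sit on the diagonal of the feature map, exactly where the small pattern appears in the image.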
Multiple Convolutional Layers
Two versions of the story
Neuron Version | Filter Version |
---|---|
Each neuron only considers a receptive field | There are a set of filters detecting small patterns |
The neurons with different receptive fields share the parameters. | Each filter convolves over the input image. |
They are the same story.
Application:
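AlphaGo
Treat the Go board as an image.
Similarities between playing Go and image recognition:
- Some patterns are much smaller than the whole image.
- The same patterns appear in different regions.
Whether to use pooling depends on whether the nature of the problem suits it. A Go board cannot be subsampled.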
More Applications
HW3: Image Classification
Objective
- Solve image classification with convolutional neural networks.
- Improve the performance with data augmentations.
- Understand popular image model techniques such as residual connections.
Tricks
Model Selection
- Visit torchvision.models for a list of model structures, or go to timm.
- If pretrained weights are not allowed, explicitly set pretrained=False.
Data Augmentation
- Modify the image data so non-identical inputs are given to the model each epoch, to prevent overfitting of the model
- Visit torchvision.transforms for a list of choices and their corresponding effects. Diversity is encouraged! Usually, stacking multiple transformations leads to better results.
- Coding: fill in train_tfm to gain this effect
# Normally, we don't need augmentations in testing and validation.
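The idea that augmentation feeds the model a non-identical input each epoch can be sketched without any library. This is not the homework's `train_tfm`; it is a plain-Python stand-in for a random horizontal flip (torchvision.transforms offers `RandomHorizontalFlip` for real images), and the names `random_hflip` and `augment_each_epoch` are hypothetical.

```python
import random

def random_hflip(image, p=0.5, rng=random):
    """Flip a 2D image (list of rows) left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def augment_each_epoch(image, epochs, seed=0):
    """Return the variant of `image` the model would see in each epoch."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [random_hflip(image, 0.5, rng) for _ in range(epochs)]

image = [[1, 2, 3], [4, 5, 6]]
flipped = random_hflip(image, p=1.0)    # always flipped
unchanged = random_hflip(image, p=0.0)  # never flipped
variants = augment_each_epoch(image, epochs=4)
```

For validation and testing, the deterministic path (no random flip) would be used instead.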
Advanced Data Augmentation
Coding:
- In your torch.utils.data.Dataset, __getitem__() needs to return an image that is the linear combination of two images.
- In your torch.utils.data.Dataset, __getitem__() needs to return a label that is a vector, to assign probabilities to each class.
- You need to explicitly code out the math formula of the cross-entropy loss, as CrossEntropyLoss does not support multiple labels.
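The three coding points above describe what is commonly called mixup. The sketch below shows the arithmetic in plain Python on flattened toy "images"; it is not the homework's Dataset code, and the names `mixup`, `soft_cross_entropy`, and the mixing weight `lam` are assumptions for illustration.

```python
import math

def mixup(image_a, label_a, image_b, label_b, lam, num_classes):
    """Blend two images and build the matching probability-vector label."""
    mixed_image = [lam * a + (1 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [0.0] * num_classes
    mixed_label[label_a] += lam
    mixed_label[label_b] += 1 - lam
    return mixed_image, mixed_label

def soft_cross_entropy(logits, soft_label):
    """Cross-entropy written out by hand: -sum_i t_i * log_softmax(z)_i."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_sum) for t, z in zip(soft_label, logits))

mixed_image, mixed_label = mixup([0.0, 1.0], 0, [1.0, 0.0], 2,
                                 lam=0.75, num_classes=3)
# mixed_image == [0.25, 0.75], mixed_label == [0.75, 0.0, 0.25]
loss = soft_cross_entropy([0.0, 0.0, 0.0], mixed_label)
```

Because the label is now a probability vector rather than a class index, every class with non-zero probability contributes to the loss.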
Test Time Augmentation
- The sample code tests images using a deterministic “test transformation”
- You may use the train transformation for a more diversified representation of the images, and predict with multiple variants of the test images.
- Coding: You need to fill in train_tfm, change the augmentation method for test_dataset, and modify prediction code to gain this effect.
- Usually, test_tfm will produce images that are more identifiable, so you can assign a larger weight to test_tfm results for better performance.
- Ex: Final Prediction = avg_train_tfm_pred * 0.5 + test_tfm_pred * 0.5
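The weighted combination in the example can be sketched as follows, assuming each prediction is a list of per-class scores. The function name `tta_predict` and the sample numbers are hypothetical; only the 0.5 / 0.5 weighting comes from the example above.

```python
def tta_predict(train_tfm_preds, test_tfm_pred, test_weight=0.5):
    """Combine predictions from several augmented views with the plain view.

    train_tfm_preds: list of per-class score lists, one per augmented variant
    test_tfm_pred:   per-class scores from the deterministic test transform
    test_weight:     weight given to test_tfm_pred (the rest goes to the average)
    """
    n = len(train_tfm_preds)
    num_classes = len(test_tfm_pred)
    avg_train_tfm_pred = [sum(p[i] for p in train_tfm_preds) / n
                          for i in range(num_classes)]
    final = [(1 - test_weight) * a + test_weight * t
             for a, t in zip(avg_train_tfm_pred, test_tfm_pred)]
    predicted_class = max(range(num_classes), key=final.__getitem__)
    return final, predicted_class

final, predicted_class = tta_predict(
    train_tfm_preds=[[0.6, 0.4], [0.2, 0.8]],
    test_tfm_pred=[0.9, 0.1],
    test_weight=0.5,
)
```

Raising `test_weight` above 0.5 gives the more identifiable `test_tfm` result a larger say in the final prediction.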
Cross Validation
- Cross-validation is a resampling method that uses different portions of the data to validate and train a model on different iterations. Ensembling multiple results leads to better performance.
- Coding: You need to merge the current train and validation paths, and resample from those to form new train and validation sets.
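Merging and resampling can be sketched as a k-fold split over the combined list of samples. This is a minimal plain-Python illustration, not the homework's data-loading code; `k_fold_splits` is a hypothetical helper and the integers stand in for file paths.

```python
import random

def k_fold_splits(samples, k, seed=0):
    """Yield (train, valid) lists so each sample is validated exactly once."""
    items = list(samples)
    random.Random(seed).shuffle(items)       # seeded for reproducibility
    folds = [items[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, valid

# Stand-in for the merged train + validation paths.
merged = list(range(10))
splits = list(k_fold_splits(merged, k=5))
```

One model would be trained per split, and their predictions ensembled afterwards.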
- Create time: 2022-04-05 12:53:29
- Post link: Machine-Learning/05-hung-yi-lee-convolutional-neural-networks-cnn/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.