05_Hung-yi Lee_Convolutional Neural Networks(CNN)

CNN

A commonly used neural-network architecture, mainly applied to image and video recognition.

Image Classification

Image preprocessing

  • Images fed to an image-recognition system generally come in different scales; the usual practice is to resize every input image to one unified size.
    All the images to be classified then have the same size.
  • Output and label: one-hot vector
  • Criterion: Cross-entropy

image

Image data layout

Image data is usually represented as a (height × width × channels) tensor.

image
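
A minimal sketch of this layout, assuming PyTorch/torchvision (which the homework below uses); note that ToTensor() stores images channels-first, i.e. (channels, height, width), and the file name here is only a placeholder:

from PIL import Image
from torchvision import transforms

# Load an RGB image (hypothetical path) and convert it to a tensor.
img = Image.open("example.jpg").convert("RGB")   # PIL image, sized (width, height)
x = transforms.ToTensor()(img)                   # float tensor with values in [0, 1]

# torchvision uses a channels-first layout: (channels, height, width).
print(x.shape)                                   # e.g. torch.Size([3, 224, 224])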

Observation

  1. We don't need to identify the full picture.

    We just need to identify some critical patterns. Some patterns are much smaller than the whole image.

  2. Some patterns appear in different regions.
  3. Subsampling the pixels will not change the object (downsampling does not change what the image represents).

Simplification

Receptive field

The receptive field defines the region of the input that a neuron perceives: each neuron only looks at the content inside its own receptive field.

  • Some carefully designed neural networks cover only some channels (a special case, used when a pattern only shows up in certain channels).
  • The most classic receptive-field arrangement covers all channels.
  • Once all channels are covered, the receptive field is described by just two numbers, height and width, which together define the kernel size.
  • Usually, one receptive field is observed by a whole set of neurons.

image

The parameters defined here are essentially the arguments passed to a convolution function such as conv().

  1. Kernel size and stride determine the spatial dimensions of the generated "image" (feature map).
  2. Padding controls how out-of-bounds positions are filled; it only affects the generated values (see the sketch after this list).
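
A rough sketch of how these arguments interact, using PyTorch's nn.Conv2d for illustration (the channel counts are arbitrary); the output size follows floor((input + 2 * padding - kernel_size) / stride) + 1:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 128, 128)   # (batch, channels, height, width)

# kernel_size=3, stride=1, padding=1 keeps the spatial size unchanged.
conv_a = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
print(conv_a(x).shape)            # torch.Size([1, 64, 128, 128])

# stride=2 without padding shrinks the map: (128 - 3) // 2 + 1 = 63.
conv_b = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=0)
print(conv_b(x).shape)            # torch.Size([1, 64, 63, 63])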

parameter sharing

Detecting the same pattern at different positions should not require learning separate parameters, so two neurons that look at different receptive fields can use exactly the same set of weights.

image

Two neurons with the same receptive field do not share parameters (they would always produce the same output); parameters are shared across neurons with different receptive fields.

image

  • Each receptive field has a set of neurons (e.g., 64 neurons)
  • Each receptive field has the neurons with the same set of parameters

Pooling

Exploits the observation that subsampling an image does not change the object it shows, so the feature map can be made smaller.

Max Pooling

During subsampling, keep the maximum value inside each pooling window.

Mean Pooling

Keep the mean value instead.
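
A minimal PyTorch sketch of the two pooling variants (sizes are arbitrary); each 2×2 window is reduced to a single value, so the feature map is halved:

import torch
import torch.nn as nn

fmap = torch.randn(1, 64, 64, 64)          # a feature map: (batch, channels, height, width)

max_pool = nn.MaxPool2d(kernel_size=2)     # keep the maximum of each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2)     # keep the mean of each 2x2 window

print(max_pool(fmap).shape)                # torch.Size([1, 64, 32, 32])
print(avg_pool(fmap).shape)                # torch.Size([1, 64, 32, 32])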

Convolutional Layers + Pooling

image

In practice, a pooling layer usually follows each convolutional layer; what pooling does is simply make the image smaller.

In recent years, many architectures have dropped pooling altogether (with enough computing resources, there is no need for it).

As a result, fully-convolutional networks appear frequently in recent image-recognition papers.

image
A typical CNN architecture
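
A sketch of this typical structure, assuming 128×128 RGB inputs; the channel counts and the number of classes (11, as in the food-classification homework below) are illustrative choices, not values from the lecture:

import torch.nn as nn

# Convolution -> ReLU -> Pooling blocks repeated, then Flatten -> Fully Connected -> class scores.
cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),    # 128 -> 64
    nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
    nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 32 -> 16
    nn.Flatten(),
    nn.Linear(256 * 16 * 16, 11),   # logits; softmax is applied inside the cross-entropy loss
)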

Benefit of Convolutional Layer

image

  • Some patterns are much smaller than the whole image.
  • The same patterns appear in different regions.

From the Filter's Perspective

image

  1. The values in the filters are unknown parameters (learned from data).
  2. The outputs of the filters are called the Feature Map.

Multiple Convolutional Layers

image

image
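
A small sketch of the stacking rule: the filters of the second convolutional layer must be as deep as the number of feature maps produced by the first layer (64 here, chosen arbitrarily):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

layer1 = nn.Conv2d(3, 64, kernel_size=3)    # 64 filters, each 3x3x3, producing 64 feature maps
layer2 = nn.Conv2d(64, 128, kernel_size=3)  # 128 filters, each 3x3x64 deep

h = layer1(x)
print(h.shape)              # torch.Size([1, 64, 30, 30])
print(layer2(h).shape)      # torch.Size([1, 128, 28, 28])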

The two versions compared

| Neuron Version | Filter Version |
| --- | --- |
| Each neuron only considers a receptive field | There is a set of filters detecting small patterns |
| The neurons with different receptive fields share the parameters | Each filter convolves over the input image |

They are the same story.

Applications

Alpha Go

Treat the board as an image.

Similarities between playing Go and image recognition

  • Some patterns are much smaller than the whole image

  • The same patterns appear in different regions

image

Whether to use pooling depends on whether the nature of the problem tolerates it: subsampling a Go board would change the game, so pooling is not used here.

More Applications

image

HW3: Image Classification

Objective

  1. Solve image classification with convolutional neural networks.
  2. Improve the performance with data augmentations.
  3. Understand popular image model techniques such as residual connections.

Tricks

Model Selection

  • Visit torchvision.models for a list of model structures, or go to timm.
  • If pretrained weights are not allowed, make sure to set pretrained = False (see the example below).
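
For example (a sketch only; the exact constructor argument depends on your torchvision version, where newer releases use weights=None instead of pretrained=False):

import torchvision.models as models

# Build a ResNet-18 with randomly initialized weights, i.e. without pretrained weights.
model = models.resnet18(pretrained=False)   # older torchvision API
# model = models.resnet18(weights=None)     # equivalent call in newer torchvision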

Data Augmentation

  • Modify the image data so that non-identical inputs are fed to the model in each epoch, to prevent overfitting.
  • Visit torchvision.transforms for a list of choices and their corresponding effects. Diversity is encouraged! Usually, stacking multiple transformations leads to better results.
  • Coding: fill in train_tfm to gain this effect.
# torchvision.transforms provides the image transformations used below.
from torchvision import transforms

# Normally, we don't need augmentations in testing and validation.
# All we need here is to resize the PIL image and transform it into a Tensor.
test_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

# However, it is also possible to use augmentation in the testing phase.
# You may use train_tfm to produce a variety of images and then test using ensemble methods.
train_tfm = transforms.Compose([
    # Resize the image into a fixed shape (height = width = 128).
    transforms.Resize((128, 128)),
    # You may add some transforms here.
    # ToTensor() should be the last one of the transforms.
    transforms.ToTensor(),
])
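
One possible way to fill in train_tfm, assuming a few standard torchvision transforms (the specific transforms and their parameters are only an example, not the sample code's choice):

train_tfm = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(p=0.5),    # random left-right flip
    transforms.RandomRotation(15),             # rotate within +/- 15 degrees
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.ToTensor(),                     # ToTensor() stays last
])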

Advanced Data Augmentation

mixup

image

Coding:

  • In your torch.utils.data.Dataset, __getitem__() needs to return an image that is the linear combination of two images.
  • In your torch.utils.data.Dataset, __getitem__() needs to return a label that is a vector, assigning a probability to each class.
  • You need to explicitly code out the math formula of the cross-entropy loss, as CrossEntropyLoss does not support multiple labels (see the sketch below).
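
A minimal mixup sketch under these requirements; MixupDataset and soft_cross_entropy are hypothetical names introduced here for illustration, not part of the sample code:

import random
import torch

class MixupDataset(torch.utils.data.Dataset):
    """Wraps an image dataset and returns mixed images with soft (vector) labels."""
    def __init__(self, dataset, num_classes, alpha=0.2):
        self.dataset, self.num_classes, self.alpha = dataset, num_classes, alpha

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        img1, lbl1 = self.dataset[idx]
        img2, lbl2 = self.dataset[random.randrange(len(self.dataset))]
        lam = torch.distributions.Beta(self.alpha, self.alpha).sample().item()
        img = lam * img1 + (1 - lam) * img2          # linear combination of two images
        label = torch.zeros(self.num_classes)
        label[lbl1] += lam                           # probability mass spread over both classes
        label[lbl2] += 1 - lam
        return img, label

def soft_cross_entropy(logits, soft_labels):
    # Cross entropy written out explicitly so it accepts soft labels.
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(soft_labels * log_probs).sum(dim=-1).mean()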

Test Time Augmentation

  • The sample code tests images using a deterministic “test transformation”
  • You may use the train transformation for a more diversified representation of the images, and predict with multiple variants of the test images.
  • Coding: You need to fill in train_tfm, change the augmentation method for test_dataset, and modify the prediction code to gain this effect.

image

  • Usually, test_tfm will produce images that are more identifiable, so you can assign a larger weight to test_tfm results for better performance.

image

  • Ex: Final Prediction = avg_train_tfm_pred * 0.5 + test_tfm_pred * 0.5 (sketched in code below)
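
One way this weighted combination could look in code; the prediction tensors here are random placeholders standing in for real (softmax) model outputs:

import torch

num_aug, num_images, num_classes = 5, 4, 11   # dummy sizes for illustration

test_tfm_pred = torch.softmax(torch.randn(num_images, num_classes), dim=1)
train_tfm_preds = torch.softmax(torch.randn(num_aug, num_images, num_classes), dim=2)

avg_train_tfm_pred = train_tfm_preds.mean(dim=0)              # average over the augmented copies
final_pred = avg_train_tfm_pred * 0.5 + test_tfm_pred * 0.5   # the weighted ensemble above
labels = final_pred.argmax(dim=1)                              # predicted class per image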

Cross Validation

image

  • Cross-validation is a resampling method that uses different portions of the data to validate and train a model on different iterations. Ensembling the results of multiple folds leads to better performance.
  • Coding: You need to merge the current train and validation paths, and resample from them to form new train and validation sets (see the sketch below).
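
A sketch of the resampling step, assuming the merged data is a list of image paths and that scikit-learn's KFold is available; the paths are placeholders, and the per-fold training loop is omitted:

from sklearn.model_selection import KFold

# Hypothetical placeholders for the merged train + validation image paths.
train_paths = ["train/0001.jpg", "train/0002.jpg", "train/0003.jpg", "train/0004.jpg"]
valid_paths = ["valid/0001.jpg", "valid/0002.jpg"]
all_paths = sorted(train_paths + valid_paths)

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for fold, (train_idx, valid_idx) in enumerate(kf.split(all_paths)):
    fold_train = [all_paths[i] for i in train_idx]
    fold_valid = [all_paths[i] for i in valid_idx]
    # Build datasets/loaders from fold_train and fold_valid, train one model per fold,
    # then ensemble (e.g. average) the per-fold predictions on the test set.
    print(f"fold {fold}: {len(fold_train)} train, {len(fold_valid)} valid")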