02_Image Classification Pipeline
Carpe Tu Black Whistle

Administrative

Python & NumPy

All of the assignments use Python and NumPy.
If you are not familiar with Python & NumPy, there is a tutorial: https://cs231n.github.io/python-numpy-tutorial/

NumPy lets you write very efficient vectorized operations that do a lot of computation in just a couple of lines of code.
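
For instance, a single vectorized NumPy expression can replace an explicit Python loop; a minimal sketch:

import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

# Loop version: one Python-level operation per element (slow)
total = 0.0
for v in x:
    total += v * v

# Vectorized version: a single NumPy call does the same work in C (fast)
total_vec = np.sum(x * x)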

Google Cloud

Google Cloud offers virtual machines with GPUs, which can be used to accelerate machine learning workloads.

Google Cloud Tutorial offered by Stanford https://github.com/cs231n/gcloud

Image Classification

– A core task in Computer Vision
Official Notes: Image Classification

Semantic Gap

image

The computer represents the image as a gigantic grid of numbers. For example, the image might be something like 800 by 600 pixels, and each pixel is represented by 3 numbers giving the red, green, and blue values for that pixel.
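
As a minimal sketch (assuming an 8-bit RGB image), the array the computer actually sees looks like this:

import numpy as np

# An 800 x 600 RGB image: height x width x 3 channels,
# each value an integer in [0, 255]
image = np.zeros((600, 800, 3), dtype=np.uint8)
image[0, 0] = [255, 128, 0]  # red, green, blue values of the top-left pixel
print(image.shape)           # (600, 800, 3)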

Semantic Gap: the idea of a cat, or the label “cat”, is a semantic label that we assign to the image, and there is a huge gap between this semantic idea of a cat and the pixel values that the computer actually sees.

Challenges

  • Viewpoint variation
    image
  • Illumination
    image
  • Deformation
    image
  • Occlusion
    image

Formulation of a Classifier

def classify_image(image):
    # Some magic here?
    return class_label

Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or other classes.

Data-Driven Approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning to train a classifier
  3. Evaluate the classifier on new images
def train(images, labels):
    # Machine Learning!
    return model

def predict(model, test_images):
    # Use model to predict labels
    return test_labels

KNN

Example Dataset: CIFAR10

  • 10 classes
  • 50,000 training images
  • 10,000 testing images

image
The figure shows test images and, to their right, their nearest neighbors in the training set.

Nearest Neighbor

image

  • train function: memorize the training data
  • predict function: for each test image
    • Find the closest training image
    • Predict the label of the nearest image
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict a label for """
        num_test = X.shape[0]
        # lets make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # get the index with smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

        return Ypred
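
A minimal usage sketch on random placeholder data shaped like flattened CIFAR10 images (the sizes are illustrative, not the full dataset):

# 500 training and 50 test images, each flattened to 32 * 32 * 3 = 3072 values
Xtr = np.random.rand(500, 3072)
ytr = np.random.randint(0, 10, size=500)
Xte = np.random.rand(50, 3072)

nn = NearestNeighbor()
nn.train(Xtr, ytr)
predictions = nn.predict(Xte)
print(predictions.shape)  # (50,)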

Complexity

Q: With N examples, how fast are training and prediction?

A: Training is O(1); prediction is O(N).

This is bad: we want classifiers that are fast at prediction; slow training is acceptable.

K-Nearest Neighbors

The link below is an interactive k-NN visualization created by Justin Johnson, a TA of CS231n:
http://vision.stanford.edu/teaching/cs231n-demos/knn/

Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.

image

A higher value of k has a smoothing effect that makes the classifier more resistant to outliers.

Note that the blank regions in the 3-NN or 5-NN images are caused by ties in the votes among the nearest neighbors.
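
A minimal sketch of the k-NN prediction step, extending the predict method above with a majority vote (assuming labels are small non-negative integers, as np.bincount requires):

def predict_knn(self, X, k=5):
    """ Like predict, but vote among the k nearest training images. """
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
        distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
        nearest = np.argsort(distances)[:k]     # indices of the k closest images
        votes = np.bincount(self.ytr[nearest])  # count each label's occurrences
        Ypred[i] = np.argmax(votes)             # the majority label wins
    return Ypred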

Distance Metric

  • L1 (Manhattan) distance (left)

  • L2 (Euclidean) distance (right)

    image
    The Manhattan distance depends on the choice of coordinate axes, while the Euclidean distance is invariant to rotation of the coordinates.
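
A small sketch comparing the two metrics on a pair of made-up 4-pixel images:

import numpy as np

I1 = np.array([56., 32., 10., 18.])
I2 = np.array([12., 24., 108., 100.])

d1 = np.sum(np.abs(I1 - I2))          # L1 (Manhattan) distance
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2 (Euclidean) distance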

KNN’s Shortcomings

k-Nearest Neighbors on raw pixel values is never used in practice:

– Very slow at test time
– Distance metrics on pixels are not informative
– Curse of dimensionality

As an aside, the computational complexity of the Nearest Neighbor classifier is an active area of research, and several Approximate Nearest Neighbor (ANN) algorithms and libraries exist that can accelerate the nearest neighbor lookup in a dataset (e.g. FLANN).

KNN in practice

If you wish to apply kNN in practice (hopefully not on images, or perhaps only as a baseline), proceed as follows:

  1. Preprocess your data: Normalize the features in your data (e.g. one pixel in images) to have zero mean and unit variance. We will cover this in more detail in later sections, and chose not to cover data normalization in this section because pixels in images are usually homogeneous and do not exhibit widely different distributions, alleviating the need for data normalization.
  2. If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA, NCA, or even Random Projections.
  3. Split your training data randomly into train/val splits. As a rule of thumb, between 70-90% of your data usually goes to the train split. This setting depends on how many hyperparameters you have and how much of an influence you expect them to have. If there are many hyperparameters to estimate, you should err on the side of having larger validation set to estimate them effectively. If you are concerned about the size of your validation data, it is best to split the training data into folds and perform cross-validation. If you can afford the computational budget it is always safer to go with cross-validation (the more folds the better, but more expensive).
  4. Train and evaluate the kNN classifier on the validation data (for all folds, if doing cross-validation) for many choices of k (e.g. the more the better) and across different distance types (L1 and L2 are good candidates); a sketch of this step follows the list.
  5. If your kNN classifier is running too long, consider using an Approximate Nearest Neighbor library (e.g. FLANN) to accelerate the retrieval (at cost of some accuracy).
  6. Take note of the hyperparameters that gave the best results. There is a question of whether you should use the full training set with the best hyperparameters, since the optimal hyperparameters might change if you were to fold the validation data into your training set (since the size of the data would be larger). In practice it is cleaner to not use the validation data in the final classifier and consider it to be burned on estimating the hyperparameters. Evaluate the best model on the test set. Report the test set accuracy and declare the result to be the performance of the kNN classifier on your data.
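
A minimal sketch of steps 3-4 above: hold out a validation split and sweep over k, reusing the NearestNeighbor class and the predict_knn sketch from earlier (the data here are random placeholders):

import numpy as np

# Placeholder data standing in for real features and labels
X = np.random.rand(1000, 3072)
y = np.random.randint(0, 10, size=1000)

# 80/20 train/validation split
idx = np.random.permutation(len(X))
train_idx, val_idx = idx[:800], idx[800:]

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 10, 20, 50, 100]:
    nn = NearestNeighbor()
    nn.train(X[train_idx], y[train_idx])
    # predict_knn sketch from above, called here as a free function
    acc = np.mean(predict_knn(nn, X[val_idx], k=k) == y[val_idx])
    if acc > best_acc:
        best_k, best_acc = k, acc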

Hyperparameters and Validation

What is the best value of k to use?
What is the best distance metric to use?

These are hyperparameters: choices about the algorithm that we set rather than learn

Very problem-dependent.
Must try them all out and see what works best.

Setting Hyperparameters

image

K-Fold Cross-Validation

Cross-Validation: Split data into folds, try each fold as validation and average the results

image

Useful for small datasets, but not used too frequently in deep learning

  • The accuracy on the validation set is usually higher than on the test set, because the validation set is effectively used to fit the hyperparameters (i.e., to select the best combination of hyperparameters)
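
Continuing the placeholder-data sketch from above, a minimal 5-fold cross-validation loop for a single hyperparameter setting:

num_folds = 5
X_folds = np.array_split(X[train_idx], num_folds)
y_folds = np.array_split(y[train_idx], num_folds)

accs = []
for f in range(num_folds):
    # fold f is the validation fold; the remaining folds form the training set
    X_val, y_val = X_folds[f], y_folds[f]
    X_tr = np.concatenate([X_folds[j] for j in range(num_folds) if j != f])
    y_tr = np.concatenate([y_folds[j] for j in range(num_folds) if j != f])

    nn = NearestNeighbor()
    nn.train(X_tr, y_tr)
    accs.append(np.mean(predict_knn(nn, X_val, k=5) == y_val))

print(np.mean(accs))  # average validation accuracy across the folds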

Linear Classification

A neural network is like a building, and linear classifiers are like its building blocks.

Parametric Approach: Linear Classifier

image

Example with an image of 4 pixels and 3 classes (cat/dog/ship):
image
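
A minimal sketch of the score computation f(x, W) = Wx + b for this 4-pixel, 3-class example (the numbers are illustrative placeholders):

import numpy as np

x = np.array([56., 231., 24., 2.])  # the 4 flattened pixel values
W = np.random.randn(3, 4) * 0.01    # one row of weights per class
b = np.zeros(3)                     # one bias per class

scores = W.dot(x) + b               # 3 class scores: cat, dog, ship
print(scores.argmax())              # index of the highest-scoring class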

Hard Examples for a Linear Classifier

image

Coming up

– Loss function (quantifying what it means to have a “good” W)
– Optimization (start with a random W and find a W that minimizes the loss)
– ConvNets! (tweak the functional form of f)