01_Introduction to Convolutional Neural Networks for Visual Recognition
Carpe Tu Black Whistle

Welcome to CS231n

CS231n is about computer vision, which is the study of visual data.

  • The majority of bits flying around the Internet are visual data.
  • Computer vision is a truly interdisciplinary field that touches on many different areas of science, engineering, and technology.
  • CS131 (Fall 2016, Profs. Fei-Fei Li & Juan Carlos Niebles)
    – Undergraduate introductory class

  • CS231a (Spring 2017, Prof. Silvio Savarese)
    – Core computer vision class for seniors, master's students, and PhDs
    – Topics include image processing, cameras, 3D reconstruction, segmentation, object recognition, scene understanding

  • CS231n (Prof. Fei-Fei Li, Justin Johnson & Serena Yeung)
    – Neural network (aka “deep learning”) class on image classification

There is also an assortment of CS331 and CS431 courses on advanced topics in computer vision.

History of computer vision

70s and 80s: attempts & toy examples

object recognition

  • Camera obscura

    • A camera based on pinhole camera theory.
    • Dates from the 1600s, the Renaissance period.
    • It is very similar to the early eyes that animals developed: a hole that collects light.
  • In the meantime, biologists started studying the mechanism of vision.

    • The work done by Hubel and Wiesel in the 50s and 60s using electrophysiology.
  • Larry Roberts published a set of work called Block World.

    • The history of computer vision also starts around the early 60s.

      Block World is often regarded as the first PhD thesis in computer vision. The visual world was simplified into simple geometric shapes, and the goal was to recognize them and reconstruct what these shapes are.

  • “The Summer Vision Project” (famous MIT summer project)

    • An attempt to use summer workers effectively in the construction of a significant part of a visual system.
    • The field of computer vision has since blossomed from one summer project into a field of thousands of researchers, still working on some of the most fundamental problems of vision.
  • Vision, a book by David Marr from the late 70s

    • Marr argued that, in order to take an image and arrive at a final, holistic, full 3D representation of the visual world, we have to go through several intermediate stages.


  • Generalized cylinder & pictorial structure
    The basic idea is that every object is composed of simple geometric primitives.
    Either representation is a way to reduce the complex structure of an object into a collection of simpler shapes.

  • David Lowe

    • Tried to recognize razors by constructing lines and edges, mostly straight lines, and their combinations.


object segmentation

Normalized Cut (Shi & Malik, 1997)

The task is to take an image and group its pixels into meaningful areas.
One very early seminal work is by Jitendra Malik and his student Jianbo Shi from Berkeley,
who used a graph theory algorithm for the problem of image segmentation.
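
To make the graph idea concrete, here is a minimal sketch of a normalized-cut style two-way split in Python with NumPy. It is an illustration under assumptions, not the authors' implementation: the toy image, the Gaussian affinity with its parameters sigma_i and sigma_x, and the function name normalized_cut_labels are all made up for these notes. Each pixel is a graph node, edge weights measure how similar two pixels are, and the split is read off the eigenvector with the second-smallest eigenvalue of the normalized graph Laplacian.

```python
import numpy as np

def normalized_cut_labels(image, sigma_i=0.5, sigma_x=4.0):
    """Split a small grayscale image into two groups of pixels (toy sketch)."""
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    pos = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    val = image.ravel().astype(float)

    # Affinity matrix W: two pixels are strongly connected when they are
    # close in intensity and close in position (assumed Gaussian form).
    d_int = (val[:, None] - val[None, :]) ** 2
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d_int / sigma_i**2) * np.exp(-d_pos / sigma_x**2)

    # Solve (D - W) y = lambda * D * y through its symmetric form
    # I - D^{-1/2} W D^{-1/2}, which np.linalg.eigh can handle.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues come back in ascending order

    # The second eigenvector, mapped back to y, gives the relaxed cut;
    # thresholding it at zero yields the two segments.
    y = d_inv_sqrt * eigvecs[:, 1]
    return (y > 0).reshape(h, w)

# Toy image: dark left half, bright right half.
image = np.zeros((4, 8))
image[:, 4:] = 1.0
print(normalized_cut_labels(image).astype(int))
```

On this toy image the two halves end up in different groups. Which half is labeled 1 depends on the arbitrary sign of the eigenvector. A dense affinity matrix like this only works for tiny images; it is used here to keep the sketch short.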

After 2000

face detection

Around 1999 to 2000, machine learning techniques, especially statistical machine learning techniques, started to gain momentum. Examples include SVMs, boosting, graphical models, and the first wave of neural networks.

One particular work that made a large contribution was the use of the AdaBoost algorithm for real-time face detection by Paul Viola and Michael Jones.
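
AdaBoost combines many weak classifiers into a strong one by reweighting the training examples after every round, so that later classifiers concentrate on the examples earlier ones got wrong. The sketch below shows only that boosting loop, using single-threshold "stumps" on a made-up 1-D dataset. It is not the Viola-Jones detector (no image features, no cascade), and all function names are hypothetical.

```python
import numpy as np

def stump_predict(X, j, thr, sign):
    """Weak classifier: threshold one feature, output +1 or -1."""
    return sign * np.where(X[:, j] >= thr, 1, -1)

def fit_stump(X, y, w):
    """Pick the stump with the lowest weighted error (exhaustive search)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                err = w[stump_predict(X, j, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=3):
    w = np.full(len(y), 1.0 / len(y))      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # vote strength of this stump
        pred = stump_predict(X, j, thr, sign)
        w = w * np.exp(-alpha * y * pred)      # raise weight of mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, t, s) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)

# Toy data: positives sit in the middle of the range, so no single
# threshold can separate them from the negatives.
X = np.arange(20).reshape(-1, 1)
y = np.where((X[:, 0] >= 6) & (X[:, 0] <= 13), 1, -1)
for rounds in (1, 3):
    acc = np.mean(predict(adaboost(X, y, rounds), X) == y)
    print(rounds, "round(s): training accuracy", acc)
```

On this toy set one stump alone gets 70% of the training examples right, while three boosted stumps classify all of them correctly.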


SIFT feature

by David Lowe, 1999
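The idea is that matching an entire object to another one is very difficult, because viewpoint, lighting, and occlusion all change its appearance. Some local features of an object, however, tend to stay invariant to these changes, so recognition can start by identifying such features and matching them to those of a similar object.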

Spatial Pyramid Matching

Lazebnik, Schmid & Ponce, 2006


The idea is that there are features in an image that give us clues about which type of scene it is, whether it’s a landscape, a kitchen, a highway, and so on. This particular work takes such features from different parts of the image and at different resolutions, puts them together in a feature descriptor, and then runs a support vector machine on top of that.
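
A rough sketch of how such a descriptor could be assembled is given below. It assumes each pixel has already been assigned a "visual word" index (an integer label for its local feature), which is a simplification for these notes. The function name spatial_pyramid is hypothetical, and the per-level weighting used in the original paper is left out. The image is divided into 1x1, 2x2, and 4x4 grids, a histogram of words is computed per cell, and all histograms are concatenated.

```python
import numpy as np

def spatial_pyramid(words, num_words, levels=2):
    """Concatenate per-cell word histograms over grids of 1x1, 2x2, 4x4, ..."""
    h, w = words.shape
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        row_edges = np.linspace(0, h, cells + 1).astype(int)
        col_edges = np.linspace(0, w, cells + 1).astype(int)
        for r in range(cells):
            for c in range(cells):
                cell = words[row_edges[r]:row_edges[r + 1],
                             col_edges[c]:col_edges[c + 1]]
                # Count how often each visual word occurs in this cell.
                parts.append(np.bincount(cell.ravel(), minlength=num_words))
    desc = np.concatenate(parts).astype(float)
    return desc / desc.sum()  # normalize so images of any size are comparable

# Two toy "images" with the same words overall but a different layout.
a = np.zeros((8, 8), dtype=int)
a[4:, :] = 1          # word 0 on top, word 1 at the bottom
b = a[::-1].copy()    # same words, flipped vertically

da = spatial_pyramid(a, num_words=2)
db = spatial_pyramid(b, num_words=2)
print(da.shape)                    # 2 words * (1 + 4 + 16) cells
print(np.allclose(da[:2], db[:2])) # whole-image histograms agree
print(np.allclose(da, db))         # finer levels tell the layouts apart
```

The whole-image histogram cannot distinguish the two layouts, while the finer grid levels can. In the full pipeline these descriptors would be the inputs to a classifier such as an SVM.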

Before ImageNet

In the early 2000s, we began to have benchmark datasets that enable us to measure progress in object recognition.

PASCAL Visual Object Challenge (20 object categories)

ImageNet

www.image-net.org.

  1. To recognize the world of all objects.
  2. To overcome the machine learning bottleneck of overfitting.

    Part of the problem is that visual data is very complex. Because it is complex, our models tend to be high-dimensional: the inputs are high-dimensional and there are a lot of parameters to fit. When we don’t have enough training data, overfitting happens very fast and the model cannot generalize well.
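
The effect can be seen even in a tiny setting unrelated to images. The sketch below is an illustration chosen for these notes, not something from the lecture: it fits a degree-9 polynomial (10 parameters) to noisy samples of a sine curve, once with 10 training points and once with 200. The target function, noise level, and the name fit_and_score are assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    return np.sin(2 * np.pi * x)

def fit_and_score(n_train, degree, noise=0.2, seed=0):
    """Fit a polynomial to noisy samples; return (train MSE, test MSE)."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    y_train = true_fn(x_train) + rng.normal(0.0, noise, n_train)
    model = Polynomial.fit(x_train, y_train, degree)

    # Test against the noise-free function on a dense grid.
    x_test = np.linspace(0.0, 1.0, 1000)
    train_mse = np.mean((model(x_train) - y_train) ** 2)
    test_mse = np.mean((model(x_test) - true_fn(x_test)) ** 2)
    return train_mse, test_mse

for n in (10, 200):
    train_mse, test_mse = fit_and_score(n_train=n, degree=9)
    print(f"n={n:3d}  train MSE={train_mse:.2e}  test MSE={test_mse:.2e}")
```

With 10 points the model has as many parameters as data points, so it reproduces the training set, noise included, almost exactly, and its error on new inputs is much larger than its training error. With 200 points the same model generalizes far better. This is the motivation for building a dataset as large as ImageNet.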


A group of scientists at Princeton and Stanford put together the largest possible dataset.

ImageNet Large-Scale Visual Recognition Challenge

Held annually since 2010 (the ImageNet dataset itself was published in 2009).

The winning error rate drops sharply between 2011 and 2012.

CS231n Overview

CS231n focuses on one of the most important problems of visual recognition – image classification.

There are a number of visual recognition problems related to image classification, such as object detection and image captioning.

Convolutional neural networks (CNNs) have become an important tool for object recognition. CNNs are sometimes called convnets.

The main takeaway here is that convolutional neural networks really had their breakthrough moment in 2012, and since then a lot of effort has focused on tuning and tweaking these algorithms to make them perform better and better on the problem of image classification.

Even though CNNs perform well in the ImageNet challenges, they were not invented overnight.

Philosophy

You should really understand the deep mechanics of all of these algorithms.

  • Thorough and detailed.
    – Understand how to write, debug, and train CNNs from scratch.
  • Practical.
    – Focus on practical techniques for training these networks at scale and on GPUs (e.g. will touch on distributed optimization, differences between CPU vs. GPU, etc.). Also look at state-of-the-art software tools such as Caffe, TensorFlow, and (Py)Torch.
  • State of the art.
    – Most materials are new from the research world in the past 1-3 years (2014 to 2016). Very exciting stuff!
  • Fun.
    – Some fun topics such as image captioning (using RNNs)
    – Also DeepDream, NeuralStyle, etc.