Welcome to CS231n
CS231n is about computer vision, which is the study of visual data.
- The majority of bits flying around the Internet are visual data.
- Computer vision is a truly interdisciplinary field that touches on many different areas of science, engineering and technology.
Related Courses @ Stanford
CS131 (Fall 2016, Profs. Fei-Fei Li & Juan Carlos Niebles)
– Undergraduate introductory class
CS231a (Spring 2017, Prof. Silvio Savarese)
– Core computer vision class for seniors, masters, and PhDs
– Topics include image processing, cameras, 3D reconstruction, segmentation, object recognition, scene understanding
CS231n (Prof. Fei-Fei Li & Justin Johnson & Serena Yeung)
– Neural network (aka “deep learning”) class on image classification
And an assortment of CS331 and CS431 for advanced topics in computer vision.
History of computer vision
70s 80s’ attempts & toy examples
object recognition
camera obscura
- a camera based on pinhole camera theory.
- dates from the 1600s, the Renaissance period.
- it is very similar to the early eyes that animals developed, with a hole that collects light.
In the meantime, biologists started studying the mechanism of vision.
- The work done by Hubel and Wiesel in the 50s and 60s using electrophysiology
Larry Roberts published a body of work called Block World.
- The history of computer vision also starts around the early 60s.
That work is often described as the first PhD thesis in computer vision. The visual world was simplified into simple geometric shapes, and the goal was to recognize them and reconstruct what these shapes are.
“The Summer Vision Project” (famous MIT summer project)
- an attempt to use summer workers effectively in the construction of a significant part of a visual system
- the field of computer vision has blossomed from one summer project into a field of thousands of researchers still working on some of the most fundamental problems of vision
Vision, David Marr, in the late 70s
- the goal is to take an image and arrive at a final, holistic, full 3D representation of the visual world
generalized cylinder & pictorial structure
The basic idea is that every object is composed of simple geometric primitives.
Either representation is a way to reduce the complex structure of the object into a collection of simpler shapes.
David Lowe
- tried to recognize razors by constructing lines and edges, mostly straight lines and their combinations.
object segmentation
Normalized Cut(Shi & Malik, 1997)
The task is to take an image and group the pixels into meaningful areas.
Here is one very early seminal work by Jitendra Malik and his student Jianbo Shi from Berkeley.
They used a graph theory algorithm for the problem of image segmentation.
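To make the graph view of segmentation concrete, here is a minimal sketch of the normalized cut criterion on a toy one-dimensional "image". It treats each pixel as a graph node, weights edges by intensity similarity, and scores a split (A, B) as cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V). The pixel values, the `sigma` parameter, and the function names are assumptions made for illustration. The brute-force search over all splits is also a stand-in: the original work approximates the best split with an eigenvector computation, which is not shown here.

```python
import math
from itertools import combinations


def build_graph(pixels, sigma=10.0):
    """Fully connected graph; similar intensities get edge weights near 1."""
    n = len(pixels)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                diff = pixels[i] - pixels[j]
                weights[i][j] = math.exp(-(diff * diff) / (2 * sigma * sigma))
    return weights


def ncut(weights, group_a):
    """Normalized cut value of splitting the nodes into group_a and the rest."""
    nodes = range(len(weights))
    a = set(group_a)
    b = set(nodes) - a
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in nodes)
    assoc_b = sum(weights[i][j] for i in b for j in nodes)
    return cut / assoc_a + cut / assoc_b


def best_partition(weights):
    """Try every two-way split (only feasible for tiny graphs)."""
    n = len(weights)
    best_score, best_group = None, None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            score = ncut(weights, group)
            if best_score is None or score < best_score:
                best_score, best_group = score, set(group)
    return best_score, best_group


# Hypothetical toy image: three dark pixels followed by three bright ones.
toy_pixels = [10, 12, 11, 200, 205, 198]
score, group = best_partition(build_graph(toy_pixels))
print(score, sorted(group))
```

On this toy input the lowest-scoring split separates the dark pixels from the bright ones, because almost no edge weight crosses between the two groups.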
After 2000
face detection
Around 1999 to 2000, machine learning techniques, especially statistical machine learning techniques, started to gain momentum, e.g. SVMs, boosting, graphical models, and the first wave of neural networks.
One particular work that made a lot of contribution: using the AdaBoost algorithm to do real-time face detection, by Paul Viola and Michael Jones.
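The boosting idea behind that detector can be sketched in a few lines. This is only the generic AdaBoost loop with one-dimensional threshold classifiers ("stumps") on made-up data. It is not the Viola and Jones detector, which applies boosting to image features inside a cascade. The data, function names, and the number of rounds are assumptions for illustration.

```python
import math


def stump_predict(x, threshold, polarity):
    """Weak classifier: returns polarity when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(
                w for w, x, y in zip(weights, xs, ys)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train a weighted vote of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Each round focuses on the samples the previous stumps got wrong, so a handful of weak classifiers can together fit data that none of them fits alone.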
SIFT feature
by David Lowe, 1999
The idea is to match an entire object to another one by matching local features that stay stable under changes such as scale and rotation.
Spatial Pyramid Matching
Lazebnik, Schmid & Ponce, 2006
The idea is that there are features in an image that can give us clues about which type of scene it is, whether it is a landscape, a kitchen, a highway and so on. This particular work takes these features from different parts of the image and at different resolutions, puts them together in a feature descriptor, and then runs a support vector machine algorithm on top of that.
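A rough sketch of how such a descriptor can be assembled is shown below. It assumes the local features have already been quantized into integer "visual word" indices on a grid (`word_map`), which is an assumption of this example. It then concatenates histograms computed over the whole image, over a 2x2 grid, and over a 4x4 grid. The per-level weighting used in the original work and the SVM trained on top are left out, and all names are hypothetical.

```python
def histogram(words, vocab_size):
    """Count how often each visual word index occurs."""
    counts = [0] * vocab_size
    for word in words:
        counts[word] += 1
    return counts


def spatial_pyramid(word_map, vocab_size, levels=2):
    """Concatenate per-cell histograms for grids of 1x1, 2x2, ... cells.

    word_map is a 2D list of visual-word indices, one per image location.
    """
    height, width = len(word_map), len(word_map[0])
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level  # number of cells along each axis at this level
        for cy in range(cells):
            for cx in range(cells):
                y0, y1 = cy * height // cells, (cy + 1) * height // cells
                x0, x1 = cx * width // cells, (cx + 1) * width // cells
                region = [
                    word_map[y][x]
                    for y in range(y0, y1)
                    for x in range(x0, x1)
                ]
                descriptor.extend(histogram(region, vocab_size))
    return descriptor


# Hypothetical 4x4 map of visual words drawn from a vocabulary of size 3.
toy_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 0],
    [2, 2, 0, 1],
]
descriptor = spatial_pyramid(toy_map, vocab_size=3, levels=2)
print(len(descriptor), descriptor[:3])
```

The coarse level records what appears in the image, and the finer levels record roughly where it appears. The resulting fixed-length vector is what a classifier such as an SVM would take as input.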
Before ImageNet
In the early 2000s, we began to have benchmark datasets that enable us to measure the progress of recognition.
PASCAL Visual Object Challenge (20 object categories)
ImageNet
- to recognize the world of all the objects
- to overcome the machine learning bottleneck of overfitting.
Part of the problem is that visual data is very complex. Because it is complex, our models tend to have high-dimensional inputs and a lot of parameters to fit. When we do not have enough training data, overfitting happens very fast and we cannot generalize very well.
A group of scientists from Princeton and Stanford put together the largest possible dataset.
ImageNet Large-Scale Visual Recognition Challenge
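The ImageNet dataset was published in 2009, and the challenge has run annually since 2010.
There is a huge gap between the 2011 and 2012 results: 2012 was the breakthrough year for convolutional neural networks.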
CS231n Overview
CS231n focuses on one of the most important problems of visual recognition – image classification.
There are a number of visual recognition problems related to image classification, such as object detection and image captioning.
Convolutional Neural Networks (CNNs) have become an important tool for object recognition. CNNs are sometimes called convnets.
The main takeaway here is that convolutional neural networks really had this breakthrough moment in 2012, and since then there has been a lot of effort focused on tuning and tweaking these algorithms to make them perform better and better on this problem of image classification.
Even though CNNs perform well in the ImageNet challenges, they were not invented overnight.
Philosophy
You should really understand the deep mechanics of all of these algorithms.
- Thorough and Detailed.
– Understand how to write from scratch, debug and train CNNs.
- Practical.
– Focus on practical techniques for training these networks at scale, and on GPUs (e.g. will touch on distributed optimization, differences between CPU vs. GPU, etc.)
– Also look at state of the art software tools such as Caffe, TensorFlow, and (Py)Torch
- State of the art.
– Most materials are new from research world in the past 1-3 years (2014->2016). Very exciting stuff!
- Fun.
– Some fun topics such as Image Captioning (using RNN)
– Also DeepDream, NeuralStyle, etc.
- Post title: 01_Introduction to Convolutional Neural Networks for Visual Recognition
- Create time: 2022-03-01 18:50:55
- Post link: Computer-Vision/01-introduction-to-convolutional-neural-networks-for-visual-recognition/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.