Clustering
Given an unlabeled dataset, we would like an algorithm that automatically groups the data into coherent subsets (clusters).
K-means
K-means is an iterative algorithm that repeats two steps:
- First, a cluster assignment step.
- Second, a move-centroid step.

centroid: n. (math) the geometric center of a set of points

K-means is by far the most popular and most widely used clustering algorithm.
Algorithm
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, where $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
  for $i = 1$ to $m$: $c^{(i)} :=$ index (from $1$ to $K$) of the cluster centroid closest to $x^{(i)}$
  for $k = 1$ to $K$: $\mu_k :=$ average (mean) of the points assigned to cluster $k$
}
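As a concrete sketch, here is a minimal Octave implementation of the loop above; the function name `run_kmeans` and the `max_iters` stopping rule are my own additions, not part of the course notation:

```matlab
% Minimal K-means sketch in Octave. X is an m x n matrix, one example per row.
function [centroids, idx] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  perm = randperm(m);
  centroids = X(perm(1:K), :);            % initialize to K distinct random examples
  idx = zeros(m, 1);
  for iter = 1:max_iters
    % Step 1 (cluster assignment): index of the closest centroid for each example
    for i = 1:m
      dists = sum((centroids - X(i, :)).^2, 2);  % squared distance to each centroid
      [~, idx(i)] = min(dists);
    end
    % Step 2 (move centroids): mean of the points assigned to each cluster
    for k = 1:K
      members = X(idx == k, :);
      if ~isempty(members)                 % an empty cluster is simply left in place
        centroids(k, :) = mean(members, 1);
      end
    end
  end
end
```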
Note
- For Step 1 (cluster assignment), assign each example to the group whose centroid it is closest to.
- For Step 2 (moving the centroids), a centroid may end up with zero points assigned to it. In that case, the most common thing to do is simply to eliminate that centroid, if you can accept ending up with $K-1$ clusters; otherwise, randomly reinitialize it.
Optimization Objective
Definition:

$$J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$

where $\mu_{c^{(i)}}$ is the centroid of the cluster to which example $x^{(i)}$ has been assigned.

Optimization objective:

$$\min_{\substack{c^{(1)}, \ldots, c^{(m)} \\ \mu_1, \ldots, \mu_K}} J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right)$$

The optimization objective $J$ is also called the distortion function, or the distortion of the K-means algorithm.
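For reference, the distortion can be computed in one line of Octave, assuming `X` is the m x n data matrix, `idx` holds each example's assigned cluster index $c^{(i)}$, and `centroids` stacks the $\mu_k$ as rows (these variable names are mine):

```matlab
% J = mean squared distance from each example to its assigned centroid
J = mean(sum((X - centroids(idx, :)).^2, 2));
```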
Random Initialization
- Should have $K < m$.
- Randomly pick $K$ training examples.
- Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
To avoid K-means getting stuck in a local optimum, try multiple random initializations (e.g., run it on the order of a hundred times):
For $i = 1$ to $100$ {
  Randomly initialize K-means.
  Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
  Compute the cost function (distortion) $J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right)$.
}

Pick the clustering that gave the lowest cost $J$.
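A sketch of this loop in Octave, reusing the hypothetical `run_kmeans` helper from above (the iteration count 10 is an arbitrary choice):

```matlab
best_J = Inf;
for trial = 1:100
  % Random initialization happens inside run_kmeans
  [centroids, idx] = run_kmeans(X, K, 10);
  J = mean(sum((X - centroids(idx, :)).^2, 2));  % distortion of this run
  if J < best_J                                  % keep the best clustering so far
    best_J = J;
    best_centroids = centroids;
    best_idx = idx;
  end
end
```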
Choosing K
Elbow method
- Vary $K$ (starting from 1), compute the distortion $J$ for each value, and choose the turning point of the resulting curve (it looks like a human elbow).
- Sometimes, however, the curve shows no clear elbow.

The elbow method is worth a shot, but don't have a very high expectation of it working for any particular problem.
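A sketch of the elbow plot in Octave, again using the hypothetical `run_kmeans` helper (for a fair curve you would keep the best of several random initializations at each $K$):

```matlab
Ks = 1:8;                     % candidate numbers of clusters
Js = zeros(size(Ks));
for K = Ks
  [centroids, idx] = run_kmeans(X, K, 10);
  Js(K) = mean(sum((X - centroids(idx, :)).^2, 2));  % distortion at this K
end
plot(Ks, Js, '-o');
xlabel('K (number of clusters)');
ylabel('distortion J');
```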
Another way
Think about the downstream purpose for which you are running K-means (e.g., choosing $K = 3$ vs. $K = 5$ T-shirt sizes), or use some other criterion to set the parameter $K$.
Dimensionality Reduction
There are two motivations for dimensionality reduction:

Data compression
- Use up less computer memory / disk space.
- Speed up the running time of learning algorithms.

Visualization
Reduce from $n$-dimensions to $k$-dimensions: find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data so as to minimize the projection error (for visualization, typically $k = 2$ or $k = 3$).
Principal Component Analysis (PCA)
PCA is by far the most popular dimensionality reduction algorithm. This section presents the problem formulation for PCA.

What PCA does, formally, is try to find a lower-dimensional surface onto which to project the data, so that the sum of squared projection errors is minimized.

The goal of PCA is to find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.
Difference with Linear Regression
- For linear regression, we sum the squared vertical errors (the distances between the predictions and the actual values of $y$).
- But for PCA, we accumulate the squared orthogonal distances from the points to the line (the shortest distances).
Algorithm
Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

Preprocessing (feature scaling / mean normalization):

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$$

Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.

If different features are on different scales (e.g., $x_1$ = size of a house, $x_2$ = number of bedrooms), scale the features to have a comparable range of values:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$

where $s_j$ is the standard deviation (or the range) of feature $j$.
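A minimal Octave sketch of this preprocessing step, assuming `X` is the m x n data matrix (variable names are mine):

```matlab
mu = mean(X);               % 1 x n vector of per-feature means
s  = std(X);                % 1 x n vector of per-feature standard deviations
X_norm = (X - mu) ./ s;     % mean-normalize and scale (Octave broadcasts row-wise)
```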
Main process
Reduce data from $n$-dimensions to $k$-dimensions.

Compute the "covariance matrix":

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)}\right) \left(x^{(i)}\right)^T$$

Compute the "eigenvectors" of matrix $\Sigma$ via the singular value decomposition, `[U, S, V] = svd(Sigma)`; the columns of `U` are the eigenvectors.

The plan B to compute the eigenvectors:

```matlab
eig(Sigma)
```

(Since $\Sigma$ is symmetric positive semi-definite, `svd` and `eig` give the same result, but `svd` is numerically more stable.)
Summary code:

```matlab
Sigma = (1/m) * X' * X;   % covariance matrix (X is m x n, one example per row)
[U, S, V] = svd(Sigma);   % columns of U are the eigenvectors of Sigma
```
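The remaining step is the projection itself. A minimal sketch, assuming $k$ has already been chosen and `x` is a single example stored as an n x 1 column vector:

```matlab
Ureduce = U(:, 1:k);        % first k eigenvectors (n x k)
z = Ureduce' * x;           % compressed representation of x (k x 1)
```

To project the whole (normalized) dataset at once, `Z = X_norm * Ureduce;` gives an m x k matrix with one compressed example per row.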
Reconstruction from PCA

Given a compressed representation $z \in \mathbb{R}^k$, we can map back to an approximation of the original point:

$$x_{\text{approx}} = U_{\text{reduce}} \, z \approx x, \qquad x_{\text{approx}} \in \mathbb{R}^n$$
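In code, the reconstruction is a single matrix product (same assumed variables as above):

```matlab
x_approx = Ureduce * z;     % (n x k) * (k x 1) = n x 1, approximates the original x
```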
Choosing k (number of principal components)

Average squared projection error:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2$$

Total variation in the data:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2$$

Typically, choose the smallest value of $k$ such that

$$\frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

"99% of variance is retained"

A cheaper way to check this uses the diagonal matrix $S$ from

```matlab
[U, S, V] = svd(Sigma)
```

Pick the smallest value of $k$ for which

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$$

(99% of variance is retained)
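A sketch of this check in Octave, using the `S` returned by `svd(Sigma)` (the helper variable names are mine):

```matlab
sv = diag(S);                      % eigenvalues of Sigma, in decreasing order
ratio = cumsum(sv) / sum(sv);      % fraction of variance retained for each k
k = find(ratio >= 0.99, 1);        % smallest k retaining at least 99% of variance
```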
Advice
Note: the mapping $x^{(i)} \rightarrow z^{(i)}$ should be defined by running PCA only on the training set. The same mapping (the same $U_{\text{reduce}}$ and the same normalization parameters $\mu_j$, $s_j$) can then be applied to the examples in the cross-validation and test sets.
Bad use of PCA
To prevent overfitting:
- Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$.
- Thus, with fewer features, the model is less likely to overfit.
This might work, but isn’t a good way to address overfitting.
Use regularization instead:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$