Clustering
Given an unlabeled dataset, we would like an algorithm that automatically groups the data into coherent subsets (clusters).
K-means
K-means is an iterative algorithm that repeats two steps:
- First, a cluster assignment step.
- Second, a move-centroid step.

centroid: n. (math) the geometric center of a set of points

K-means is by far the most popular and most widely used clustering algorithm.
Algorithm
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, where $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
  for $i = 1$ to $m$: $c^{(i)} :=$ index (from $1$ to $K$) of the cluster centroid closest to $x^{(i)}$
  for $k = 1$ to $K$: $\mu_k :=$ average (mean) of the points assigned to cluster $k$
}
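As a concrete sketch, here is a minimal Octave implementation of the loop above; the function name `run_kmeans` and the `max_iters` stopping rule are my own additions, not part of the course notation:

```matlab
% Minimal K-means sketch in Octave. X is an m x n matrix, one example per row.
function [centroids, idx] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  perm = randperm(m);
  centroids = X(perm(1:K), :);            % initialize to K distinct random examples
  idx = zeros(m, 1);
  for iter = 1:max_iters
    % Step 1 (cluster assignment): index of the closest centroid for each example
    for i = 1:m
      dists = sum((centroids - X(i, :)).^2, 2);  % squared distance to each centroid
      [~, idx(i)] = min(dists);
    end
    % Step 2 (move centroids): mean of the points assigned to each cluster
    for k = 1:K
      members = X(idx == k, :);
      if ~isempty(members)                 % an empty cluster is simply left in place
        centroids(k, :) = mean(members, 1);
      end
    end
  end
end
```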
Note
- For Step 1 (cluster assignment), assign each example to the group whose centroid it is closest to.
- For Step 2 (moving the centroids), a centroid may end up with zero points assigned to it. In that case, the most common thing to do is simply to eliminate that centroid, if you can accept ending up with $K-1$ clusters; otherwise, randomly reinitialize it.
Optimization Objective
Definition:

$$J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$

where $\mu_{c^{(i)}}$ is the centroid of the cluster to which example $x^{(i)}$ has been assigned.

Optimization objective:

$$\min_{\substack{c^{(1)}, \ldots, c^{(m)} \\ \mu_1, \ldots, \mu_K}} J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right)$$

The optimization objective $J$ is also called the distortion function, or the distortion of the K-means algorithm.
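For reference, the distortion can be computed in one line of Octave, assuming `X` is the m x n data matrix, `idx` holds each example's assigned cluster index $c^{(i)}$, and `centroids` stacks the $\mu_k$ as rows (these variable names are mine):

```matlab
% J = mean squared distance from each example to its assigned centroid
J = mean(sum((X - centroids(idx, :)).^2, 2));
```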
Random Initialization
- Should have $K < m$.
- Randomly pick $K$ training examples.
- Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
To avoid K-means getting stuck in a local optimum, try multiple random initializations (e.g., run it on the order of a hundred times):
For $i = 1$ to $100$ {
  Randomly initialize K-means.
  Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
  Compute the cost function (distortion) $J\left(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K\right)$.
}

Pick the clustering that gave the lowest cost $J$.
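A sketch of this loop in Octave, reusing the hypothetical `run_kmeans` helper from above (the iteration count 10 is an arbitrary choice):

```matlab
best_J = Inf;
for trial = 1:100
  % Random initialization happens inside run_kmeans
  [centroids, idx] = run_kmeans(X, K, 10);
  J = mean(sum((X - centroids(idx, :)).^2, 2));  % distortion of this run
  if J < best_J                                  % keep the best clustering so far
    best_J = J;
    best_centroids = centroids;
    best_idx = idx;
  end
end
```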
Choosing K
Elbow method
- Vary $K$ (starting from 1), compute the distortion $J$ for each value, and choose the turning point of the resulting curve (it looks like a human elbow).
- Sometimes, however, the curve shows no clear elbow.

The elbow method is worth a shot, but don't have a very high expectation of it working for any particular problem.
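A sketch of the elbow plot in Octave, again using the hypothetical `run_kmeans` helper (for a fair curve you would keep the best of several random initializations at each $K$):

```matlab
Ks = 1:8;                     % candidate numbers of clusters
Js = zeros(size(Ks));
for K = Ks
  [centroids, idx] = run_kmeans(X, K, 10);
  Js(K) = mean(sum((X - centroids(idx, :)).^2, 2));  % distortion at this K
end
plot(Ks, Js, '-o');
xlabel('K (number of clusters)');
ylabel('distortion J');
```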
Another way
Think about the downstream purpose for which you are running K-means (e.g., choosing $K = 3$ vs. $K = 5$ T-shirt sizes), or use some other criterion to set the parameter $K$.
Dimensionality Reduction
There are two motivations for dimensionality reduction:

Data compression
- Use up less computer memory / disk space.
- Speed up the running time of learning algorithms.

Visualization
Reduce from $n$-dimensions to $k$-dimensions: find $k$ vectors $u^{(1)}, u^{(2)}, \ldots, u^{(k)}$ onto which to project the data so as to minimize the projection error (for visualization, typically $k = 2$ or $k = 3$).
Principal Component Analysis (PCA)
PCA is by far the most popular dimensionality reduction algorithm. This section presents the problem formulation for PCA.

What PCA does, formally, is try to find a lower-dimensional surface onto which to project the data, so that the sum of squared projection errors is minimized.

The goal of PCA is to find a direction (a vector $u^{(1)} \in \mathbb{R}^n$) onto which to project the data so as to minimize the projection error.
Difference with Linear Regression
- For linear regression, we sum the squared vertical errors (the distances between the predictions and the actual values of $y$).
- But for PCA, we accumulate the squared orthogonal distances from the points to the line (the shortest distances).
Algorithm
Training set: $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$

Preprocessing (feature scaling / mean normalization):

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$$

Replace each $x_j^{(i)}$ with $x_j^{(i)} - \mu_j$.

If different features are on different scales (e.g., $x_1$ = size of a house, $x_2$ = number of bedrooms), scale the features to have a comparable range of values:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$

where $s_j$ is the standard deviation (or the range) of feature $j$.
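A minimal Octave sketch of this preprocessing step, assuming `X` is the m x n data matrix (variable names are mine):

```matlab
mu = mean(X);               % 1 x n vector of per-feature means
s  = std(X);                % 1 x n vector of per-feature standard deviations
X_norm = (X - mu) ./ s;     % mean-normalize and scale (Octave broadcasts row-wise)
```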
Main process
Reduce data from $n$-dimensions to $k$-dimensions.

Compute the "covariance matrix":

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)}\right) \left(x^{(i)}\right)^T$$

Compute the "eigenvectors" of matrix $\Sigma$ via the singular value decomposition, `[U, S, V] = svd(Sigma)`; the columns of `U` are the eigenvectors.

The plan B to compute the eigenvectors:

```matlab
eig(Sigma)
```

(Since $\Sigma$ is symmetric positive semi-definite, `svd` and `eig` give the same result, but `svd` is numerically more stable.)
Summary code:

```matlab
Sigma = (1/m) * X' * X;   % covariance matrix (X is m x n, one example per row)
[U, S, V] = svd(Sigma);   % columns of U are the eigenvectors of Sigma
```
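The remaining step is the projection itself. A minimal sketch, assuming $k$ has already been chosen and `x` is a single example stored as an n x 1 column vector:

```matlab
Ureduce = U(:, 1:k);        % first k eigenvectors (n x k)
z = Ureduce' * x;           % compressed representation of x (k x 1)
```

To project the whole (normalized) dataset at once, `Z = X_norm * Ureduce;` gives an m x k matrix with one compressed example per row.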
Reconstruction from PCA

Given a compressed representation $z \in \mathbb{R}^k$, we can map back to an approximation of the original point:

$$x_{\text{approx}} = U_{\text{reduce}} \, z \approx x, \qquad x_{\text{approx}} \in \mathbb{R}^n$$
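In code, the reconstruction is a single matrix product (same assumed variables as above):

```matlab
x_approx = Ureduce * z;     % (n x k) * (k x 1) = n x 1, approximates the original x
```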
Choosing k (number of principal components)

Average squared projection error:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2$$

Total variation in the data:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2$$

Typically, choose the smallest value of $k$ such that

$$\frac{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{approx}}^{(i)} \right\|^2}{\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} \right\|^2} \le 0.01$$

"99% of variance is retained"

A cheaper way to check this uses the diagonal matrix $S$ from

```matlab
[U, S, V] = svd(Sigma)
```

Pick the smallest value of $k$ for which

$$\frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99$$

(99% of variance is retained)
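A sketch of this check in Octave, using the `S` returned by `svd(Sigma)` (the helper variable names are mine):

```matlab
sv = diag(S);                      % eigenvalues of Sigma, in decreasing order
ratio = cumsum(sv) / sum(sv);      % fraction of variance retained for each k
k = find(ratio >= 0.99, 1);        % smallest k retaining at least 99% of variance
```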
Advice
Note: the mapping $x^{(i)} \rightarrow z^{(i)}$ should be defined by running PCA only on the training set. The same mapping (the same $U_{\text{reduce}}$ and the same normalization parameters $\mu_j$, $s_j$) can then be applied to the examples in the cross-validation and test sets.
Bad use of PCA
To prevent overfitting:
- Use $z^{(i)}$ instead of $x^{(i)}$ to reduce the number of features to $k < n$.
- Thus, with fewer features, the model is less likely to overfit.
This might work, but isn’t a good way to address overfitting.
Use regularization instead:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$