05_Logistic Regression
Carpe Tu Black Whistle

Classification and Representation

Classification

One could apply Linear Regression with a threshold to divide the data into different classes.
But this method doesn't work well, because classification isn't actually a linear function.

We usually denote the event we care about as 1 (the positive class); according to information theory, the occurrence of a rare event conveys a lot of information.

  • Tumor: Benign (0), Malignant (1)
  • Online Transaction: not fraudulent (0), fraudulent (1)

Hypothesis Representation

Unlike the linear model, we choose the hypothesis function $h_\theta(x)$ to satisfy $0 \le h_\theta(x) \le 1$.
This is accomplished by plugging $\theta^T x$ into the Logistic Function:

$$h_\theta(x) = g(\theta^T x)$$

Sigmoid Function

The function $g(z) = \dfrac{1}{1 + e^{-z}}$ maps any real number to the (0, 1) interval, making it useful for transforming an arbitrary-valued function into a function better suited for classification.

$h_\theta(x)$ gives us the probability that our output is 1; that is, $h_\theta(x) = P(y = 1 \mid x;\, \theta)$.
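As a quick illustration, here is a minimal Octave sketch of the Sigmoid (the function name sigmoid and the vectorized, element-wise form are my own choices, not taken from these notes):

function g = sigmoid(z)
  % Map each element of z into the (0, 1) interval.
  g = 1 ./ (1 + exp(-z));
end

Because it uses the element-wise operator ./ , it works on scalars, vectors, and matrices alike.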


Decision Boundary

To get a discrete 0 or 1 classification, we translate the output of the hypothesis function as follows:

$$h_\theta(x) \ge 0.5 \rightarrow y = 1$$
$$h_\theta(x) < 0.5 \rightarrow y = 0$$

Sigmoid's feature:

$$g(z) \ge 0.5 \quad \text{when} \quad z \ge 0$$

So

$$h_\theta(x) = g(\theta^T x) \ge 0.5 \quad \text{when} \quad \theta^T x \ge 0$$

From these statements:

$$\theta^T x \ge 0 \Rightarrow y = 1$$
$$\theta^T x < 0 \Rightarrow y = 0$$


The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
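As a concrete (illustrative) example: with $\theta = [5,\ -1,\ 0]^T$, we predict $y = 1$ whenever $5 - x_1 \ge 0$, i.e. $x_1 \le 5$, so the vertical line $x_1 = 5$ is the decision boundary.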


Cost Function

We cannot reuse the linear-regression cost function here: composed with the Logistic Function it makes the output wavy, producing many local optima (it is no longer a convex function).

Cost Function for logistic regression:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}),\, y^{(i)}\big)$$
$$\mathrm{Cost}\big(h_\theta(x),\, y\big) = -\log\big(h_\theta(x)\big) \qquad \text{if } y = 1$$
$$\mathrm{Cost}\big(h_\theta(x),\, y\big) = -\log\big(1 - h_\theta(x)\big) \qquad \text{if } y = 0$$

Intuitively, the cost is 0 when the hypothesis agrees with the label (e.g. $y = 1$ and $h_\theta(x) = 1$), and grows toward infinity as the hypothesis approaches the opposite label.

Simplified Cost Function and Gradient Descent

We can compress our cost function's two conditional cases into one case:

$$\mathrm{Cost}\big(h_\theta(x),\, y\big) = -y \log\big(h_\theta(x)\big) - (1 - y)\log\big(1 - h_\theta(x)\big)$$

Entire Cost Function

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$

Vectorized Implementation

$$h = g(X\theta)$$
$$J(\theta) = \frac{1}{m} \big( -y^T \log(h) - (1 - y)^T \log(1 - h) \big)$$
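A minimal Octave sketch of this vectorized cost (a sketch assuming X is the design matrix, y the label vector, and sigmoid the helper defined earlier; the name logisticCost is my own):

function J = logisticCost(theta, X, y)
  % Vectorized logistic regression cost J(theta).
  m = length(y);
  h = sigmoid(X * theta);                                % hypothesis for all m examples
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
end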


Gradient Descent

The general form of gradient descent is:
Repeat {
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
}
Working out the derivative part using calculus, we get:
Repeat {
$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\theta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}$$
}
Notice that this algorithm is identical to the one we used in linear regression. We still have to simultaneously update all values in theta.

Vectorized Implementation

$$\theta := \theta - \frac{\alpha}{m}\, X^T \big( g(X\theta) - \vec{y} \big)$$
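In Octave, one vectorized update step could look like the following sketch (alpha, X, y, and theta are assumed to already exist; this is illustrative, not the course's reference code):

m     = size(X, 1);                                % number of training examples
grad  = (1 / m) * X' * (sigmoid(X * theta) - y);   % vectorized gradient of J(theta)
theta = theta - alpha * grad;                      % simultaneous update of all theta_j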

Advanced Optimization

“Conjugate gradient”, “BFGS”, and “L-BFGS” are more sophisticated, faster ways to optimize $\theta$ that can be used instead of gradient descent.
Octave and MATLAB provide library implementations of them.

First Step

Provide a function that evaluates both the cost function and its gradient for a given $\theta$:

Function Definition

function [jVal, gradient] = costFunction(theta)
  jVal = [...code to compute J(theta)...];
  gradient = [...code to compute derivative of J(theta)...];
end
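For logistic regression in particular, the body of costFunction might be filled in as in the following sketch, reusing the sigmoid helper and the vectorized formulas above (passing X and y as extra arguments is my own assumption, not part of the original template):

function [jVal, gradient] = costFunction(theta, X, y)
  % Vectorized logistic regression cost and gradient.
  m        = length(y);
  h        = sigmoid(X * theta);
  jVal     = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
  gradient = (1 / m) * X' * (h - y);
end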

Set the options and run the optimization algorithm

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
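If costFunction takes X and y as extra arguments (as in the sketch above), the same call can be made with an anonymous function handle, a standard Octave/MATLAB pattern:

[optTheta, functionVal, exitFlag] = fminunc(@(t) costFunction(t, X, y), initialTheta, options);

Here optTheta holds the learned parameters, functionVal the final cost, and exitFlag the convergence status.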