Video Links

PyTorch_Tutorial1 https://www.youtube.com/watch?v=85uJ9hSaXig
PyTorch_Tutorial2 https://www.youtube.com/watch?v=VbqNn20FoHM

Outline

Background: Prerequisites & What is PyTorch?
Training & Testing Neural Networks in PyTorch
Dataset & Dataloader
Tensors
torch.nn; Models, Loss Functions
torch.optim: Optimization
Save/load models

Tutorial1

Prerequisites

We assume you are already familiar with:

Python3
if-else, loops, functions, file I/O, classes
Deep Learning Basics
Prof. Lee’s 1st & 2nd lecture videos.

Some knowledge of NumPy will also be useful!

PyTorch

A machine learning framework in Python
Two main features:
- N-dimensional tensor computation (like NumPy) on GPUs
- Automatic differentiation for training deep neural networks

Training Neural Networks

Step1–Load dataset

Dataset: stores data samples and expected values
Dataloader: groups data in batches, enables multiprocessing

dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)

Dataset & Dataloader

from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        # Read data & preprocess
        self.data = ...

    def __getitem__(self, index):
        # Returns one sample at a time
        return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
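
As a rough illustration of how a Dataset and a DataLoader work together, here is a minimal sketch. ToyDataset and its in-memory data are made up for this example (they are not from the tutorial); only Dataset and DataLoader are the real PyTorch classes.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset: 10 samples with 3 features and a scalar target."""

    def __init__(self, n_samples=10, n_features=3):
        # Stand-in for "read data & preprocess": build the data in memory.
        values = torch.arange(n_samples * n_features, dtype=torch.float32)
        self.x = values.reshape(n_samples, n_features)
        self.y = self.x.sum(dim=1)

    def __getitem__(self, index):
        # Return one (features, target) pair.
        return self.x[index], self.y[index]

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.x)


dataset = ToyDataset()
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)

# Each iteration yields one batch; the last batch may be smaller.
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in dataloader]
print(batch_shapes)
```

With 10 samples and batch_size=4, the loader yields batches of 4, 4, and 2 samples.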

Tensor

Tensor–Shape of Tensor

Check with .shape (an attribute, not a method)

Note: dim in PyTorch == axis in NumPy
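
A small sketch of the dim/axis correspondence, assuming a made-up 2 x 3 array; the same reduction is written once with PyTorch's dim and once with NumPy's axis.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
x = torch.from_numpy(a)

print(x.shape)                    # torch.Size([2, 3])

col_sum_torch = x.sum(dim=0)      # reduce over dimension 0 in PyTorch
col_sum_numpy = a.sum(axis=0)     # the same reduction in NumPy
print(col_sum_torch, col_sum_numpy)
```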

Tensor–Creating Tensors

Directly from data (list or numpy.ndarray)

x = torch.tensor([[1, -1], [-1, 1]])
x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))

Output:
tensor([[ 1, -1],
        [-1,  1]])

Tensor of constant zeros & ones

x = torch.zeros([2, 2])
x = torch.ones([1, 2, 5])

Output:
tensor([[0., 0.],
        [0., 0.]])
tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])

Tensors–Common Operations

Common arithmetic functions are supported, such as:

Addition z = x + y
Subtraction z = x - y
Power y = x.pow(2)
Summation y = x.sum()

Mean y = x.mean()
Transpose: transpose 2 specified dimensions

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.transpose(0, 1)
>>> x.shape
torch.Size([3, 2])

Squeeze: remove the specified dimension with length = 1

>>> x = torch.zeros([1, 2, 3])
>>> x.shape
torch.Size([1, 2, 3])
>>> x = x.squeeze(0)  # remove the first dimension (dim = 0)
>>> x.shape
torch.Size([2, 3])

Unsqueeze: expand a new dimension

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.unsqueeze(1) # dim = 1
>>> x.shape
torch.Size([2, 1, 3])

Cat: concatenate multiple tensors along the specified dimension

>>> x = torch.zeros([2, 1, 3])
>>> y = torch.zeros([2, 3, 3])
>>> z = torch.zeros([2, 2, 3])
>>> w = torch.cat([x, y, z], dim=1)
>>> w.shape
torch.Size([2, 6, 3])

Tensor–Data Type

Using different data types for model and data will cause errors

Data type	dtype	tensor
32-bit floating point	torch.float	torch.FloatTensor
64-bit integer (signed)	torch.long	torch.LongTensor
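
One common way to hit a dtype error is feeding NumPy data (float64 by default) into a model whose parameters are float32. The sketch below uses a made-up nn.Linear layer and input to show the mismatch and one possible fix (casting with .float()).

```python
import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(3, 1)                    # parameters are torch.float (float32) by default
x64 = torch.from_numpy(np.ones((2, 3)))    # NumPy defaults to float64 -> torch.double

try:
    layer(x64)                             # dtype mismatch between data and model
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

out = layer(x64.float())                   # cast the data to float32 first
print(mismatch_raised, out.dtype)
```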

Similar attributes & functions in PyTorch and NumPy

PyTorch	NumPy
x.shape	x.shape
x.dtype	x.dtype
x.reshape/x.view	x.reshape
x.squeeze()	x.squeeze()
x.unsqueeze(1)	np.expand_dims(x,1)

Tensor–Device

Tensors & modules are computed on the CPU by default.
Use .to() to move tensors to the appropriate device.
CPU
x = x.to('cpu')
GPU
x = x.to('cuda')

Tensors–Device(GPU)

Check if your computer has an NVIDIA GPU
torch.cuda.is_available()
Multiple GPUs: specify 'cuda:0', 'cuda:1', 'cuda:2', …
Why use GPUs?
- Parallel computing with more cores for arithmetic calculations
- See "What is a GPU and do you need one in Deep Learning?"
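
A common pattern (one option among several) is to pick the device once and reuse it, falling back to the CPU when no GPU is available. The variable name device is just a convention.

```python
import torch

# Fall back to the CPU when no CUDA device is visible.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

x = torch.zeros([2, 2])   # created on the CPU by default
x = x.to(device)          # .to() returns a tensor on the target device
print(x.device)
```

Writing the code against device rather than a hard-coded 'cuda' lets the same script run on machines with or without a GPU.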

Tensors–Gradient Calculation

>>> x = torch.tensor([[1. , 0.], [-1., 1.]], requires_grad = True)
>>> z = x.pow(2).sum()
>>> z.backward()        # very important: call backward() on the objective function
>>> x.grad
tensor([[2., 0.],
        [-2., 2.]])
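
Why these values: z is the sum of all x_ij squared, so the derivative of z with respect to each x_ij is 2 * x_ij. The short check below compares the autograd result with that hand-derived formula.

```python
import torch

x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
z = x.pow(2).sum()    # z = sum of x_ij ** 2
z.backward()          # fills x.grad with dz/dx

# Analytically, dz/dx_ij = 2 * x_ij.
expected = 2 * x.detach()
print(torch.equal(x.grad, expected))
```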

Step2–Build NN

torch.nn–Network Layers

Linear Layer(Fully-connected Layer)
nn.Linear(in_features, out_features)

>>> layer = torch.nn.Linear(32, 64)
>>> layer.weight.shape
torch.Size([64, 32])
>>> layer.bias.shape
torch.Size([64])
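
To see how the weight and bias shapes are used, here is a sketch that passes a made-up batch of 10 samples through the layer and reproduces the result by hand as x W^T + b.

```python
import torch

layer = torch.nn.Linear(32, 64)
x = torch.randn(10, 32)            # a batch of 10 samples with 32 features each
y = layer(x)                       # output has 64 features per sample

# The same result computed by hand from the layer's parameters.
y_manual = x @ layer.weight.T + layer.bias
print(y.shape, torch.allclose(y, y_manual, atol=1e-5))
```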

torch.nn – Non-Linear Activation Function

Sigmoid Activation
nn.Sigmoid()
ReLU Activation
nn.ReLU()

torch.nn – Build Neural Network
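
A minimal sketch of a network built from the layers above. The class name MyModel matches the name used later in the training code, but the layer sizes here (10 -> 32 -> 1) are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()                 # required before registering submodules
        self.net = nn.Sequential(
            nn.Linear(10, 32),             # layer sizes are arbitrary here
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # Compute the output of the network for a batch x.
        return self.net(x)


model = MyModel()
out = model(torch.randn(4, 10))            # batch of 4 samples
print(out.shape)
```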

Step3–Loss Func & Optim

torch.nn–Loss Functions

Mean Squared Error (for regression tasks)
criterion = nn.MSELoss()
Cross Entropy (for classification tasks)
criterion = nn.CrossEntropyLoss()
loss = criterion(model_out, expected_value)
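
The two losses expect differently shaped targets, which is easy to get wrong. The sketch below uses made-up predictions and labels: MSELoss takes float targets with the same shape as the output, while CrossEntropyLoss takes raw scores (logits) of shape [N, C] and integer class labels of shape [N].

```python
import torch
import torch.nn as nn

# Regression: predictions and targets share the same shape and float dtype.
mse = nn.MSELoss()
pred = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
mse_loss = mse(pred, target)               # mean of squared differences

# Classification: logits of shape [N, C], class labels of shape [N].
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]])
labels = torch.tensor([0, 2])              # dtype torch.long
ce_loss = ce(logits, labels)

print(mse_loss.item(), ce_loss.item())
```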

torch.optim

Gradient-based optimization algorithms that adjust network parameters to reduce error.
E.g. Stochastic Gradient Descent (SGD)
torch.optim.SGD(model.parameters(), lr, momentum=0)

optimizer = torch.optim.SGD(model.parameters(), lr, momentum = 0)

For every batch of data:
1. Call optimizer.zero_grad() to reset gradients of model parameters.
2. Call loss.backward() to backpropagate gradients of prediction loss.
3. Call optimizer.step() to adjust model parameters.
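
The three steps above can be sketched on a single made-up batch. The model, data, and learning rate are illustrative assumptions; the final check confirms that plain SGD (momentum = 0) applies w <- w - lr * grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0)

x, y = torch.randn(8, 3), torch.randn(8, 1)    # one made-up batch
weight_before = model.weight.detach().clone()

optimizer.zero_grad()                # 1. reset gradients
loss = criterion(model(x), y)        #    forward pass and loss
loss.backward()                      # 2. backpropagate
grad = model.weight.grad.clone()
optimizer.step()                     # 3. update parameters

# With plain SGD the update is: w <- w - lr * grad
updated = model.weight.detach()
print(torch.allclose(updated, weight_before - 0.1 * grad))
```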

Step4–Train, Test & Predict

Train

# Training setup
dataset = MyDataset(file)       # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True) # put dataset into Dataloader
model = MyModel().to(device)    # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()        # set loss func
optimizer = torch.optim.SGD(model.parameters(), 0.1)    # set optimizer

# Training Loop
for epoch in range(n_epochs):   # iterate n_epochs
    model.train()               # set model to train mode
    for x, y in tr_set:         # iterate through the dataloader
        optimizer.zero_grad()   # set gradient to zero
        x, y = x.to(device), y.to(device)   # move data to device (cpu/cuda)
        pred = model(x)         # forward pass (compute output)
        loss = criterion(pred, y)   # compute loss
        loss.backward()         # compute gradient (backpropagation)
        optimizer.step()        # update model with optimizer

Test

model.eval()                    # set model to evaluation mode
total_loss = 0
for x, y in dv_set:             # iterate through the dataloader
    x, y = x.to(device), y.to(device)   # move data to device(cpu/cuda)
    with torch.no_grad():       # disable gradient calculation
        pred = model(x)         # forward pass(compute output)
        loss = criterion(pred, y)       # compute loss
    total_loss += loss.cpu().item() * len(x)    # accumulate loss, weighted by batch size
avg_loss = total_loss / len(dv_set.dataset)     # compute averaged loss

Predict

model.eval()                    # set model to evaluation mode
preds = []                      # collect predictions here
for x in tt_set:                # iterate through the dataloader 
    x = x.to(device)            # move data to device
    with torch.no_grad():       # disable gradient calculation
        pred = model(x)         # forward pass
        preds.append(pred.cpu())    #collect

Notice

eval & no_grad
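
These two do different jobs: model.eval() switches layers such as dropout and batch normalization to their evaluation behavior, while torch.no_grad() stops autograd from recording the computation graph. A small sketch with a made-up model containing a Dropout layer:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.eval()                         # dropout is disabled in evaluation mode
out_eval_1 = model(x)
out_eval_2 = model(x)
print(torch.equal(out_eval_1, out_eval_2))   # repeated calls give the same output

print(out_eval_1.requires_grad)      # eval() alone still records the graph

with torch.no_grad():                # no graph is recorded inside this block
    out_no_grad = model(x)
print(out_no_grad.requires_grad)
```

For testing and prediction, both are typically used together, as in the Test and Predict code above.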

load & save
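
One common approach is to save only the model parameters (the state_dict) and load them into a freshly constructed model of the same architecture. The sketch below is an illustration under that assumption; the file name model.ckpt and the small nn.Linear model are made up.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.ckpt")   # hypothetical file name
    torch.save(model.state_dict(), path)         # save the parameters only

    restored = nn.Linear(3, 1)                   # same architecture, fresh weights
    restored.load_state_dict(torch.load(path))   # load the parameters back

print(torch.equal(model.weight, restored.weight))
```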

Tutorial2

load data/Preprocessing

Load data: you can use pandas to load a csv file.

train_data = pd.read_csv("./covid.train.csv").drop(columns=['date']).values

Preprocessing: Get model inputs and labels.

x_train, y_train = train_data[:, :-1], train_data[:, -1]  # in Python slicing, the right bound is exclusive

Dataset

__init__: Read data and preprocess.
__getitem__: Return one sample at a time. In this case, one sample includes a 117-dimensional feature vector and a label.
__len__: Return the size of the dataset. In this case, it is 2699.

Dataloader

train_loader = DataLoader(train_dataset, ba)

Group data into batches
If you set shuffle = True, the dataloader will permute the indices of all samples automatically.
We often set shuffle = True during training.
If you are curious why, see the page "Advantage to shuffle a dataset".
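
To make the effect of shuffle visible, the sketch below loads a made-up dataset of the integers 0 to 9 with and without shuffling. TensorDataset is used here only to keep the example short.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))           # samples are the integers 0..9

ordered = DataLoader(dataset, batch_size=5, shuffle=False)
shuffled = DataLoader(dataset, batch_size=5, shuffle=True)

ordered_ids = torch.cat([batch[0] for batch in ordered]).tolist()
shuffled_ids = torch.cat([batch[0] for batch in shuffled]).tolist()

print(ordered_ids)    # original order
print(shuffled_ids)   # a permutation of the same samples, redrawn every epoch
```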

Model Building

class My_model(nn.Module):
    def __init__(self, input_dim):
        super().__init__()  # initialize nn.Module before registering layers
        # TODO: modify model's structure, be aware of dimension.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B)
        return x

Criterion

# Define the loss function
criterion = torch.nn.MSELoss(reduction="mean")

Optimizer

optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)

Training loop

for epoch in range(3000):
    model.train()   # Set your model to train mode.
    # tqdm is a package to visualize your training progress.
    train_pbar = tqdm(train_loader, position=0, leave = True)
    for x, y in train_pbar:
        x, y = x.to('cuda'), y.to('cuda')   # Move your data to device.
        pred = model(x)
        loss = criterion(pred, y)
        loss.backward()                     # Compute gradient
        optimizer.step()                    # Update parameters.
        optimizer.zero_grad()               # Set gradient to 0