PyTorch_Tutorial (TA Yuan Tseng)
Carpe Tu Black Whistle

Video Links

PyTorch_Tutorial1 https://www.youtube.com/watch?v=85uJ9hSaXig
PyTorch_Tutorial2 https://www.youtube.com/watch?v=VbqNn20FoHM

Outline

  • Background: Prerequisites & What is PyTorch?
  • Training & Testing Neural Networks in Pytorch
  • Dataset & Dataloader
  • Tensors
  • torch.nn: Models, Loss Functions
  • torch.optim: Optimization
  • Save/load models

Tutorial1

Prerequisites

  • We assume you are already familiar with…
  1. Python3
    if-else, loops, functions, file I/O, classes, …
  2. Deep Learning Basics
    Prof. Lee’s 1st & 2nd lecture videos.

Some knowledge of NumPy will also be useful!

PyTorch

  • A machine learning framework in Python
  • Two main features:
    • N-dimensional tensor computation (like NumPy) on GPUs
    • Automatic differentiation for training deep neural networks

Training Neural Networks


Step 1 – Load Dataset


  • Dataset: stores data samples and expected values
  • Dataloader: groups data in batches, enables multiprocessing
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)


Dataset & Dataloader

from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        # Read data & preprocess
        self.data = ...

    def __getitem__(self, index):
        # Return one sample at a time
        return self.data[index]

    def __len__(self):
        # Return the size of the dataset
        return len(self.data)
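
To see how these pieces fit together, here is a minimal runnable sketch (the ToyDataset class and its data are made up for illustration):

import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self):
        # 10 samples with 1 feature each; labels are just 2 * x
        self.x = torch.arange(10, dtype=torch.float32).unsqueeze(1)
        self.y = 2 * self.x

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([4, 1]) twice, then torch.Size([2, 1])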

Tensor

Tensor–Shape of Tensor

  • Check with .shape


Note: dim in PyTorch == axis in NumPy
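
.shape works the same way for any number of dimensions, for example:

>>> torch.zeros([5]).shape
torch.Size([5])
>>> torch.zeros([2, 5]).shape
torch.Size([2, 5])
>>> torch.zeros([3, 2, 5]).shape
torch.Size([3, 2, 5])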

Tensor–Creating Tensors

  • Directly from data (list or numpy.ndarray)

    x = torch.tensor([[1, -1], [-1, 1]])
    x = torch.from_numpy(np.array([[1, -1], [-1, 1]]))

    Output:
    tensor([[ 1, -1],
            [-1,  1]])
  • Tensor of constant zeros & ones

    x = torch.zeros([2, 2])
    x = torch.ones([1, 2, 5])

    Output:
    tensor([[0., 0.],
            [0., 0.]])
    tensor([[[1., 1., 1., 1., 1.],
             [1., 1., 1., 1., 1.]]])

Tensors–Common Operations

Common arithmetic functions are supported, such as:

  • Addition z = x + y
  • Subtraction z = x - y
  • Power y = x.pow(2)
  • Summation y = x.sum()
  • Mean y = x.mean()
Transpose: transpose two specified dimensions

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.transpose(0, 1)
>>> x.shape
torch.Size([3, 2])

Squeeze: remove a specified dimension with length = 1

>>> x = torch.zeros([1, 2, 3])
>>> x.shape
torch.Size([1, 2, 3])
>>> x = x.squeeze(0)  # remove the first dimension
>>> x.shape
torch.Size([2, 3])

Unsqueeze: expand a new dimension

>>> x = torch.zeros([2, 3])
>>> x.shape
torch.Size([2, 3])
>>> x = x.unsqueeze(1)  # dim = 1
>>> x.shape
torch.Size([2, 1, 3])

Cat: concatenate multiple tensors

>>> x = torch.zeros([2, 1, 3])
>>> y = torch.zeros([2, 3, 3])
>>> z = torch.zeros([2, 2, 3])
>>> w = torch.cat([x, y, z], dim=1)
>>> w.shape
torch.Size([2, 6, 3])


Tensor–Data Type

  • Using different data types for model and data will cause errors

    Data type               | dtype       | tensor
    32-bit floating point   | torch.float | torch.FloatTensor
    64-bit integer (signed) | torch.long  | torch.LongTensor

  • Similar attributes & functions as NumPy

    PyTorch            | NumPy
    x.shape            | x.shape
    x.dtype            | x.dtype
    x.reshape / x.view | x.reshape
    x.squeeze()        | x.squeeze()
    x.unsqueeze(1)     | np.expand_dims(x, 1)

Tensor–Device

  • Tensors & modules are computed on the CPU by default

    Use .to() to move tensors to the appropriate device.

  • CPU
    x = x.to('cpu')

  • GPU
    x = x.to('cuda')

Tensors–Device(GPU)

  • Check if your computer has an NVIDIA GPU:

    torch.cuda.is_available()

  • Multiple GPUs: specify 'cuda:0', 'cuda:1', 'cuda:2', …

  • Why use GPUs? GPUs run many tensor operations in parallel, which greatly speeds up the matrix computations in deep learning.
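
A common idiom is to pick the device once at the start of your script (a minimal sketch; the variable name device matches the training code in Step 4):

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = x.to(device)  # works the same way for tensors and models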

Tensors–Gradient Calculation

>>> x = torch.tensor([[1., 0.], [-1., 1.]], requires_grad=True)
>>> z = x.pow(2).sum()
>>> z.backward()  # very important: backpropagate from the objective
>>> x.grad
tensor([[ 2.,  0.],
        [-2.,  2.]])
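
This matches the math: z = Σᵢⱼ xᵢⱼ², so ∂z/∂xᵢⱼ = 2·xᵢⱼ, and z.backward() fills x.grad with exactly those values (2·1 = 2, 2·0 = 0, 2·(-1) = -2, 2·1 = 2).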


Step 2 – Build Neural Network


torch.nn–Network Layers

  • Linear Layer (fully-connected layer)
    nn.Linear(in_features, out_features)


>>> layer = torch.nn.Linear(32, 64)
>>> layer.weight.shape
torch.Size([64, 32])
>>> layer.bias.shape
torch.Size([64])
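
In other words, the layer maps 32 input features to 64 output features for each sample in a batch:

>>> x = torch.zeros([10, 32])  # batch of 10 samples, 32 features each
>>> layer(x).shape
torch.Size([10, 64])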

torch.nn – Non-Linear Activation Function

  • Sigmoid Activation
    nn.Sigmoid()
  • ReLU Activation
    nn.ReLU()

torch.nn – Build Neural Network

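The slide combines the pieces above into a model class; a minimal sketch of the pattern (the layer sizes here are illustrative, not the slide's exact numbers):

import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # stack layers and activations in order
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        # compute the output of your NN
        return self.net(x)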

Step 3 – Loss Function & Optimizer

torch.nn–Loss Functions

  • Mean Squared Error (for regression tasks)
    criterion = nn.MSELoss()
  • Cross Entropy (for classification tasks)
    criterion = nn.CrossEntropyLoss()
  • loss = criterion(model_out, expected_value)
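
A quick illustration of the expected input shapes (the numbers are made up): MSELoss compares predictions and targets of the same shape, while CrossEntropyLoss takes raw logits of shape (batch, num_classes) plus integer class labels:

import torch
import torch.nn as nn

# Regression: prediction and target share the same shape
mse = nn.MSELoss()
print(mse(torch.randn(4), torch.randn(4)))

# Classification: logits (batch, num_classes) and integer labels (batch,)
ce = nn.CrossEntropyLoss()
print(ce(torch.randn(4, 3), torch.tensor([0, 2, 1, 0])))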

torch.optim

  • Gradient-based optimization algorithms that adjust network
    parameters to reduce error.

  • E.g. Stochastic Gradient Descent (SGD)

    optimizer = torch.optim.SGD(model.parameters(), lr, momentum=0)

  • For every batch of data:
    1. Call optimizer.zero_grad() to reset gradients of model parameters.
    2. Call loss.backward() to backpropagate gradients of prediction loss.
    3. Call optimizer.step() to adjust model parameters.

Step 4 – Train, Test & Predict

Train

# Training setup
dataset = MyDataset(file)                             # read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True)        # put dataset into DataLoader
model = MyModel().to(device)                          # construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()                              # set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)  # set optimizer

# Training loop
for epoch in range(n_epochs):                  # iterate over epochs
    model.train()                              # set model to train mode
    for x, y in tr_set:                        # iterate through the dataloader
        optimizer.zero_grad()                  # reset gradients to zero
        x, y = x.to(device), y.to(device)      # move data to device (cpu/cuda)
        pred = model(x)                        # forward pass (compute output)
        loss = criterion(pred, y)              # compute loss
        loss.backward()                        # compute gradients (backpropagation)
        optimizer.step()                       # update model parameters

Test

model.eval()                                  # set model to evaluation mode
total_loss = 0
for x, y in dv_set:                           # iterate through the dataloader
    x, y = x.to(device), y.to(device)         # move data to device (cpu/cuda)
    with torch.no_grad():                     # disable gradient calculation
        pred = model(x)                       # forward pass (compute output)
        loss = criterion(pred, y)             # compute loss
    total_loss += loss.cpu().item() * len(x)  # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)   # compute averaged loss

Predict

model.eval()                      # set model to evaluation mode
preds = []
for x in tt_set:                  # iterate through the dataloader
    x = x.to(device)              # move data to device
    with torch.no_grad():         # disable gradient calculation
        pred = model(x)           # forward pass
        preds.append(pred.cpu())  # collect predictions

Notice

eval & no_grad

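In short: model.eval() switches layers such as dropout and batch normalization to their inference behavior, while torch.no_grad() disables gradient tracking so the forward pass runs faster and uses less memory. The Test and Predict loops above use both together.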

load & save

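The standard pattern (a minimal sketch; path is whatever checkpoint filename you choose):

# Save
torch.save(model.state_dict(), path)

# Load
ckpt = torch.load(path)
model.load_state_dict(ckpt)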

Tutorial2

Load Data / Preprocessing

Load data: you can use pandas to load a CSV file.

import pandas as pd

train_data = pd.read_csv("./covid.train.csv").drop(columns=['date']).values

Preprocessing: Get model inputs and labels.

x_train, y_train = train_data[:, :-1], train_data[:, -1]  # in Python slicing, the right bound is exclusive

Dataset

  • __init__: Read data and preprocess
  • __getitem__: Return one sample at a time. In this case, one sample includes a 117-dimensional feature vector and a label
  • __len__: Return the size of the dataset. In this case, it is 2699
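
A sketch of what such a Dataset can look like (class and variable names are illustrative):

class COVID19Dataset(Dataset):
    def __init__(self, x, y=None):
        # convert numpy arrays to float tensors; y is None for the test set
        self.x = torch.FloatTensor(x)
        self.y = torch.FloatTensor(y) if y is not None else None

    def __getitem__(self, index):
        if self.y is None:
            return self.x[index]  # prediction: features only
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)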

Dataloader

train_loader = DataLoader(train_dataset, batch_size, shuffle=True)

  • Group data into batches
  • If you set shuffle=True, the dataloader permutes the indices of all samples automatically
  • We often set shuffle=True during training
  • You can check this page Advantage to shuffle a dataset if you are curious about why

Model Building

class My_model(nn.Module):
    def __init__(self, input_dim):
        super(My_model, self).__init__()
        # TODO: modify model's structure, be aware of dimensions.
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.squeeze(1)  # (B, 1) -> (B,)
        return x
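
With the 117-dimensional features from above, the model is created and moved to the GPU before training (a usage sketch):

model = My_model(input_dim=117).to('cuda')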

Criterion

# instantiate the loss function
criterion = torch.nn.MSELoss(reduction="mean")

Optimizer

optimizer = torch.optim.SGD(model.parameters(), lr = 1e-5, momentum=0.9)

Training loop

from tqdm import tqdm

for epoch in range(3000):
    model.train()                          # set your model to train mode
    # tqdm is a package to visualize your training progress
    train_pbar = tqdm(train_loader, position=0, leave=True)
    for x, y in train_pbar:
        x, y = x.to('cuda'), y.to('cuda')  # move your data to device
        pred = model(x)                    # forward pass
        loss = criterion(pred, y)          # compute loss
        loss.backward()                    # compute gradient
        optimizer.step()                   # update parameters
        optimizer.zero_grad()              # reset gradients to zero