Deep view on transfer learning with image classification in PyTorch


Transfer learning

  1. Convolutional base
  2. Classifier
  1. ConvNet as a fixed feature extractor/train as classifier
  2. Finetuning the ConvNet/fine tune
  3. Pretrained models

Transfer Learning:

Transfer learning is mostly used in computer vision tasks such as image classification and in natural language processing tasks such as sentiment analysis, because training such models from scratch needs a huge amount of computational power.

Pre-trained model:

Pre-trained models (VGG, InceptionV3, MobileNet) are extremely useful when they are suitable for the task at hand, but they are often not optimized for the specific dataset users are tackling. As an example, InceptionV3 is a model optimized for image classification on a broad set of 1000 categories, but our domain might be dog breed classification. A commonly used technique in deep learning is transfer learning, which adapts a model trained for a similar task to the task at hand. Compared with training a new model from the ground up, transfer learning requires substantially less data and fewer resources.

CONVolutional Neural NETworks (ConvNet/CNN)

Typical CNN:

A typical CNN consists of 2 important parts (see the figure above):

  1. Convolutional base / feature learning (Conv + ReLU + Pooling), which is composed of a stack of convolutional and pooling layers. The main goal of the convolutional base is to generate features from the image, such as edges, lines, and curves in earlier layers and shapes in middle layers.
  2. Classifier / classification (fully connected layer), which is usually composed of fully connected layers. The classifier classifies the image based on the task-specific features.
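
To make the two parts concrete, here is a minimal PyTorch sketch of a CNN split into a convolutional base and a classifier. The class name `TinyCNN`, the layer sizes, and the 32x32 input are illustrative assumptions and are not taken from any pretrained architecture.

```python
import torch
import torch.nn as nn


class TinyCNN(nn.Module):
    """Illustrative two-part CNN (hypothetical sizes, for explanation only)."""

    def __init__(self, num_classes=2):
        super().__init__()
        # Convolutional base: Conv + ReLU + Pooling, repeated twice
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        # Classifier: one fully connected layer on the flattened features
        self.classifier = nn.Linear(16 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)


model = TinyCNN()
batch = torch.randn(4, 3, 32, 32)   # 4 RGB images of 32x32 pixels
feature_maps = model.features(batch)
logits = model(batch)
print(feature_maps.shape)  # torch.Size([4, 16, 8, 8])
print(logits.shape)        # torch.Size([4, 2])
```

Pretrained torchvision models follow the same split, which is what makes it possible to keep the base and swap only the classifier.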

Transfer learning scenarios:

Transfer learning can be used in 3 ways:

  1. ConvNet as a fixed feature extractor/train as classifier
  2. Finetuning the ConvNet/fine tune
  3. Pretrained models
ConvNet as a fixed feature extractor:

[Figure: VGG16 on ImageNet. The entire VGG network is kept the same, but the last fully connected (FC) layer is replaced with a classifier of our choice (an SVM or a neural network in the figure). Source: Keras blog and Anuj Shah.]

Which approach to use depends mainly on the size of the new dataset and its similarity to the original dataset:

  1. New dataset is small and similar to the original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
  2. New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
  3. New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
  4. New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
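
The four scenarios above amount to a small decision table. The function below is a hypothetical helper written only to restate that rule of thumb in code; the name `choose_strategy` and the two boolean inputs are assumptions, and real projects will need judgment rather than a lookup.

```python
def choose_strategy(dataset_is_large, similar_to_original):
    """Map the four scenarios to a suggested approach (rule of thumb only)."""
    if not dataset_is_large and similar_to_original:
        return "train a linear classifier on features from the top of the network"
    if dataset_is_large and similar_to_original:
        return "fine-tune the full network"
    if not dataset_is_large and not similar_to_original:
        return "train a linear classifier on activations from an earlier layer"
    # Large and very different: still start from pretrained weights
    return "fine-tune the entire network, initialized from pretrained weights"


for large in (False, True):
    for similar in (True, False):
        print(large, similar, "->", choose_strategy(large, similar))
```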

Transfer learning using PyTorch for image classification:

In this tutorial, you will learn how to train your network using transfer learning. I recommend using Google Colab for faster computation.

  • Finetuning the convnet: Instead of random initialization, we initialize the network with a pretrained network, such as one trained on the ImageNet 1000-class dataset. The rest of the training looks as usual.
  • ConvNet as fixed feature extractor: Here, we freeze the weights of the whole network except the final fully connected layer. This last fully connected layer is replaced with a new one with random weights, and only this layer is trained.


from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

plt.ion() # interactive mode

Data loading:

The training archive contains 25,000 images of dogs and cats. You can download the data and learn more about it here.

# Data augmentation and normalization for training
# Just normalization for validation
data_transforms = {
    'train': transforms.Compose([
        # Random crop and horizontal flip for augmentation
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        # Deterministic resize and center crop for validation
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = '/content/Cat_Dog_data/'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
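
`datasets.ImageFolder` expects one subfolder per class inside each split directory and assigns class indices from the sorted subfolder names. The sketch below reproduces only that naming rule with the standard library. The folder names `cat` and `dog`, the helper `find_classes`, and the temporary directory are assumptions for illustration, so check them against the actual archive.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory tree shaped like the one ImageFolder expects:
#   <root>/train/<class_name>/...  and  <root>/val/<class_name>/...
root = Path(tempfile.mkdtemp())
for split in ["train", "val"]:
    for class_name in ["dog", "cat"]:   # hypothetical class folders
        (root / split / class_name).mkdir(parents=True)


def find_classes(split_dir):
    """Return (classes, class_to_idx) based on sorted subfolder names."""
    classes = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    return classes, class_to_idx


classes, class_to_idx = find_classes(root / "train")
print(classes)       # ['cat', 'dog']
print(class_to_idx)  # {'cat': 0, 'dog': 1}
```

This is why `class_names` can be read straight from the dataset object: the label of each image is simply the index of its folder.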

Visualize a few images:

def imshow(inp, title=None):
    """Imshow for Tensor."""
    # Convert from (channels, height, width) to (height, width, channels)
    inp = inp.numpy().transpose((1, 2, 0))
    # Undo the normalization applied by the transforms
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated


# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out, title=[class_names[x] for x in classes])
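
The line `inp = std * inp + mean` in `imshow` is the inverse of the per-channel normalization `(x - mean) / std`. As a small check of that arithmetic, the NumPy sketch below normalizes a made-up 2x2 RGB image and restores it; the random image is an assumption used only for illustration.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# A fake 2x2 RGB image with values in [0, 1], laid out height x width x channel
rng = np.random.default_rng(0)
image = rng.random((2, 2, 3))

normalized = (image - mean) / std    # same per-channel arithmetic as Normalize
restored = std * normalized + mean   # what imshow does before plotting

print(np.allclose(restored, image))  # True
```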

Training the model

Now, let’s write a general function to train a model. Here, we will illustrate:

  • Scheduling the learning rate
  • Saving the best model

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            # step the LR schedule once per epoch, after the optimizer updates
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model
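
The training function keeps the best weights with `copy.deepcopy(model.state_dict())` rather than a plain `model.state_dict()`. The sketch below, which uses a tiny `nn.Linear` as a stand-in model (an assumption for illustration), shows why: the tensors returned by `state_dict()` share storage with the model, so without the copy the "best" snapshot would keep changing as training continues.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(2, 1)

live_view = model.state_dict()                # tensors share storage with the model
snapshot = copy.deepcopy(model.state_dict())  # independent copy of the weights

# Simulate further training by changing the weights in place
with torch.no_grad():
    model.weight.add_(1.0)

print(torch.equal(live_view["weight"], model.weight))  # True: followed the update
print(torch.equal(snapshot["weight"], model.weight))   # False: kept the old values
```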

Visualizing the model predictions:

Generic function to display predictions for a few images

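def visualize_model(model, num_images=6):
    # Remember the current mode so it can be restored afterwards
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['val']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)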
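
Both the training loop and the visualization function turn raw model outputs into class predictions with `torch.max(outputs, 1)`. The sketch below runs that step on made-up logits for three images and two classes; the numbers are assumptions chosen so the result is easy to check by hand.

```python
import torch

# Fake logits for 3 images and 2 classes (rows: images, columns: classes)
outputs = torch.tensor([[0.2, 1.5],
                        [2.0, -1.0],
                        [0.1, 0.3]])
labels = torch.tensor([1, 0, 0])

# torch.max over dim 1 returns (max values, indices); the indices are the predictions
_, preds = torch.max(outputs, 1)
correct = torch.sum(preds == labels).item()

print(preds.tolist())         # [1, 0, 1]
print(correct / len(labels))  # 0.6666666666666666
```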

Finetuning the convnet:

Load a pretrained model and reset final fully connected layer.

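# Load ResNet-18 with ImageNet weights
# (newer torchvision versions prefer the `weights=` argument over `pretrained=True`)
model_ft = models.resnet18(pretrained=True)
# Replace the final fully connected layer with a 2-class (cat/dog) output
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)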

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
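
With `step_size=7` and `gamma=0.1`, the learning rate is multiplied by 0.1 after every 7 epochs. The helper `step_lr` below is a hypothetical plain-Python restatement of that rule, handy for checking which rate a given epoch uses; it does not call PyTorch and assumes the scheduler is stepped once per epoch.

```python
import math


def step_lr(epoch, base_lr=0.001, step_size=7, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in [0, 6, 7, 14, 21, 24]:
    print(epoch, step_lr(epoch))
```

Over 25 epochs this gives four plateaus: 1e-3 for epochs 0 to 6, then roughly 1e-4, 1e-5, and 1e-6 for each following block of 7 epochs.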

Train and evaluate

Training takes a while because the dataset is large (25,000 images), and it is considerably slower on CPU than on GPU. The run logged below took about 134 minutes for 25 epochs.

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=25)
Epoch 24/24
train Loss: 0.0822 Acc: 0.9650
val Loss: 0.0315 Acc: 0.9876

Training complete in 133m 50s
Best val Acc: 0.988400

Visualise the output

output predicted

Similar tutorial

You can find a similar tutorial on ants and bees on the official PyTorch website.

Andrew Ng on transfer learning at NIPS 2016
Demis Hassabis, CEO, DeepMind


Thank you

Feel free to share your views in the comments. I will update the post if any mistakes are found. As usual, you can contact me on LinkedIn below.



