A Summary of Tips for the PyTorch Deep Learning Framework

1. Specifying GPU IDs when training a model

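To use GPU 0 only (addressed as "cuda:0" inside the process): os.environ["CUDA_VISIBLE_DEVICES"] = "0";
To use GPUs 0 and 1 (addressed as "cuda:0" and "cuda:1"): os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"; the order sets the priority, so device 0 is used first and device 1 second;
The devices can also be chosen outside the training script: CUDA_VISIBLE_DEVICES=0,1 python train.py. Note that if you use cards 6 and 7 of an 8-card machine, i.e. CUDA_VISIBLE_DEVICES=6,7 python train.py, the visible cards are renumbered from 0 inside the process, so you still pass 0 and 1 when parallelizing the model: model = nn.DataParallel(model, device_ids=[0, 1]);
Note that the statement selecting the GPUs must run before any CUDA-related operation, so place it ahead of all code that touches the network model.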
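
The renumbering described above can be sketched without a GPU. The helper `physical_gpu_id` below is hypothetical (it is not a PyTorch API) and assumes that `CUDA_VISIBLE_DEVICES` holds plain integer IDs; it only mimics how a logical index inside the process maps back to a physical card.

```python
import os


def physical_gpu_id(logical_index, visible=None):
    """Map a logical device index (as seen inside the process) to the
    physical GPU ID listed in CUDA_VISIBLE_DEVICES.

    Hypothetical helper for illustration only; assumes integer IDs.
    """
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    ids = [item.strip() for item in visible.split(",") if item.strip()]
    return int(ids[logical_index])


# Must be set before CUDA is initialized, i.e. before any GPU work.
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

# Inside the process the two cards are addressed as cuda:0 and cuda:1.
mapping = {f"cuda:{i}": physical_gpu_id(i) for i in range(2)}
print(mapping)
```

This is why `device_ids=[0, 1]` is the right argument for `nn.DataParallel` even when the physical cards are 6 and 7.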

2. Viewing the input and output details of every layer in a model

1. Install torchsummary or torchsummaryX (pip install torchsummary);
2. Usage example:

import torch
from torchvision import models

vgg16 = models.vgg16()
vgg16 = vgg16.cuda()

# 1. Using torchsummary
from torchsummary import summary
summary(vgg16, (3, 224, 224))    # (3, 224, 224) is the model's input size, without the batch dimension

# 2. Using torchsummaryX
from torchsummaryX import summary as summaryX

inputx = torch.randn(1, 3, 224, 224).cuda()    # the input must be on the same device as the model
summaryX(vgg16, inputx)

The printed summary lists the output shape of every layer together with the model's computational cost.

3. Gradient clipping: preventing exploding gradients during optimization

import torch
import torch.nn as nn

...
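outputx = model(inputx)
loss = criterion(outputx, target)    # criterion and target are placeholders for your loss function and labels
optimizer.zero_grad()
loss.backward()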
nn.utils.clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()

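Arguments of nn.utils.clip_grad_norm_:

parameters: an iterable of tensors whose gradients will be normalized;
max_norm: the maximum allowed norm of the gradients;
norm_type: the type of norm to use, L2 by default;
Note that on some tasks gradient clipping adds considerable computation time.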
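
As a minimal sketch of what the call does, the snippet below inflates the loss of a toy linear model on purpose and measures the total gradient norm before and after clipping. The model, the scaling factor and max_norm=1.0 are illustrative assumptions, not values from the text above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)

# Scale the loss up so that the gradients are deliberately large.
inputx = torch.randn(8, 4)
loss = (model(inputx) ** 2).sum() * 1000.0
loss.backward()

max_norm = 1.0
# clip_grad_norm_ returns the total norm measured *before* clipping.
norm_before = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm, norm_type=2)

# Recompute the total L2 norm over all parameter gradients after clipping.
norm_after = torch.norm(torch.stack([p.grad.norm(2) for p in model.parameters()]), 2)

print(float(norm_before), float(norm_after))
```

All gradients are rescaled by the same factor, so their direction is preserved and only the overall magnitude is capped.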

4. Expanding the dimensions of a single image

During training the input has shape (batch_size, c, h, w), whereas at test time a single image has shape (c, h, w), so a batch dimension has to be added.

import cv2
import torch
import numpy as np
    
####### NumPy-based method #########
# Method 1.
image = cv2.imread(imgpath)    # imgpath is a placeholder; cv2.imread returns an (h, w, c) array
print(image.shape)
image = image[np.newaxis, :, :, :]
print(image.shape)   

####### PyTorch-based methods #########
# Method 2.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.view(1, *image.shape)
print(image.shape)

# Method 3.
image = cv2.imread(imgpath)
image = torch.tensor(image)
print(image.shape)
image = image.unsqueeze(dim=0)
print(image.shape)

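tensor.unsqueeze(dim) inserts a new dimension of size 1 at position dim. tensor.squeeze(dim) removes dimension dim if its size is 1; if that dimension is larger than 1, squeeze() has no effect. Without dim, squeeze() removes every dimension of size 1.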
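
The behaviour of the two methods can be checked on dummy tensors; the shapes below are arbitrary choices for illustration.

```python
import torch

image = torch.zeros(3, 224, 224)          # a single image laid out as (c, h, w)

batched = image.unsqueeze(dim=0)          # add a batch dimension at the front
print(batched.shape)

same = batched.squeeze(dim=1)             # dim 1 has size 3, so nothing changes
restored = batched.squeeze(dim=0)         # dim 0 has size 1, so it is removed

padded = torch.zeros(1, 3, 1, 5)
all_removed = padded.squeeze()            # drops every size-1 dimension
print(same.shape, restored.shape, all_removed.shape)
```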

5. One-hot encoding

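PyTorch's cross-entropy loss takes integer class indices as labels and handles them internally, so no manual one-hot conversion is needed. With MSE, however, the labels must be converted to one-hot by hand. A conversion example follows: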

import torch
class_num = 8
batch_size = 4

def one_hot(label):
	"""
	Convert the label of one division to one-hot
	Argument:
		label: (type, tensor), the gt label, shape: (batch_size,)
	Return:
		one_hot_out: (type, tensor), the one-hot label, shape: (batch_size, class_num)
	"""
	label = label.resize_(batch_size, 1)
	m_zeros = torch.zeros(batch_size, class_num)
	one_hot_out = m_zeros.scatter_(1, label, 1)    # (dim, index, value)
	return one_hot_out

label = torch.LongTensor(batch_size).random_() % class_num
print(one_hot(label))

Since PyTorch 1.1, one-hot encoding is available directly as torch.nn.functional.one_hot:

import torch
import torch.nn.functional as F

tensor = torch.arange(0, 5) % 3
one_hot = F.one_hot(tensor)

# F.one_hot infers the number of classes from the largest label (max + 1); the number can also be given explicitly
one_hot = F.one_hot(tensor, num_classes=10)
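
To connect this back to the loss functions mentioned at the start of this section, here is a small sketch with made-up labels and all-zero logits standing in for a model's output. Cross-entropy consumes the class indices directly, while MSE needs the one-hot target as a float tensor.

```python
import torch
import torch.nn.functional as F

class_num = 8
label = torch.tensor([0, 3, 7, 3])                 # integer class indices, shape (4,)
logits = torch.zeros(4, class_num)                 # stand-in for model outputs

# Cross-entropy consumes the class indices as they are.
ce_loss = F.cross_entropy(logits, label)

# MSE needs a float target with the same shape as the prediction.
target = F.one_hot(label, num_classes=class_num).float()
mse_loss = F.mse_loss(torch.softmax(logits, dim=1), target)

print(target.shape, ce_loss.item(), mse_loss.item())
```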

6. Avoiding GPU out-of-memory errors during validation

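Validation needs no differentiation, that is, no gradient computation. Turning autograd off speeds things up and saves memory; leaving it on may exhaust GPU memory: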

model.eval()
with torch.no_grad():
    outputx = model(inputx)
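
A small sketch of the difference, using a toy model with a Dropout layer (the model and shapes are assumptions for illustration): outputs computed under `torch.no_grad()` carry no autograd graph, and `eval()` makes Dropout deterministic.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 2))
inputx = torch.randn(3, 4)

model.eval()                       # Dropout becomes a no-op in eval mode
with torch.no_grad():              # no computation graph is recorded in this block
    out_eval = model(inputx)
    out_again = model(inputx)

model.train()                      # switch back before resuming training
out_train = model(inputx)

print(out_eval.requires_grad, out_train.requires_grad)
```

Note that `model.eval()` and `torch.no_grad()` are independent switches: the first changes layer behaviour, the second disables gradient tracking.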

7. Learning-rate decay strategies

Adjust the learning rate dynamically during training to avoid getting stuck in a local optimum.

import torch
import torch.optim as optim
from torch.optim import lr_scheduler

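# init optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.StepLR(optimizer, 10, 0.1)     # multiply the learning rate by 0.1 every 10 epochs

# train process
for epoch in range(n_epoch):
    # forward pass, loss.backward() and optimizer.step() go here
    scheduler.step()    # call once per epoch, after optimizer.step()
...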
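
The resulting schedule can be inspected by recording the learning rate at the start of each epoch. The sketch below uses a toy model, SGD and an initial rate of 0.1, all chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lr_history = []
for epoch in range(25):
    lr_history.append(optimizer.param_groups[0]["lr"])   # lr used during this epoch
    optimizer.zero_grad()
    loss = model(torch.ones(1, 2)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()                                     # once per epoch, after optimizer.step()

print(lr_history[0], lr_history[10], lr_history[20])
```

Epochs 0 to 9 run at the initial rate, epochs 10 to 19 at one tenth of it, and so on.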

8. Freezing the parameters of certain layers during training

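When loading a pretrained model, or when fine-tuning a classifier in transfer learning, the first few layers often need to be frozen so that their features stay unchanged throughout training.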

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {0}, \t grad: {1}'.format(name, value.requires_grad))

# names of the parameters to freeze: the first convolution layer of torchvision's VGG16
no_grad = ['features.0.weight',
           'features.0.bias'
          ]

for name, value in net.named_parameters():
    if name in no_grad:
        value.requires_grad = False
    else:
        value.requires_grad = True

# define the optimizer over the trainable parameters only
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.01)
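
Whether the freeze really takes effect can be verified on a toy network by comparing weights before and after one optimizer step. The two-layer `nn.Sequential` below is an assumption made to keep the check fast; it is not the VGG16 from the snippet above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Freeze the first layer; its parameter names start with "0." in nn.Sequential.
for name, param in net.named_parameters():
    param.requires_grad = not name.startswith("0.")

optimizer = optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)

frozen_before = net[0].weight.clone()
trainable_before = net[1].weight.clone()

loss = net(torch.randn(8, 4)).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(net[0].weight, frozen_before)
trainable_changed = not torch.equal(net[1].weight, trainable_before)
print(frozen_unchanged, trainable_changed)
```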

9. Setting different learning rates for different layers

Depending on how optimization is going, different layers may need different learning rates. The code is as follows:

import torch.optim as optim
from torchvision import models

net = models.vgg16()
for name, value in net.named_parameters():
    print('name: {}'.format(name))

# split the layers by key word:
# feature (convolution) layers: fine-tune; classifier layers: train from scratch
conv_params = []
fc_params = []
for name, params in net.named_parameters():
    if 'features' in name:    # torchvision's VGG16 names its convolution parameters 'features.N.*'
        conv_params += [params]
    else:
        fc_params += [params]

# define the optimizer
optimizer = optim.Adam([
    {'params': conv_params, 'lr': 1e-4},
    {'params': fc_params, 'lr': 1e-2}], weight_decay=1e-3)

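The model's layers are split into two parts stored in a list, and each part corresponds to one dictionary in which its own learning rate is set. Options shared by both parts go outside the list as global arguments, like 'weight_decay' here. A global learning rate can also be set outside the list: a part whose dictionary sets a local learning rate uses that one, and any other part falls back to the global learning rate, e.g. optimizer = optim.Adam([{'params': conv_params, 'lr': 1e-4}], lr=1e-2, weight_decay=1e-3)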
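
The fallback rule can be confirmed by reading `optimizer.param_groups` back. The sketch below uses a toy two-layer model in place of VGG16; the names `backbone_params` and `head_params` are illustrative.

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

backbone_params = list(net[0].parameters())
head_params = list(net[1].parameters())

optimizer = optim.Adam(
    [
        {"params": backbone_params, "lr": 1e-4},   # local lr overrides the global one
        {"params": head_params},                    # no local lr: falls back to the global lr
    ],
    lr=1e-2,
    weight_decay=1e-3,
)

for group in optimizer.param_groups:
    print(group["lr"], group["weight_decay"])
```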

10. Saving and loading models

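Models need to be saved during training and loaded again when they are used. PyTorch offers two main ways to do this: 1. save and load the entire model; 2. save and load only the model parameters.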

1. Basic usage

Saving and loading the entire model (network structure plus parameters; relatively slow)

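# save model
torch.save(model, 'net.pkl')

# load model
# the model's class definition must be importable when loading;
# on PyTorch 2.6 and later, also pass weights_only=False to unpickle a whole model
model = torch.load('net.pkl')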

Saving and loading only the model parameters (fast and small; the recommended way)

# save model parameters
torch.save(model.state_dict(), 'net_params.pkl')

# load model parameters: build the model first, then load the parameters
model = Net()
state_dict = torch.load('net_params.pkl')
model.load_state_dict(state_dict)

2. Saving and loading a custom checkpoint

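A saved checkpoint file is usually a dictionary that contains the following:
a. Network structure: input size, output size and hidden-layer information, so that the model can be rebuilt at load time;
b. Model weights: the learnable parameters of every layer after training, obtained by calling state_dict() on the model instance, as in the model.state_dict() used above to save only the weights;
c. Optimizer state: to resume training after saving, the optimizer's state and hyperparameters must be saved too, obtained by calling state_dict() on the optimizer instance;
d. Other information: anything else worth keeping, such as the epoch or hyperparameters like batch_size.
This makes it possible to choose exactly what gets saved, as shown below.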

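# saving a checkpoint, assuming the network class is named Net
checkpoint = {
    'model': Net(),
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': epoch
}

torch.save(checkpoint, 'checkpoint.pkl')

# load the model info
def load_checkpoint(filepath):
    # the checkpoint holds a pickled Net object, so PyTorch 2.6 and later
    # needs weights_only=False in this call
    checkpoint = torch.load(filepath)
    model = checkpoint['model']     # network structure
    model.load_state_dict(checkpoint['model_state_dict'])    # load the model parameters
    # rebuild the optimizer (same class as the one that was saved) on the model's parameters;
    # lr=0.01 is a placeholder that the saved state overwrites
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])    # load the optimizer state

    for params in model.parameters():
        params.requires_grad = False

    model.eval()
    return model

model = load_checkpoint('checkpoint.pkl')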

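If the model is loaded for testing, set requires_grad of every layer to False to fix the parameters, and call model.eval() to put the model in evaluation mode. This mainly fixes the behaviour of Dropout and BatchNormalization; otherwise the predictions would differ from one run to the next. To continue training instead, call model.train() to make sure the network is in training mode.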
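
For the resume-training case, a common variant stores only state dicts and plain values, so no model object has to be unpickled. The sketch below is one possible arrangement, not the method used above: the tiny `Net` class is a hypothetical stand-in, and an in-memory buffer replaces the checkpoint file so that the example runs anywhere.

```python
import io

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):          # hypothetical stand-in for the Net class used above
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Save only state dicts and plain Python values (no pickled model object).
buffer = io.BytesIO()          # stands in for a file such as 'checkpoint.pkl'
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": 5,
}, buffer)

# Resume: rebuild the objects first, then restore their states.
buffer.seek(0)
checkpoint = torch.load(buffer, map_location="cpu")
restored = Net()
restored.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer = optim.SGD(restored.parameters(), lr=0.01, momentum=0.9)
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
restored.train()               # continue training; use eval() for inference instead
```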

3. Saving and loading across devices

Trained on GPU, loaded on CPU (Save on GPU, Load on CPU):

device = torch.device('cpu')
model = Net()
# load all tensors onto the CPU device
model.load_state_dict(torch.load('net_params.pkl', map_location=device))
# <===> model.load_state_dict(torch.load('net_params.pkl', map_location='cpu'))

Trained on GPU, loaded on GPU (Save on GPU, Load on GPU):

device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl'))
model.to(device)

The map_location argument is not needed here; instead, call model.to(torch.device("cuda")) to move the model's parameters onto the GPU.

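The data fed to the model must also be moved from CPU to GPU with data = data.to(device). Note that my_tensor.to(device) returns a copy of my_tensor on the GPU and does not overwrite my_tensor, so the tensor has to be reassigned by hand: my_tensor = my_tensor.to(device)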

Trained on CPU, loaded on GPU (Save on CPU, Load on GPU):

device = torch.device('cuda')
model = Net()
model.load_state_dict(torch.load('net_params.pkl', map_location='cuda:0'))
model.to(device)

11. A few GPU-related functions

# check whether CUDA is available
print(torch.cuda.is_available())

# get the number of GPUs
print(torch.cuda.device_count())

# get the GPU name
print(torch.cuda.get_device_name(0))

# get the index of the current GPU device (indices start at 0)
print(torch.cuda.current_device())

# move the model and data from CPU to GPU
use_cuda = torch.cuda.is_available()

# Method 1
if use_cuda:
    data = data.cuda()
    model.cuda()

# Method 2
device = torch.device('cuda' if use_cuda else 'cpu')
data = data.to(device)
model.to(device)
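
Method 2 has the advantage that the same script runs with or without a GPU. A self-contained sketch, with a toy model and random data as assumptions:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)                 # nn.Module.to() moves the parameters in place

data = torch.randn(3, 4)
moved = data.to(device)          # Tensor.to() returns a tensor; `data` itself is untouched
output = model(moved)

print(device, output.device, data.device)
```

The asymmetry is worth remembering: a module can be moved without reassignment, but a tensor must be reassigned to the result of `.to(device)`.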

12. Saving a model's feature maps during inference

Wrapping the model (running the layers one by one and collecting the feature maps along the way):

import os
import cv2
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16().eval()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1], reorder (h, w, c) -> (c, h, w), then add the batch dimension
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.features    # the convolutional part of VGG16 (an nn.Sequential)
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        x = inputx
        with torch.no_grad():
            # run the layers one by one and keep the outputs of the selected ones
            for index, (name, module) in enumerate(model.named_children()):
                x = module(x)
                if index in self.layers_idx:
                    features[name] = x
        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            # cv2.imwrite expects a 2-D or (h, w, c) array, so only the first
            # channel of the first sample is saved here, as a grayscale image
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature[0, 0])

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        """
        feature = feature.cpu().detach().numpy()

        # use sigmoid to map the values into [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        feature = np.round(feature * 255).astype('uint8')
        return feature

if __name__ == '__main__':
    featurevisualization = FeatureVisualization()
    featurevisualization.save_features()

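Using hooks: PyTorch hooks make it easy to read or modify the values and gradients of intermediate layers without changing the network structure between input and output. (The order of the various hooks relative to forward and backward is easiest to see in the __call__ method of nn.Module.) A hook registered with register_forward_hook runs after forward has been called.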

import os
import cv2
import numpy as np
from PIL import Image

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FeatureVisualization:
    input_size = 256

    def __init__(self, imgpath='', layers_idx=[1, 2], save_features_dir='/'):
        self.imgpath = imgpath
        self.layers_idx = layers_idx
        self.save_features_dir = save_features_dir
        self.net = models.vgg16().eval()

    @classmethod
    def preprocess_image(cls, imgpath):
        assert os.path.isfile(imgpath), "The image {%s} must exist!" % imgpath
        img = cv2.imread(imgpath)
        # resize
        img = cv2.resize(img, (cls.input_size, cls.input_size))
        # normalize to [0, 1], reorder (h, w, c) -> (c, h, w), then add the batch dimension
        img = (img / 255.).astype('float32').transpose((2, 0, 1))[np.newaxis, :, :, :]   # (1, 3, 256, 256)
        # <===>
        # img = (img / 255.).astype('float32').swapaxes(1, 2).swapaxes(0, 1)
        # img = np.expand_dims(img, axis=0)
        img = torch.from_numpy(img)
        return img

    def get_features(self):
        """Extract features"""
        features = {}
        inputx = self.preprocess_image(self.imgpath)
        print('inputx shape', inputx.shape)
        model = self.net.features    # the convolutional part of VGG16 (an nn.Sequential)
        if torch.cuda.is_available():
            inputx = inputx.cuda()
            model = model.cuda()

        # closure that remembers the key under which the output is stored
        def get_activation(name):
            def hook(module, input, output):
                features[name] = output.detach()
            return hook

        # register one hook per selected layer and keep every handle
        handles = []
        for layer_idx in self.layers_idx:
            handles.append(model[layer_idx].register_forward_hook(get_activation(str(layer_idx))))

        with torch.no_grad():
            outputx = model(inputx)
        # remove all hooks once the forward pass is done
        for handle in handles:
            handle.remove()

        return features

    def save_features(self):
        """Save features"""
        features = self.get_features()
        for name, feature in features.items():
            feature = self.process_feature(feature)
            # cv2.imwrite expects a 2-D or (h, w, c) array, so only the first
            # channel of the first sample is saved here, as a grayscale image
            cv2.imwrite(os.path.join(self.save_features_dir, name + '.jpg'), feature[0, 0])

    @staticmethod
    def process_feature(feature):
        """
        Normalize the feature
        Arguments:
            feature: (type, tensor(b, c, h, w)), normalize to (0, 255)
        """
        feature = feature.cpu().detach().numpy()

        # use sigmoid to map the values into [0, 1]
        feature = 1.0 / (1 + np.exp(-1 * feature))
        feature = np.round(feature * 255).astype('uint8')
        return feature

if __name__ == '__main__':
    featurevisualization = FeatureVisualization()
    featurevisualization.save_features()
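
The hook mechanism itself can be tried without an image or a pretrained network. The sketch below registers forward hooks on a small, randomly initialised `nn.Sequential` (an assumption made to keep the example light) and shows that removed hooks no longer fire.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)
features = {}


def get_activation(name):
    # The closure remembers the key under which the output is stored.
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook


# Register one hook per layer of interest and keep every handle.
handles = [model[idx].register_forward_hook(get_activation(str(idx))) for idx in (0, 2)]

with torch.no_grad():
    outputx = model(torch.randn(1, 3, 16, 16))

for handle in handles:
    handle.remove()

# After removal the hooks no longer fire on later forward passes.
features_after_first_pass = dict(features)
features.clear()
with torch.no_grad():
    model(torch.randn(1, 3, 16, 16))

print(sorted(features_after_first_pass), features)
```

Keeping all handles matters: removing only the last one would leave the earlier hooks attached to the model.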

13. Converting between tensor types (three ways)

Using the dedicated conversion methods:

import torch
import torch.nn as nn
    
x = torch.randn(3, 5)
print(x)
# convert x as long
x_long = x.long()
# convert x as half
x_half = x.half()
# convert x as int 
x_int = x.int()
# convert x as double
x_double = x.double()
# convert x as float
x_float = x.float()
# convert x as char
x_char = x.char()
# convert x as byte
x_byte = x.byte()
# convert x as short
x_short = x.short()

Using the tensor's **type()** method:

import torch
import torch.nn as nn
    
x = torch.randn(3, 5)
x_int = x.type(torch.IntTensor)
print(x_int)

Using **type_as(ano_tensor)** to convert a tensor to the type of another given tensor:

import torch
import torch.nn as nn
    
x = torch.FloatTensor(5)    
y = torch.IntTensor([10, 20])
    
x_int = x.type_as(y)
assert isinstance(x_int, torch.IntTensor)
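
The three approaches can be compared side by side by checking the resulting dtype. The last line of conversions uses `Tensor.to(dtype)`, which is an addition not covered above, and the input values are arbitrary.

```python
import torch

x = torch.tensor([1.7, -2.3, 3.0])        # default dtype: torch.float32

x_long = x.long()                          # dedicated method
x_int = x.type(torch.IntTensor)            # Tensor.type() with a tensor type
x_like = x.type_as(torch.zeros(1, dtype=torch.float64))   # match another tensor
x_half = x.to(torch.float16)               # Tensor.to() also accepts a dtype

print(x_long.dtype, x_int.dtype, x_like.dtype, x_half.dtype)
print(x_long.tolist())                     # float -> integer conversion truncates toward zero
```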