问题
It's going to be a long post, sorry in advance...
I'm working on a denoising algorithm and my goal is to:
- Use PyTorch to design / train the model
- Convert the PyTorch model into a CoreML model
The denoising algorithm consists in the following 3 parts:
- A "down-sampling" + noise level map
- A regular convnet
- An "up-sampling"
The first part is quite simple in its idea, but not so easy to explain. Given for instance an input color image and a input value "sigma" that represents the standard deviation of the image noise. The "down-sampling" part is in fact a space-to-depth. In short, for a given channel and for a subset of 2x2 pixels, the space-to-depth creates a single pixel composed of 4 channels. The number of channels is multiplied by 4 while the height and width are divided by 2. The data is simply reorganized. The noise level map consists in creating 3 channels containing the standard deviation value so that the convnet knows how to properly denoise the input image. This will be maybe more clear with some code:
def downsample_and_noise_map(input, sigma):
# Input tensor size (batch, channels, height, width)
in_n, in_c, in_h, in_w = input.size()
# Output tensor size
out_h = in_h // 2
out_w = in_w // 2
sigma_c = in_c # nb of channels of the standard deviation tensor
image_c = in_c * 4 # nb of channels of the image tensor
# Standard deviation tensor
output_sigma = sigma.view(1, 1, 1, 1).repeat(in_n, sigma_c, out_h, out_w)
# Image tensor
output_image = torch.zeros((in_n, image_c, out_h, out_w))
output_image[:, 0::4, :, :] = input[:, :, 0::2, 0::2]
output_image[:, 1::4, :, :] = input[:, :, 0::2, 1::2]
output_image[:, 2::4, :, :] = input[:, :, 1::2, 0::2]
output_image[:, 3::4, :, :] = input[:, :, 1::2, 1::2]
# Concatenate standard deviation and image tensors
return torch.cat((output_sigma, output_image), dim=1)
This function is then called as the first step in the model's forward
function:
def forward(self, x, sigma):
x = downsample_and_noise_map(x, sigma)
x = self.convnet(x)
x = upsample(x)
return x
Let's consider an input tensor of size 1x3x100x100 (PyTorch standard: batch, channels, height, width) and a sigma value of 0.1. The output tensor has the following properties:
- Tensor's shape is 1x15x50x50
- Tensor's values for channels 0, 1 and 2 are all equal to sigma = 0.1
- Tensor's values for channels 3, 4, 5, 6 are composed of the input image values of channel 0
- Tensor's values for channels 7, 8, 9, 10 are composed of the input image values of channel 1
- Tensor's values for channels 11, 12, 13, 14 are composed of the input image values of channel 2
If this code is not clear enough, I can post an even more naive version.
The up-sampling part is the reciprocal function of the downsampling one.
I was able to use this function for training and testing in PyTorch.
Then, I tried to convert the model to CoreML with ONNX as an intermediate step. The conversion to ONNX generated "TracerWarning". Conversion from ONNX to CoreML failed (TypeError: 1.0 has type numpy.float64, but expected one of: int, long). The problem came from the down-sampling + noise level map (and from up-sampling too).
When I removed the down-sampling + noise level map and up-sampling layers, I was able to convert to ONNX and to CoreML very easily since only a simple convnet remained. This means I have a solution to my problem: implement these 2 layers using 2 shaders on the mobile side. But I'm not satisfied with this solution as I want my model to contain all layers ^^
Before considering writing a post here, I crawled Internet to find an answer and I was able to write a better version of the previous function using reshape
and permute
. This version removed all ONNX warning, but the CoreML conversion still failed...
def downsample_and_noise_map(input, sigma):
# Input image size
in_n, in_c, in_h, in_w = input.size()
# Output tensor size
out_n = in_n
out_h = in_h // 2
out_w = in_w // 2
# Create standard deviation tensor
output_sigma = sigma.view(out_n, 1, 1, 1).repeat(out_n, in_c, out_h, out_w)
# Split RGB channels
channels_rgb = torch.split(input, 1, dim=1)
# Reshape (space-to-depth) each image channel
channels_reshaped = []
for channel in channels_rgb:
channel = channel.reshape(1, out_h, 2, out_w, 2)
channel = channel.permute(2, 4, 0, 1, 3)
channel = channel.reshape(1, 4, out_h, out_w)
channels_reshaped.append(channel)
# Concatenate all reshaped image channels together
output_image = torch.cat(channels_reshaped, dim=1)
# Concatenate standard deviation and image tensors
output = torch.cat([output_sigma, output_image], dim=1)
return output
So here are (some of) my questions:
- What is the preferred PyTorch way to implement a function such as
downsample_and_noise_map
function within a model? - Same question but when the conversion to ONNX and then to CoreML is part of the equation?
- Is the PyTorch -> ONNX -> CoreML still best path to deploy the model for iOS production?
Thanks for your help (and your patience) ^^
回答1:
Disclaimer I'm not familiar with CoreML or deploying to iOS but I do have experience deploying PyTorch models in TensorRT and OpenVINO via ONNX.
The main issues I've faced when deploying to other frameworks is that operations like slicing and repeating tensors tend to have limited support in other frameworks. Often we can construct equivalent conv or transpose-conv operations which achieve the desired behavior.
In order to ensure we don't export the logic used to construct the conv weights I've separated the weight initialization from the application of the weights. This makes the ONNX export much more straightforward since all it sees is some constant tensors being applied.
class DownsampleAndNoiseMap():
def __init__(self):
self.initialized = False
self.weight = None
self.zeros = None
def init_weights(self, input):
with torch.no_grad():
in_n, in_c, in_h, in_w = input.size()
out_h = int(in_h // 2)
out_w = int(in_w // 2)
sigma_c = in_c
image_c = in_c * 4
# conv weights used for downsampling
self.weight = torch.zeros(image_c, in_c, 2, 2).to(input)
for c in range(in_c):
self.weight[4 * c, c, 0, 0] = 1
self.weight[4 * c + 1, c, 0, 1] = 1
self.weight[4 * c + 2, c, 1, 0] = 1
self.weight[4 * c + 3, c, 1, 1] = 1
# zeros used to replace repeat
self.zeros = torch.zeros(in_n, sigma_c, out_h, out_w).to(input)
self.initialized = True
def __call__(self, input, sigma):
assert self.initialized
output_sigma = self.zeros + sigma
output_image = torch.nn.functional.conv2d(input, self.weight, stride=2)
return torch.cat((output_sigma, output_image), dim=1)
class Upsample():
def __init__(self):
self.initialized = False
self.weight = None
def init_weights(self, input):
with torch.no_grad():
in_n, in_c, in_h, in_w = input.size()
image_c = in_c * 4
self.weight = torch.zeros(in_c + image_c, in_c, 2, 2).to(input)
for c in range(in_c):
self.weight[in_c + 4 * c, c, 0, 0] = 1
self.weight[in_c + 4 * c + 1, c, 0, 1] = 1
self.weight[in_c + 4 * c + 2, c, 1, 0] = 1
self.weight[in_c + 4 * c + 3, c, 1, 1] = 1
self.initialized = True
def __call__(self, input):
assert self.initialized
return torch.nn.functional.conv_transpose2d(input, self.weight, stride=2)
I made the assumption that upsample was the reciprocal of downsample in the sense that x == upsample(downsample_and_noise_map(x, sigma))
(correct me if I'm wrong in this assumption). I also verified that my version of downsample agrees with yours.
# consistency checking code
x = torch.randn(1, 3, 100, 100)
sigma = torch.randn(1)
# OP downsampling
y1 = downsample_and_noise_map(x, sigma)
ds = DownsampleAndNoiseMap()
ds.init_weights(x)
y2 = ds(x, sigma)
print('downsample diff:', torch.sum(torch.abs(y1 - y2)).item())
us = Upsample()
us.init_weights(x)
x_recov = us(ds(x, sigma))
print('recovery error:', torch.sum(torch.abs(x - x_recov)).item())
which results in
downsample diff: 0.0
recovery error: 0.0
Exporting to ONNX
When exporting we need to invoke init_weights
for the new classes before using torch.onnx.export
. For example
class Model(torch.nn.Module):
def __init__(self):
super().__init__()
self.downsample = DownsampleAndNoiseMap()
self.upsample = Upsample()
self.convnet = lambda x: x # placeholder
def init_weights(self, x):
self.downsample.init_weights(x)
self.upsample.init_weights(x)
def forward(self, x, sigma):
x = self.downsample(x, sigma)
x = self.convnet(x)
x = self.upsample(x)
return x
x = torch.randn(1, 3, 100, 100)
sigma = torch.randn(1)
model = Model()
# ... load state dict here
model.init_weights(x)
torch.onnx.export(model, (x, sigma), 'deploy.onnx', verbose=True, input_names=["input", "sigma"], output_names=["output"])
which gives the ONNX graph
graph(%input : Float(1, 3, 100, 100)
%sigma : Float(1)) {
%2 : Float(1, 3, 50, 50) = onnx::Constant[value=<Tensor>](), scope: Model
%3 : Float(1, 3, 50, 50) = onnx::Add(%2, %sigma), scope: Model
%4 : Float(12, 3, 2, 2) = onnx::Constant[value=<Tensor>](), scope: Model
%5 : Float(1, 12, 50, 50) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%input, %4), scope: Model
%6 : Float(1, 15, 50, 50) = onnx::Concat[axis=1](%3, %5), scope: Model
%7 : Float(15, 3, 2, 2) = onnx::Constant[value=<Tensor>](), scope: Model
%output : Float(1, 3, 100, 100) = onnx::ConvTranspose[dilations=[1, 1], group=1, kernel_shape=[2, 2], pads=[0, 0, 0, 0], strides=[2, 2]](%6, %7), scope: Model
return (%output);
}
As for the last question about the recommended way to deploy on iOS I can't answer that since I don't have experience in that area.
来源:https://stackoverflow.com/questions/59177052/how-to-properly-implement-data-reorganization-using-pytorch