Question
I trained a network using the provided ImageReader, and now I'm trying to use the CNTK EvalDll in a C# project to evaluate RGB images.
I've seen examples related to the EvalDll, but the input is always an array of float/double, never images.
How can I use the exposed interface to run the trained network on an RGB image?
Answer 1:
I'll assume that you want the equivalent of reading with the ImageReader, where your reader config looks something like:
    features=[
        width=224
        height=224
        channels=3
        cropType=Center
    ]
You'll need helper functions to create the crop, and to re-size the image to the size accepted by the network.
I'll define two extension methods on System.Drawing.Bitmap, one to crop and one to re-size (the code in this answer is F#):
    open System.Collections.Generic
    open System.Drawing
    open System.Drawing.Drawing2D
    open System.Drawing.Imaging

    type Bitmap with
        /// Crops the image in the present object, starting at the given (column, row), and retaining
        /// the given number of columns and rows.
        member this.Crop(column, row, numCols, numRows) =
            let rect = Rectangle(column, row, numCols, numRows)
            this.Clone(rect, this.PixelFormat)

        /// Creates a resized version of the present image. The returned image
        /// will have the given width and height. This may distort the aspect ratio
        /// of the image.
        member this.ResizeImage(width, height, useHighQuality) =
            // Rather than using image.GetThumbnailImage, use direct image resizing.
            // GetThumbnailImage throws odd out-of-memory exceptions on some
            // images, see also
            // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
            // Use the interpolation method suggested on
            // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
            let rect = Rectangle(0, 0, width, height)
            let destImage = new Bitmap(width, height)
            destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution)
            use graphics = Graphics.FromImage destImage
            graphics.CompositingMode <- CompositingMode.SourceCopy
            if useHighQuality then
                graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
                graphics.CompositingQuality <- CompositingQuality.HighQuality
                graphics.SmoothingMode <- SmoothingMode.HighQuality
                graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
            else
                graphics.InterpolationMode <- InterpolationMode.Low
            use wrapMode = new ImageAttributes()
            wrapMode.SetWrapMode WrapMode.TileFlipXY
            graphics.DrawImage(this, rect, 0, 0, this.Width, this.Height, GraphicsUnit.Pixel, wrapMode)
            destImage
Based on that, define a function to do the center crop:
    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The
    /// aspect ratio is preserved.
    let CenterCrop cropRatio (image: Bitmap) =
        let cropSize =
            float (min image.Height image.Width) * cropRatio
            |> int
        let startRow = (image.Height - cropSize) / 2
        let startCol = (image.Width - cropSize) / 2
        image.Crop(startCol, startRow, cropSize, cropSize)
Then plug it all together: crop, resize, then traverse the image in the plane order that OpenCV uses:
    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first center-cropped and resized to targetSize x targetSize,
    /// then the image planes are converted to a CNTK tensor.
    /// Returns a list with targetSize*targetSize*3 values.
    let ImageToFeatures (image: Bitmap, targetSize) =
        // Apply the same image pre-processing that is typically done
        // in CNTK when running it in test or write mode: Take a center
        // crop of the image, then re-size it to the network input size.
        // The intermediate bitmaps are disposed when the function returns.
        use cropped = CenterCrop 1.0 image
        use resized = cropped.ResizeImage(targetSize, targetSize, false)
        // Ensure that the initial capacity of the list is provided
        // with the constructor. Creating the list via the default constructor
        // makes the whole operation 20% slower.
        let features = List<float32>(targetSize * targetSize * 3)
        // Traverse the image in the format that is used in OpenCV:
        // First the B plane, then the G plane, then the R plane
        for c in 0 .. 2 do
            for h in 0 .. (resized.Height - 1) do
                for w in 0 .. (resized.Width - 1) do
                    let pixel = resized.GetPixel(w, h)
                    let v =
                        match c with
                        | 0 -> pixel.B
                        | 1 -> pixel.G
                        | 2 -> pixel.R
                        | _ -> failwith "No such channel"
                        |> float32
                    features.Add v
        features
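To make the plane order concrete, here is a minimal sketch in C# (the language the question targets) that does not depend on System.Drawing. It assumes the pixels are already available as a tightly packed buffer with 3 bytes per pixel in B, G, R order; the class and method names are made up for illustration.

```csharp
using System;

public static class PlanarLayout
{
    // Converts an interleaved pixel buffer (row by row, 3 bytes per pixel in
    // B, G, R order) into planar floats: all B values, then all G, then all R.
    // Assumption: the buffer has no padding between rows.
    public static float[] InterleavedBgrToPlanar(byte[] interleaved, int width, int height)
    {
        if (interleaved.Length != width * height * 3)
            throw new ArgumentException("Expected width * height * 3 bytes.");

        var planar = new float[interleaved.Length];
        int planeSize = width * height;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    int src = (h * width + w) * 3 + c;       // index in the interleaved buffer
                    int dst = c * planeSize + h * width + w; // index in the planar output
                    planar[dst] = interleaved[src];
                }
        return planar;
    }
}
```

For a 2 x 1 image with pixels (B=1, G=2, R=3) and (B=4, G=5, R=6), the result is 1, 4, 2, 5, 3, 6: both B values first, then both G values, then both R values.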
Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you're good. I'm assuming your RGB image comes in myImage, and you're doing binary classification with a network input size of 224 x 224.
    let LoadModelOnCpu modelPath =
        let model = new IEvaluateModelManagedF()
        let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
        model.Init description
        model.CreateNetwork description
        model

    let model = LoadModelOnCpu("myModelFile")
    let featureDict = Dictionary()
    featureDict.["features"] <- ImageToFeatures(myImage, 224)
    model.Evaluate(featureDict, "OutputNodes.z", 2)
Answer 2:
I implemented similar code in C#, which loads a model, reads a test image, does the appropriate cropping/scaling/etc., and runs the model. As Anton pointed out, the output does not match CNTK's 100%, but it is very close.
Code for image reading / cropping / scaling:
    // Needs: using System; using System.Drawing;
    private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
    {
        var rect = new Rectangle(col, row, numCols, numRows);
        return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
    }

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The
    /// aspect ratio is preserved.
    private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
    {
        var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
        var startCol = (img.Width - cropSize) / 2;
        var startRow = (img.Height - cropSize) / 2;
        return ImCrop(img, startCol, startRow, cropSize, cropSize);
    }

    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    private static Bitmap ImResize(Bitmap img, int width, int height)
    {
        return new Bitmap(img, new Size(width, height));
    }
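If you want to check the crop arithmetic without loading a bitmap, something like the following sketch should do. It mirrors the rounding used in ImCropToCenter; the class name, method name and tuple return type are illustrative choices and not part of any library.

```csharp
using System;

public static class CropMath
{
    // Returns (startCol, startRow, cropSize) for a centered square crop whose
    // side is cropRatio times the smaller image dimension.
    public static Tuple<int, int, int> CenterCropRect(int width, int height, double cropRatio)
    {
        int cropSize = (int)Math.Round(Math.Min(width, height) * cropRatio);
        int startCol = (width - cropSize) / 2;  // integer division rounds down
        int startRow = (height - cropSize) / 2;
        return Tuple.Create(startCol, startRow, cropSize);
    }
}
```

For a 640 x 480 image and cropRatio 1.0 this gives a 480 x 480 crop starting at column 80, row 0. With cropRatio 0.875 on a 500 x 375 image, the side is round(328.125) = 328, starting at column 86, row 23.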
Code for loading the model and the XML file that contains the pixel means:
    // Needs: using System; using System.Diagnostics;
    //        using Microsoft.MSR.CNTK.Extensibility.Managed;
    public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
    {
        var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
        // Start the timer before creating the network; a Stopwatch that is
        // never started always reports 0 ms.
        Stopwatch stopWatch = Stopwatch.StartNew();
        var model = new IEvaluateModelManagedF();
        model.CreateNetwork(networkConfiguration, deviceId: -1);
        stopWatch.Stop();
        Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
        return model;
    }
    // Needs: using System; using System.Diagnostics; using System.Linq; using System.Xml;
    /// Read the XML mean file, i.e. the offsets which are subtracted
    /// from each pixel in an image before using it as input to a CNTK model.
    public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
    {
        // Read and parse pixel value xml file
        XmlTextReader reader = new XmlTextReader(XmlPath);
        reader.ReadToFollowing("data");
        reader.Read();
        var pixelMeansXml =
            reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
                .Select(Single.Parse)
                .ToArray();
        // Re-order mean pixel values from interleaved (row, column, channel)
        // order to the plane-by-plane order of the features (the same pixel
        // order as produced by the ImGetRGBChannels() function).
        int inputDim = 3 * ImgWidth * ImgHeight;
        Debug.Assert(pixelMeansXml.Length == inputDim);
        var pixelMeans = new float[inputDim];
        int counter = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < ImgHeight; h++)
                for (int w = 0; w < ImgWidth; w++)
                {
                    int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                    pixelMeans[counter++] = pixelMeansXml[xmlIndex];
                }
        return pixelMeans;
    }
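One thing to watch when parsing the mean values: Single.Parse without a format provider uses the current culture, so on a machine whose culture uses a decimal comma, a value like "104.5" can fail to parse or be misread. A minimal sketch of culture-independent parsing follows. The class and method names are hypothetical, and it assumes the values are whitespace-separated numbers in plain or exponent notation.

```csharp
using System;
using System.Globalization;

public static class MeanParsing
{
    // Splits a whitespace-separated block of numbers and parses each one with
    // the invariant culture, so the decimal separator is always '.'.
    public static float[] ParseFloats(string text)
    {
        string[] tokens = text.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var values = new float[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            // NumberStyles.Float allows a sign, a decimal point and an exponent.
            values[i] = float.Parse(tokens[i], NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return values;
    }
}
```

For example, ParseFloats("1.5 2\r\n3e+1\t4") yields the four values 1.5, 2, 30 and 4 regardless of the machine's regional settings.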
Code to load in an image and convert to model input:
    // Needs: using System; using System.Collections.Generic; using System.Diagnostics;
    //        using System.Drawing; using System.Linq;
    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first center-cropped and resized to targetSize x targetSize,
    /// then the image planes are converted to a CNTK tensor, and the mean
    /// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
    private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
    {
        // Apply the same image pre-processing that is done typically in CNTK:
        // Take a center crop of the image, then re-size it to the network input size.
        var imgCropped = ImCropToCenter(img, 1.0);
        var imgResized = ImResize(imgCropped, targetSize, targetSize);
        // Convert pixels to CNTK model input.
        // Fast pixel extraction is ~5x faster while giving identical output
        var features = new float[3 * imgResized.Height * imgResized.Width];
        var boFastPixelExtraction = true;
        if (boFastPixelExtraction)
        {
            var pixelsRGB = ImGetRGBChannels(imgResized);
            for (int c = 0; c < 3; c++)
            {
                // pixelsRGB holds { R, G, B }; index 2 - c yields B, G, R order.
                byte[] pixels = pixelsRGB[2 - c];
                Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
                for (int i = 0; i < pixels.Length; i++)
                {
                    int featIndex = i + c * pixels.Length;
                    features[featIndex] = pixels[i] - pixelMeans[featIndex];
                }
            }
        }
        else
        {
            // Traverse the image in the format that is used in OpenCV:
            // First the B plane, then the G plane, then the R plane
            // Note: calling GetPixel(w, h) repeatedly is slow!
            int featIndex = 0;
            for (int c = 0; c < 3; c++)
                for (int h = 0; h < imgResized.Height; h++)
                    for (int w = 0; w < imgResized.Width; w++)
                    {
                        var pixel = imgResized.GetPixel(w, h);
                        float v;
                        if (c == 0)
                            v = pixel.B;
                        else if (c == 1)
                            v = pixel.G;
                        else if (c == 2)
                            v = pixel.R;
                        else
                            throw new Exception("No such channel");
                        // Subtract pixel mean
                        features[featIndex] = v - pixelMeans[featIndex];
                        featIndex++;
                    }
        }
        return features.ToList();
    }
    // Needs: using System; using System.Collections.Generic; using System.Drawing;
    //        using System.Drawing.Imaging; using System.Runtime.InteropServices;
    /// Convert bitmap image to R,G,B channel byte arrays.
    /// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
    private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
    {
        // Lock the bitmap's bits. The pixels are only read, never modified.
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
        // Declare an array to hold the bytes of the bitmap. Each row occupies
        // Stride bytes, which can include padding beyond Width * 3.
        int stride = bmpData.Stride;
        int bytes = stride * bmp.Height;
        byte[] rgbValues = new byte[bytes];
        // One byte per pixel in each channel array. Sizing these as bytes / 3
        // would be too large whenever the rows are padded.
        int numPixels = bmp.Width * bmp.Height;
        byte[] r = new byte[numPixels];
        byte[] g = new byte[numPixels];
        byte[] b = new byte[numPixels];
        // Copy the RGB values into the array, starting from ptr to the first line
        IntPtr ptr = bmpData.Scan0;
        Marshal.Copy(ptr, rgbValues, 0, bytes);
        // Populate byte arrays, row by row
        int count = 0;
        for (int row = 0; row < bmpData.Height; row++)
        {
            for (int col = 0; col < bmpData.Width; col++)
            {
                int offset = (row * stride) + (col * 3);
                b[count] = rgbValues[offset];
                g[count] = rgbValues[offset + 1];
                r[count++] = rgbValues[offset + 2];
            }
        }
        bmp.UnlockBits(bmpData);
        return new List<byte[]> { r, g, b };
    }
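A note on Stride: for locked bitmap data, each row is rounded up to a four-byte boundary, so for 24bpp images the stride equals Width * 3 only when that product is already a multiple of 4 (it is for a width of 224, but not for 225). The sketch below shows the arithmetic on a plain byte array, without System.Drawing. The class and method names are illustrative, and it assumes a top-down buffer with a positive stride.

```csharp
using System;

public static class StrideDemo
{
    // Row length in bytes for a 24bpp image, rounded up to a multiple of 4.
    public static int Stride24bpp(int width)
    {
        return (width * 3 + 3) / 4 * 4;
    }

    // Extracts one channel (0 = B, 1 = G, 2 = R) from a padded 24bpp buffer,
    // skipping the padding bytes at the end of each row.
    public static byte[] ExtractChannel(byte[] buffer, int width, int height, int stride, int channel)
    {
        var plane = new byte[width * height];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                plane[y * width + x] = buffer[y * stride + x * 3 + channel];
        return plane;
    }
}
```

A 2-pixel-wide image needs 6 bytes per row but has a stride of 8, so each row carries 2 padding bytes. Indexing with row * stride and sizing the output as width * height keeps the padding out of the channel arrays.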
Source: https://stackoverflow.com/questions/37300946/how-to-use-rgb-image-as-input-for-the-c-sharp-evaldll-wrapper