How to use RGB Image as input for the C# EvalDll Wrapper?

Submitted by 断了今生、忘了曾经 on 2019-12-12 15:41:44

Question


I trained a network using the provided ImageReader, and now I'm trying to use the CNTK EvalDll in a C# project to evaluate RGB images.

I've seen examples related to the EvalDll, but the input is always an array of float/double, never images.

How can I use the exposed interface to evaluate an RGB image with the trained network?


Answer 1:


I'll assume that you'll want the equivalent of reading with the ImageReader, where your reader config looks something like

features=[
        width=224
        height=224
        channels=3
        cropType=Center
]

You'll need helper functions to create the crop, and to re-size the image to the size accepted by the network.

I'll define two extension methods on System.Drawing.Bitmap, one to crop and one to resize:

open System.Collections.Generic
open System.Drawing
open System.Drawing.Drawing2D
open System.Drawing.Imaging
type Bitmap with
    /// Crops the image in the present object, starting at the given (column, row), and retaining
    /// the given number of columns and rows.
    member this.Crop(column, row, numCols, numRows) = 
        let rect = Rectangle(column, row, numCols, numRows)
        this.Clone(rect, this.PixelFormat)
    /// Creates a resized version of the present image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    member this.ResizeImage(width, height, useHighQuality) =
        // Rather than using image.GetThumbnailImage, use direct image resizing.
        // GetThumbnailImage throws odd out-of-memory exceptions on some 
        // images, see also 
        // http://stackoverflow.com/questions/27528057/c-sharp-out-of-memory-exception-in-getthumbnailimage-on-a-server
        // Use the interpolation method suggested on 
        // http://stackoverflow.com/questions/1922040/resize-an-image-c-sharp
        let rect = Rectangle(0, 0, width, height);
        let destImage = new Bitmap(width, height);
        destImage.SetResolution(this.HorizontalResolution, this.VerticalResolution);
        use graphics = Graphics.FromImage destImage
        graphics.CompositingMode <- CompositingMode.SourceCopy;
        if useHighQuality then
            graphics.InterpolationMode <- InterpolationMode.HighQualityBicubic
            graphics.CompositingQuality <- CompositingQuality.HighQuality
            graphics.SmoothingMode <- SmoothingMode.HighQuality
            graphics.PixelOffsetMode <- PixelOffsetMode.HighQuality
        else
            graphics.InterpolationMode <- InterpolationMode.Low
        use wrapMode = new ImageAttributes()
        wrapMode.SetWrapMode WrapMode.TileFlipXY
        graphics.DrawImage(this, rect, 0, 0, this.Width,this.Height, GraphicsUnit.Pixel, wrapMode)
        destImage

Based on that, define a function to do the center crop:

/// Returns a square sub-image from the center of the given image, with
/// a size that is cropRatio times the smallest image dimension. The 
/// aspect ratio is preserved.
let CenterCrop cropRatio (image: Bitmap) =
    let cropSize = 
        float(min image.Height image.Width) * cropRatio
        |> int
    let startRow = (image.Height - cropSize) / 2
    let startCol = (image.Width - cropSize) / 2
    image.Crop(startCol, startRow, cropSize, cropSize)

Then plug it all together: crop, resize, then traverse the image in the plane order that OpenCV uses:

/// Creates a list of CNTK feature values from a given bitmap.
/// The image is first center-cropped to a square and resized to (targetSize x targetSize);
/// then the image planes are converted to a CNTK tensor.
/// Returns a list with targetSize*targetSize*3 values.
let ImageToFeatures (image: Bitmap, targetSize) =
    // Apply the same image pre-processing that is typically done
    // in CNTK when running it in test or write mode: Take a center
    // crop of the image, then re-size it to the network input size.
    let cropped = CenterCrop 1.0 image
    let resized = cropped.ResizeImage(targetSize, targetSize, false)
    // Ensure that the initial capacity of the list is provided 
    // with the constructor. Creating the list via the default constructor
    // makes the whole operation 20% slower.
    let features = List (targetSize * targetSize * 3)
    // Traverse the image in the format that is used in OpenCV:
    // First the B plane, then the G plane, R plane
    for c in 0 .. 2 do
        for h in 0 .. (resized.Height - 1) do
            for w in 0 .. (resized.Width - 1) do
                let pixel = resized.GetPixel(w, h)
                let v = 
                    match c with 
                    | 0 -> pixel.B
                    | 1 -> pixel.G
                    | 2 -> pixel.R
                    | _ -> failwith "No such channel"
                    |> float32
                features.Add v
    features

Call ImageToFeatures with the image in question, feed the result into an instance of IEvaluateModelManagedF, and you're good. I'm assuming your RGB image comes in myImage, and you're doing binary classification with a network input size of 224 x 224.

let LoadModelOnCpu modelPath =
    let model = new IEvaluateModelManagedF()
    let description = sprintf "deviceId=-1\r\nmodelPath=\"%s\"" modelPath
    model.Init description
    model.CreateNetwork description
    model
let model = LoadModelOnCpu("myModelFile")
let featureDict = Dictionary()
featureDict.["features"] <- ImageToFeatures(myImage, 224)
model.Evaluate(featureDict, "OutputNodes.z", 2)



Answer 2:


I implemented similar code in C#, which loads a model, reads a test image, does the appropriate cropping/scaling/etc., and runs the model. As Anton pointed out, the output does not match CNTK's output 100%, but it is very close. A sketch of how the pieces can be wired together end to end follows the helper code below.

Code for image reading / cropping / scaling:

    private static Bitmap ImCrop(Bitmap img, int col, int row, int numCols, int numRows)
    {
        var rect = new Rectangle(col, row, numCols, numRows);
        return img.Clone(rect, System.Drawing.Imaging.PixelFormat.DontCare);
    }

    /// Returns a square sub-image from the center of the given image, with
    /// a size that is cropRatio times the smallest image dimension. The 
    /// aspect ratio is preserved.
    private static Bitmap ImCropToCenter(Bitmap img, double cropRatio)
    {
        var cropSize = (int)Math.Round(Math.Min(img.Height, img.Width) * cropRatio);
        var startCol = (img.Width - cropSize) / 2;
        var startRow = (img.Height - cropSize) / 2;
        return ImCrop(img, startCol, startRow, cropSize, cropSize);
    }

    /// Creates a resized version of the given image. The returned image
    /// will have the given width and height. This may distort the aspect ratio
    /// of the image.
    private static Bitmap ImResize(Bitmap img, int width, int height)
    {
        return new Bitmap(img, new Size(width, height));
    }

Code for loading the model and the XML file that contains the pixel means:

    public static IEvaluateModelManagedF loadModel(string modelPath, string outputLayerName)
    {
        var networkConfiguration = String.Format("modelPath=\"{0}\" outputNodeNames=\"{1}\"", modelPath, outputLayerName);
        Stopwatch stopWatch = Stopwatch.StartNew();
        var model = new IEvaluateModelManagedF();
        model.CreateNetwork(networkConfiguration, deviceId: -1);
        stopWatch.Stop();
        Console.WriteLine("Time to create network: {0} ms.", stopWatch.ElapsedMilliseconds);
        return model;
    }

    /// Read the XML mean file, i.e. the offsets that are subtracted
    /// from each pixel in an image before using it as input to a CNTK model.
    public static float[] readXmlMeanFile(string XmlPath, int ImgWidth, int ImgHeight)
    {
        // Read and parse pixel value xml file
        XmlTextReader reader = new XmlTextReader(XmlPath);
        reader.ReadToFollowing("data");
        reader.Read();
        var pixelMeansXml =
            reader.Value.Split(new[] { "\r", "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
                .Select(Single.Parse)
                .ToArray();

        // Re-order mean pixel values to be in the same order as the bitmap
        // image (as output by the ImGetRGBChannels() function).
        int inputDim = 3 * ImgWidth * ImgHeight;
        Debug.Assert(pixelMeansXml.Length == inputDim);
        var pixelMeans = new float[inputDim];
        int counter = 0;
        for (int c = 0; c < 3; c++)
            for (int h = 0; h < ImgHeight; h++)
                for (int w = 0; w < ImgWidth; w++)
                {
                    int xmlIndex = h * ImgWidth * 3 + w * 3 + c;
                    pixelMeans[counter++] = pixelMeansXml[xmlIndex];
                }
        return pixelMeans;
    }
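
For reference, readXmlMeanFile only looks for the first <data> element and expects 3 * ImgWidth * ImgHeight whitespace-separated floats, stored pixel-interleaved (row by row, with the channel index varying fastest, which is what the xmlIndex computation assumes; the channel order must match the B, G, R plane order used above). A minimal sketch of such a mean file is shown below; it is loosely modelled on an OpenCV FileStorage matrix, the node names around <data> are illustrative and ignored by the parser, and the numeric values are placeholders:

    <?xml version="1.0"?>
    <opencv_storage>
      <MeanImg type_id="opencv-matrix">
        <rows>1</rows>
        <cols>150528</cols> <!-- 3 * 224 * 224 values -->
        <dt>f</dt>
        <data>
          104.0 117.0 124.0 104.0 117.0 124.0 ...
        </data>
      </MeanImg>
    </opencv_storage>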

Code to load in an image and convert to model input:

    /// Creates a list of CNTK feature values from a given bitmap.
    /// The image is first center-cropped to a square and resized to (targetSize x targetSize);
    /// then the image planes are converted to a CNTK tensor, and the mean
    /// pixel value subtracted. Returns a list with targetSize * targetSize * 3 floats.
    private static List<float> ImageToFeatures(Bitmap img, int targetSize, float[] pixelMeans)
    {
        // Apply the same image pre-processing that is done typically in CNTK:
        // Take a center crop of the image, then re-size it to the network input size.
        var imgCropped = ImCropToCenter(img, 1.0);
        var imgResized = ImResize(imgCropped, targetSize, targetSize);

        // Convert pixels to CNTK model input.
        // Fast pixel extraction is ~5x faster while giving identical output.
        var features = new float[3 * imgResized.Height * imgResized.Width];
        var boFastPixelExtraction = true; 
        if (boFastPixelExtraction) 
        {
            var pixelsRGB = ImGetRGBChannels(imgResized);
            for (int c = 0; c < 3; c++)
            {
                byte[] pixels = pixelsRGB[2 - c];
                Debug.Assert(pixels.Length == imgResized.Height * imgResized.Width);
                for (int i = 0; i < pixels.Length; i++)
                {
                    int featIndex = i + c * pixels.Length;
                    features[featIndex] = pixels[i] - pixelMeans[featIndex];
                }
            }
        }
        else
        {
            // Traverse the image in the format that is used in OpenCV:
            // First the B plane, then the G plane, R plane
            // Note: calling GetPixel(w, h) repeatedly is slow!
            int featIndex = 0;
            for (int c = 0; c < 3; c++)
                for (int h = 0; h < imgResized.Height; h++)
                    for (int w = 0; w < imgResized.Width; w++)
                    {
                        var pixel = imgResized.GetPixel(w, h);
                        float v;
                        if (c == 0)
                            v = pixel.B;
                        else if (c == 1)
                            v = pixel.G;
                        else if (c == 2)
                            v = pixel.R;
                        else
                            throw new Exception("No such channel");

                        // Subtract pixel mean
                        features[featIndex] = v - pixelMeans[featIndex];
                        featIndex++;
                    }
        }  
        return features.ToList();
    }

    /// Convert bitmap image to R,G,B channel byte arrays.
    /// See: http://stackoverflow.com/questions/6020406/travel-through-pixels-in-bmp
    private static List<byte[]> ImGetRGBChannels(Bitmap bmp)
    {
        // Lock the bitmap's bits.  
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);

        // Declare an array to hold the bytes of the bitmap.
        int bytes = bmpData.Stride * bmp.Height;
        byte[] rgbValues = new byte[bytes];
        // Size the per-channel arrays by pixel count rather than by stride, so that
        // any row padding in the stride does not inflate them.
        byte[] r = new byte[bmp.Width * bmp.Height];
        byte[] g = new byte[bmp.Width * bmp.Height];
        byte[] b = new byte[bmp.Width * bmp.Height];

        // Copy the RGB values into the array, starting from ptr to the first line
        IntPtr ptr = bmpData.Scan0;
        Marshal.Copy(ptr, rgbValues, 0, bytes);

        // Populate byte arrays
        int count = 0;
        int stride = bmpData.Stride;
        for (int col = 0; col < bmpData.Height; col++)
        {
            for (int row = 0; row < bmpData.Width; row++)
            {
                int offset = (col * stride) + (row * 3);
                b[count] = rgbValues[offset];
                g[count] = rgbValues[offset + 1];
                r[count++] = rgbValues[offset + 2];
            }
        }
        bmp.UnlockBits(bmpData);
        return new List<byte[]> { r, g, b };
    }
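
Finally, here is a minimal sketch (referenced above) of how these helpers might be wired together end to end, assuming Main lives in the same class and uses the same using directives as the helper code. The file names, the input node name "features", the output node name OutputNodes.z, and the class count are assumptions and must match your own model and mean file; the Evaluate call mirrors the one used in the F# answer above.

    public static void Main()
    {
        const int targetSize = 224;   // network input width/height, assumed 224 x 224
        const int numClasses = 2;     // assumed binary classification

        // Load the model on the CPU and read the pixel means (paths are placeholders).
        var model = loadModel(@"myModelFile", "OutputNodes.z");
        var pixelMeans = readXmlMeanFile(@"ImageNet1K_mean.xml", targetSize, targetSize);

        // Convert the image to the planar B/G/R feature vector the network expects.
        var img = new Bitmap(@"myImage.jpg");
        var features = ImageToFeatures(img, targetSize, pixelMeans);

        // Evaluate, keyed by the input node name used at training time (assumed "features").
        var inputs = new Dictionary<string, List<float>> { { "features", features } };
        var outputs = model.Evaluate(inputs, "OutputNodes.z", numClasses);
        Console.WriteLine("Scores: {0}", string.Join(", ", outputs));
    }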


Source: https://stackoverflow.com/questions/37300946/how-to-use-rgb-image-as-input-for-the-c-sharp-evaldll-wrapper
