TensorFlow: How to encode and read BMP images?

半阙折子戏 2021-01-24 02:32

I am trying to read .bmp images, do some augmentation on these, save them to a .tfrecords file and then open the .tfrecords files and use the images for image classification. I

2 answers
  • 2021-01-24 03:33

    There is no encode_bmp in the core TensorFlow package (it only provides tf.io.decode_bmp for reading), but if you import tensorflow_io (a separate package maintained under the TensorFlow project) you can find the function there as tfio.image.encode_bmp.

    For the documentation see: https://www.tensorflow.org/io/api_docs/python/tfio/image/encode_bmp

  • 2021-01-24 03:38

    In your comment, surely you mean encode rather than encrypt.

    The BMP file format is quite simplistic, consisting of a bunch of headers and pretty much raw pixel data. This is why BMP images are so big. I suppose this is also why TensorFlow developers did not bother to write a function to encode arrays (representing images) into this format. Few people still use it. It is recommended to use PNG instead, which performs lossless compression of the image. Or, if you can deal with lossy compression, use JPG.
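To make the "headers plus raw pixel data" point concrete, here is a minimal sketch of a BMP encoder using only the Python standard library. It is not how TensorFlow or tensorflow_io do it; the function name `encode_bmp` and the list-of-tuples input are assumptions made for this illustration, and it only handles the uncompressed 24-bit variant of the format.

```python
import struct


def encode_bmp(pixels):
    """Encode rows of (r, g, b) tuples as an uncompressed 24-bit BMP.

    Hypothetical helper for illustration only.
    """
    height = len(pixels)
    width = len(pixels[0])
    # Each stored row is padded to a multiple of 4 bytes.
    row_padding = (4 - (width * 3) % 4) % 4

    body = bytearray()
    for row in reversed(pixels):      # BMP stores rows bottom-up
        for r, g, b in row:
            body += bytes((b, g, r))  # channel order on disk is BGR
        body += b"\x00" * row_padding

    pixel_offset = 14 + 40            # file header + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", pixel_offset + len(body), 0, 0, pixel_offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,            # header size, width, height
        1, 24,                        # colour planes, bits per pixel
        0, len(body),                 # no compression, pixel data size
        2835, 2835,                   # ~72 DPI, in pixels per metre
        0, 0)                         # palette fields, unused here
    return file_header + info_header + bytes(body)


# A 2x2 image: top row black, bottom row white.
bands = [[(0, 0, 0)] * 2, [(255, 255, 255)] * 2]
bmp_bytes = encode_bmp(bands)
print(len(bmp_bytes))  # 54 header bytes + 2 rows of 8 bytes = 70
```

Note that 54 of the 70 bytes are headers and the rest is the pixel data verbatim, which is why BMP files grow linearly with the number of pixels.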

    TensorFlow doesn't do anything special for encoding images. It just returns the bytes that represent the image in that format, similar to what matplotlib does when you call savefig (except that matplotlib also writes the bytes to a file).

    Suppose you produce a numpy array where the top rows are 0 and the bottom rows are 255. This is an array of numbers which, if you think of it as a picture, represents 2 horizontal bands, the top one black and the bottom one white.

    If you want to see this picture in another program (GIMP) you need to encode this information in a standard format, such as PNG. Encoding means adding some headers and metadata and, optionally, compressing the data.
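As a rough illustration of what "adding headers and compressing" means for PNG, the sketch below encodes the two-band picture by hand with the standard library. The helper names `png_chunk` and `encode_gray_png` are made up for this example, plain lists stand in for the numpy array, and only 8-bit greyscale without filtering is handled; in practice you would let tf.io.encode_png or an image library do this.

```python
import struct
import zlib


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(rows):
    """Encode rows of 0-255 ints as an 8-bit greyscale PNG (illustrative)."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # then compression, filter and interlace methods (all 0).
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with its filter type (0 = none).
    raw = b"".join(b"\x00" + bytes(row) for row in rows)
    return (b"\x89PNG\r\n\x1a\n"                          # file signature
            + png_chunk(b"IHDR", ihdr)                    # metadata
            + png_chunk(b"IDAT", zlib.compress(raw, 9))   # compressed pixels
            + png_chunk(b"IEND", b""))                    # end marker


# 64x64 picture: top half black (0), bottom half white (255).
bands = [[0] * 64 for _ in range(32)] + [[255] * 64 for _ in range(32)]
png_bytes = encode_gray_png(bands)

with open("bands.png", "wb") as f:  # can now be opened in e.g. GIMP
    f.write(png_bytes)
```

The array itself never changes; encoding only wraps it in a signature, a header chunk and a compressed data chunk so that other programs know how to interpret the bytes.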


    Now that it is a bit clearer what encoding is, I recommend you work with PNG images.

    import tensorflow as tf  # written against TensorFlow 2.x (eager execution)

    with tf.io.gfile.GFile('image.png', 'rb') as f:
        # get the bytes representing the image
        # this is a 1D byte string which includes the header and metadata
        raw_png = f.read()

    # decode the raw representation into an array of shape
    # (height, width, channels); channels is 1 for greyscale, 3 for colour
    image = tf.io.decode_png(raw_png)

    # augment the image using e.g. random brightness
    # (max_delta is a required argument; 0.2 is an arbitrary choice)
    augmented_img = tf.image.random_brightness(image, max_delta=0.2)

    # convert the array back into a compressed representation
    # by encoding it into png; we now end up with a byte string again
    augmented_png = tf.io.encode_png(augmented_img, compression=9)

    # Write augmented_png to file using tf.train.Example
    # (.numpy() extracts the Python bytes that BytesList expects)
    with tf.io.TFRecordWriter(<output_tfrecords_filename>) as writer:
        example = tf.train.Example(features=tf.train.Features(feature={
            'encoded_img': tf.train.Feature(bytes_list=tf.train.BytesList(
                value=[augmented_png.numpy()]))}))
        writer.write(example.SerializeToString())

    # Read img from file; parse_img_fn is a user-supplied function that
    # parses each serialized Example and decodes 'encoded_img'
    dataset = tf.data.TFRecordDataset(<img_file>)
    dataset = dataset.map(parse_img_fn)
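The snippet above uses parse_img_fn without defining it. A minimal sketch of what such a function might look like, assuming TensorFlow 2.x and that each record holds a single PNG byte string under the 'encoded_img' key used when writing (the `FEATURE_SPEC` name is an assumption for this example):

```python
import tensorflow as tf

# Must match the feature key and type used when the Example was written.
FEATURE_SPEC = {"encoded_img": tf.io.FixedLenFeature([], tf.string)}


def parse_img_fn(serialized_example):
    """Parse one serialized tf.train.Example and decode its PNG bytes."""
    features = tf.io.parse_single_example(serialized_example, FEATURE_SPEC)
    # Returns a uint8 tensor of shape (height, width, channels).
    return tf.io.decode_png(features["encoded_img"])
```

Because it only uses TensorFlow ops, the function can be passed directly to dataset.map and runs as part of the input pipeline.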
    

    There are a few important pieces of advice:

    • Don't use numpy.tostring (called numpy.ndarray.tobytes in current NumPy). It returns the raw, uncompressed bytes of the array, one after another, which is huge, especially if each pixel is stored as a float. No compression, nothing. Try it and check the file size :)

    • There is no need to pass back into Python by using tf.Session for decoding and augmentation. You can perform all of these ops on the TF side. This way you have an input graph which you can reuse as part of an input pipeline.
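To get a feel for the size difference mentioned in the first point, here is a small sketch that compares raw array bytes with a compressed version. It assumes NumPy is installed and uses zlib (the DEFLATE compression that PNG is built on) as a stand-in for a real PNG encoder; the 256x256 two-band image is an arbitrary example.

```python
import zlib

import numpy as np

# Two horizontal bands: top half black (0), bottom half white (255).
bands_uint8 = np.zeros((256, 256), dtype=np.uint8)
bands_uint8[128:, :] = 255
bands_float = bands_uint8.astype(np.float64)

raw_uint8 = bands_uint8.tobytes()         # 1 byte per pixel
raw_float = bands_float.tobytes()         # 8 bytes per pixel
compressed = zlib.compress(raw_uint8, 9)  # stand-in for PNG compression

print("float64 raw bytes:", len(raw_float))
print("uint8 raw bytes:  ", len(raw_uint8))
print("compressed bytes: ", len(compressed))
```

The raw float dump is eight times larger than the uint8 one before any compression is applied, and how much further compression helps depends on the image content.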
