Caffe Multiple Input Images

予麋鹿 2021-02-01 23:30

I'm looking at implementing a Caffe CNN that accepts two input images and a label (and perhaps other data later), and was wondering if anyone was aware of the correct syntax in the

2 Answers
  •  猫巷女王i
    2021-02-02 00:02

    Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.

    HDF5 is a key-value store in which each key is a string and each value is a multidimensional array. To use the HDF5_DATA layer, add one key (dataset) for each top you want the layer to produce and store the corresponding images under that key; Caffe treats the first dimension of each dataset as the sample index. Writing these HDF5 files from Python is easy:

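    import h5py
    import numpy as np
    
    def get_some_image(i):
        # Hypothetical placeholder: return the first H x W x C image of pair i
        return np.random.rand(32, 32, 3)
    
    def get_another_image(i):
        # Hypothetical placeholder: return the second H x W x C image of pair i
        return np.random.rand(32, 32, 3)
    
    filelist = []
    for i in range(100):
        image1 = get_some_image(i)
        image2 = get_another_image(i)
        filename = '/tmp/my_hdf5%d.h5' % i
        with h5py.File(filename, 'w') as f:
            # HWC -> CHW, plus a leading sample axis: Caffe reads N x C x H x W
            f['data1'] = np.transpose(image1, (2, 0, 1))[np.newaxis].astype(np.float32)
            f['data2'] = np.transpose(image2, (2, 0, 1))[np.newaxis].astype(np.float32)
        filelist.append(filename)
    
    # The HDF5_DATA layer's source is a text file listing one HDF5 path per line
    with open('/tmp/filelist.txt', 'w') as f:
        for filename in filelist:
            f.write(filename + '\n')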
    

    Then set the source in the HDF5_DATA layer's parameters to '/tmp/filelist.txt' and name the layer's tops "data1" and "data2". The top names must match the dataset keys stored in the files.
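    For reference, the matching layer definition in the newer prototxt syntax, where HDF5_DATA is spelled "HDF5Data", might look like the text produced below. It is generated from Python only to keep to one language here; the helper hdf5_data_layer, the layer name "data", and batch_size 10 are illustrative assumptions, not values taken from the answer.

```python
def hdf5_data_layer(source, tops, batch_size, name="data"):
    """Build the text of an HDF5Data layer definition for a prototxt file.

    Each top name must equal a dataset key inside every HDF5 file listed in source.
    (Hypothetical helper, for illustration only.)
    """
    lines = ["layer {", f'  name: "{name}"', '  type: "HDF5Data"']
    # One "top" line per dataset key the layer should output.
    lines += [f'  top: "{top}"' for top in tops]
    lines += [
        "  hdf5_data_param {",
        f'    source: "{source}"',
        f"    batch_size: {batch_size}",
        "  }",
        "}",
    ]
    return "\n".join(lines)


print(hdf5_data_layer("/tmp/filelist.txt", ["data1", "data2"], batch_size=10))
```

    The layer looks up each top name as a dataset inside every listed file, which is why the tops have to be "data1" and "data2" to match the keys written by the script above.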

    I'm leaving the original response below:

    ====================================================

    There are two good ways of doing this. The easiest is probably to use two separate DATA layers: one reading the first image and the label, and a second reading the second image. The DATA layer reads records from LMDB or LevelDB, which are key-value stores iterated in key order, so if you create your two databases with corresponding images stored under the same integer id key (and give both layers the same batch_size), Caffe will load matching pairs in the same order, and you can build the rest of the net from the data and labels of both layers.
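    To see why matching keys are enough, here is a pure-Python model of two databases read side by side. It is not Caffe code: plain dicts stand in for the two LMDB/LevelDB databases, and paired_records and make_key are hypothetical helpers. It relies on both stores iterating keys in sorted byte order, and zero-pads the integer ids so that lexicographic order matches numeric order.

```python
def paired_records(db_a, db_b):
    """Yield (key, record_a, record_b) in the order a sequential cursor visits them.

    db_a and db_b are dicts standing in for two key-value databases (an assumption);
    sorted() mimics the sorted-key iteration order of LMDB and LevelDB for ASCII keys.
    """
    keys_a = sorted(db_a)
    keys_b = sorted(db_b)
    if keys_a != keys_b:
        raise ValueError("both databases must contain exactly the same keys")
    for key in keys_a:
        yield key, db_a[key], db_b[key]


def make_key(i):
    # Zero-padded so that string order equals numeric order ("00000010" > "00000009").
    return "%08d" % i


# Insertion order differs between the two stores; key order is what keeps pairs aligned.
first = {make_key(i): "first_image_%d" % i for i in range(12)}
second = {make_key(i): "second_image_%d" % i for i in reversed(range(12))}

for key, a, b in paired_records(first, second):
    print(key, a, b)
```

    Without the padding, "10" would sort before "9" as a string, so the two databases would still agree with each other but no longer follow the numeric ids.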

    The drawback of this approach is that two data layers are not very satisfying to keep in sync, and it doesn't scale well if you want to do more advanced things, such as non-integer labels for bounding boxes. If you're prepared to invest the time, you can do a better job by modifying tools/convert_imageset.cpp to stack images or other data across channels. For example, you could create a datum with 6 channels: the first 3 for your first image's RGB and the next 3 for your second image's RGB. After reading this in with the DATA layer, you can split the blob back into two images using a SLICE layer with a slice_point of 3 along slice_dim = 1 (the channel axis). If you later decide to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based on src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
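    The channel-stacking idea can be checked with NumPy before touching convert_imageset.cpp. This sketch is not Caffe code: stack_pair and slice_channels are hypothetical helpers, the tiny 4 x 5 random images are placeholders, and np.split stands in for what a SLICE layer with slice_point 3 on axis 1 does to an N x C x H x W blob.

```python
import numpy as np


def stack_pair(image1, image2):
    """Stack two H x W x 3 images into one 6 x H x W (CHW) array, like a 6-channel datum."""
    if image1.shape != image2.shape:
        raise ValueError("both images must have the same shape")
    # Channels 0-2 come from image1, channels 3-5 from image2; then HWC -> CHW.
    return np.concatenate([image1, image2], axis=2).transpose(2, 0, 1)


def slice_channels(batch, slice_point=3):
    """Split an N x C x H x W batch at one slice_point along axis 1 (channels)."""
    return np.split(batch, [slice_point], axis=1)


rng = np.random.default_rng(0)
img_a = rng.random((4, 5, 3), dtype=np.float32)  # hypothetical first RGB image
img_b = rng.random((4, 5, 3), dtype=np.float32)  # hypothetical second RGB image

batch = stack_pair(img_a, img_b)[np.newaxis]  # add the sample axis: (1, 6, 4, 5)
first, second = slice_channels(batch)
print(batch.shape, first.shape, second.shape)
```

    In the newer prototxt syntax the same split is a layer of type "Slice" with slice_param { axis: 1 slice_point: 3 } and two tops; slice_dim is the older, deprecated name for axis.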
