Caffe supports both an LMDB data layer and ImageDataLayer. Creating an LMDB database from a dataset takes some time and a lot of disk space. In contrast, ImageDataLayer only uses a txt fi
LMDB is a key-value store designed for fast retrieval of data by key. The data is also stored in uncompressed format, so the machine can simply read it and pass it directly to the GPU for processing.
With ImageDataLayer, we have to read the image details from the text file and then use OpenCV to load each image into memory. Decoding the compressed images is computationally expensive.
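To make the ImageDataLayer path concrete, here is a minimal sketch of parsing a listing file, assuming the usual format of one image path followed by an integer label per line. The function name `parse_listing` and the sample file names are hypothetical, and this is only an illustration, not Caffe's own loader.

```python
def parse_listing(text):
    """Parse listing lines of the form '<image path> <integer label>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is always the final token.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries


# Hypothetical listing contents, for illustration only.
sample = "images/cat_001.jpg 0\nimages/dog_001.jpg 1\n"
entries = parse_listing(sample)
print(entries)
```

Each path would then still have to be opened and decoded (for example with OpenCV's `cv2.imread`) before the pixels can be used, which is the expensive step described above.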
But LMDB does not always give the best performance; it depends heavily on the configuration of the machine. Consider a batch size of 256 images, each of size 227x227x3, and assume you are using a very good GPU and a high-end processor. Here a single image in LMDB format occupies about 151KB, so a whole batch occupies about 37MB. If the GPU is able to process 10 batches a second, the hard disk has to sustain a read speed of about 370MB/s. If you are using a normal SATA or external hard disk, reading such large chunks of data will become a bottleneck due to the limits of the disk.
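The figures above can be checked with a few lines of arithmetic. This is a rough sketch that assumes one byte per channel value (uncompressed 8-bit images) and ignores any per-record overhead in the database; the function name `lmdb_read_rate` is made up for illustration.

```python
KIB = 1024
MIB = 1024 * 1024


def lmdb_read_rate(height, width, channels, batch_size, batches_per_sec):
    """Return (bytes per image, bytes per batch, bytes per second)."""
    image_bytes = height * width * channels  # one byte per channel value
    batch_bytes = image_bytes * batch_size
    return image_bytes, batch_bytes, batch_bytes * batches_per_sec


image_bytes, batch_bytes, rate = lmdb_read_rate(227, 227, 3, 256, 10)
print(f"image: {image_bytes / KIB:.0f} KiB")          # image: 151 KiB
print(f"batch: {batch_bytes / MIB:.1f} MiB")          # batch: 37.7 MiB
print(f"required read rate: {rate / MIB:.0f} MiB/s")  # required read rate: 377 MiB/s
```

The unrounded result is slightly higher than 370MB/s because the batch size is closer to 37.7MB than 37MB, but the conclusion is the same.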
If Caffe cannot fetch data at the required speed, this bottleneck slows down the whole training process. In that situation, if you read the 256 images individually and use a multi-core build of OpenCV, the data prefetching may be handled more effectively than reading from an LMDB.
The above case will not occur if you have stored the LMDB data on an SSD, though!
Yes, the speed difference is indeed big. LMDB is optimized for high-speed batch processing.