Out-of-core 4D image TIFF storage as HDF5 in Python

走了就别回头了 2021-01-14 16:24

I have 27 GB of 2D TIFF files that represent slices of a movie of 3D images. I want to be able to slice this data as if it were a simple 4D NumPy array. It looks like dask.arr

1 answer
  •  梦毁少年i
    2021-01-14 17:11

    Edit: Use dask.array's imread function

    As of dask 0.7.0 you don't need to store your images in HDF5. Use the imread function directly instead:

    In [1]: from skimage.io import imread
    
    In [2]: im = imread('foo.1.tiff')
    
    In [3]: im.shape
    Out[3]: (5, 5, 3)
    
    In [4]: ls foo.*.tiff
    foo.1.tiff  foo.2.tiff  foo.3.tiff  foo.4.tiff
    
    In [5]: from dask.array.image import imread
    
    In [6]: im = imread('foo.*.tiff')
    
    In [7]: im.shape
    Out[7]: (4, 5, 5, 3)
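The question's files are 2D slices rather than whole 3D volumes, so a glob over them gives a 3D (n_files, Y, X) stack, and the slices still need grouping into (T, Z, Y, X). Below is a minimal sketch of doing that lazily with dask.delayed, da.from_delayed and da.stack. It assumes the filenames sort time-major and that the number of z-slices per volume (n_z) is known. It reads .npy files with np.load only so that it runs without an image library; for real data you would pass a TIFF reader such as skimage.io.imread as reader. The lazy_stack helper and the file naming scheme are hypothetical.

```python
import glob
import os
import tempfile

import dask
import dask.array as da
import numpy as np


def lazy_stack(filenames, n_z, reader=np.load):
    """Lazily stack 2D slice files into a (T, Z, Y, X) dask array.

    Hypothetical helper. Assumes the files are sorted time-major
    ((t0, z0), (t0, z1), ...), len(filenames) is a multiple of n_z, and
    every slice has the same shape and dtype. `reader` defaults to np.load
    only as a stand-in for a TIFF reader such as skimage.io.imread.
    """
    sample = reader(filenames[0])            # read one slice to learn shape/dtype
    read = dask.delayed(reader, pure=True)   # each file becomes one lazy task
    slices = [da.from_delayed(read(fn), shape=sample.shape, dtype=sample.dtype)
              for fn in filenames]
    # Group every n_z consecutive slices into one 3D volume, then stack volumes.
    volumes = [da.stack(slices[i:i + n_z]) for i in range(0, len(slices), n_z)]
    return da.stack(volumes)                 # one 2D slice per chunk


# Toy data: 3 time points x 4 z-slices of 6x5 pixels, one .npy file per slice.
rng = np.random.default_rng(0)
data = rng.integers(0, 255, size=(3, 4, 6, 5), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    for t in range(3):
        for z in range(4):
            # Zero-padded names so a plain sorted() gives time-major order.
            np.save(os.path.join(tmp, f"slice_t{t:03d}_z{z:03d}.npy"), data[t, z])
    files = sorted(glob.glob(os.path.join(tmp, "slice_*.npy")))
    x = lazy_stack(files, n_z=4)
    print(x.shape)                    # (3, 4, 6, 5)
    result = x[1, :, 2, :].compute()  # only frame 1's slices need to be read
```

Because each file becomes its own chunk, indexing one frame or one z-plane only triggers reads for the files it touches.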
    

    Older answer: storing the images in HDF5

    Data ingest is often the trickiest of problems. dask.array doesn't have any automatic integration with image files (though this is quite doable if there's sufficient interest). Fortunately, moving data into h5py is easy because h5py supports NumPy slicing syntax. In the following example we'll create an empty h5py Dataset and then store four tiny TIFF files into that dataset in a for loop.

    First we get the filenames of our images (please forgive the toy dataset; I don't have anything realistic lying around).

    In [1]: from glob import glob
    In [2]: filenames = sorted(glob('foo.*.tiff'))
    In [3]: filenames
    Out[3]: ['foo.1.tiff', 'foo.2.tiff', 'foo.3.tiff', 'foo.4.tiff']
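One caveat with sorted(): it compares filenames as plain strings, so with ten or more frames foo.10.tiff sorts before foo.2.tiff and the stack ends up out of order. Here is a minimal sketch of a numeric-aware sort key. It assumes the numbers embedded in the filenames are the frame indices. natural_key is a hypothetical helper, not part of glob or the standard library.

```python
import re


def natural_key(path):
    """Sort key that compares runs of digits as integers, so 'foo.2' < 'foo.10'."""
    # re.split with a capturing group alternates text and digit runs
    # (index 0 is always text), so odd positions are the digit runs and
    # two keys never compare a str against an int at the same position.
    return [int(tok) if i % 2 else tok
            for i, tok in enumerate(re.split(r"(\d+)", path))]


names = ["foo.10.tiff", "foo.2.tiff", "foo.1.tiff"]
print(sorted(names))                   # ['foo.1.tiff', 'foo.10.tiff', 'foo.2.tiff']
print(sorted(names, key=natural_key))  # ['foo.1.tiff', 'foo.2.tiff', 'foo.10.tiff']
```

With real files this would be filenames = sorted(glob('foo.*.tiff'), key=natural_key). Zero-padding the numbers when the files are written avoids the problem altogether.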
    

    Load and inspect a sample image.

    In [4]: from skimage.io import imread
    In [5]: im = imread(filenames[0])  # a sample image
    In [6]: im.shape  # tiny image
    Out[6]: (5, 5, 3)
    In [7]: im.dtype
    Out[7]: dtype('int8')
    

    Now we'll make an HDF5 file and an HDF5 dataset called '/x' within that file.

    In [8]: import h5py
    In [9]: f = h5py.File('myfile.hdf5', 'a')  # create or open an HDF5 file read/write (h5py >= 3.0 defaults to read-only)
    In [10]: out = f.require_dataset('/x', shape=(len(filenames),) + im.shape, dtype=im.dtype)
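For a 27 GB stack it is worth giving the file an explicit mode and the dataset an explicit chunk layout. The sketch below is minimal and assumes that one HDF5 chunk per image is a sensible default, so each write in the loop and each dask chunk maps onto exactly one HDF5 chunk. require_dataset passes extra keywords such as chunks= through to create_dataset when it has to create the dataset. The zero array stands in for a real image, and the chunk choice is my assumption rather than part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

filenames = ["foo.1.tiff", "foo.2.tiff", "foo.3.tiff", "foo.4.tiff"]  # names only
im = np.zeros((5, 5, 3), dtype=np.int8)  # stand-in for imread(filenames[0])

with tempfile.TemporaryDirectory() as tmp:
    # Mode "a" creates the file if needed and opens it read/write;
    # h5py >= 3.0 opens files read-only ("r") when no mode is given.
    with h5py.File(os.path.join(tmp, "myfile.hdf5"), "a") as f:
        out = f.require_dataset(
            "/x",
            shape=(len(filenames),) + im.shape,  # derived from the sample image
            dtype=im.dtype,
            chunks=(1,) + im.shape,              # assumption: one chunk per image
        )
        layout = (out.shape, out.chunks, out.dtype)

print(layout)
```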
    

    Great, now we can insert our images one at a time into the HDF5 dataset.

    In [11]: for i, fn in enumerate(filenames):
       ....:     im = imread(fn)
       ....:     out[i, :, :, :] = im
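The question's files are individual 2D slices, so the loop needs to place each one at a (t, z) position instead of a single index. Here is a minimal sketch. It assumes the filenames are sorted time-major (every z-slice of frame 0, then frame 1, and so on) and that the number of slices per volume, n_z, is known. The random arrays stand in for imread(fn), and the file and dataset names are illustrative.

```python
import os
import tempfile

import h5py
import numpy as np

n_t, n_z, ny, nx = 2, 3, 4, 5  # toy sizes: frames, z-slices per volume, height, width
rng = np.random.default_rng(1)
# Stand-ins for the 2D TIFF slices, listed time-major: slice i sits at (t, z) = divmod(i, n_z)
slices = [rng.integers(0, 100, size=(ny, nx), dtype=np.uint16) for _ in range(n_t * n_z)]

with tempfile.TemporaryDirectory() as tmp:
    with h5py.File(os.path.join(tmp, "movie.hdf5"), "w") as f:
        out = f.create_dataset("/x", shape=(n_t, n_z, ny, nx), dtype=np.uint16,
                               chunks=(1, 1, ny, nx))
        for i, im in enumerate(slices):  # with real files: im = imread(fn) here
            t, z = divmod(i, n_z)
            out[t, z, :, :] = im         # write one 2D slice into the 4D dataset
        frame1 = out[1]                  # h5py slicing returns a NumPy array
```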
    

    At this point dask.array can happily wrap the out dataset:

    In [12]: import dask.array as da
    In [13]: x = da.from_array(out, chunks=(1, 5, 5, 3))  # treat each image as a single chunk
    In [14]: x[::2, :, :, 0].mean()  # lazy; add .compute() to get the actual number
    Out[14]: dask.array
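Out[14] is only a lazy dask.array. Nothing is read until .compute() is called, and the h5py file has to stay open until then. Below is a minimal end-to-end sketch with a random toy stack (the data and file name are illustrative) that checks the dask result against plain NumPy.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

rng = np.random.default_rng(2)
frames = rng.integers(-100, 100, size=(4, 5, 5, 3), dtype=np.int8)  # toy image stack

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.hdf5")
    with h5py.File(path, "w") as f:
        f.create_dataset("/x", data=frames, chunks=(1, 5, 5, 3))
    with h5py.File(path, "r") as f:
        x = da.from_array(f["/x"], chunks=(1, 5, 5, 3))  # one dask chunk per image
        lazy_mean = x[::2, :, :, 0].mean()  # builds a task graph; no data read yet
        value = lazy_mean.compute()         # reads from disk while the file is open
```

Computing inside the with block keeps the Dataset valid for the whole computation. With chunks=(1, 5, 5, 3), only the frames selected by ::2 (frames 0 and 2 here) should need to be read.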
    

    If you'd like to see more native support for stacks of images, I encourage you to raise an issue. It would be pretty easy to use dask.array on your stack of TIFF files directly, without going through HDF5.
