out of core 4D image tif storage as hdf5 python

走了就别回头了 2021-01-14 16:24

I have 27 GB of 2D TIFF files that represent slices of a movie of 3D images. I want to be able to slice this data as if it were a simple 4D NumPy array. It looks like dask.arr

1 answer

梦毁少年i (original poster)

2021-01-14 17:11
Edit: Use `dask.array`'s `imread` function

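As of dask 0.7.0 you don't need to store your images in HDF5. Use the `imread` function from `dask.array.image` directly instead. It accepts a glob pattern and stacks the matching files along a new leading axis: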
```
In [1]: from skimage.io import imread

In [2]: im = imread('foo.1.tiff')

In [3]: im.shape
Out[3]: (5, 5, 3)

In [4]: ls foo.*.tiff
foo.1.tiff  foo.2.tiff  foo.3.tiff  foo.4.tiff

In [5]: from dask.array.image import imread

In [6]: im = imread('foo.*.tiff')

In [7]: im.shape
Out[7]: (4, 5, 5, 3)
```
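The question describes 2D slices of a movie of 3D volumes, so a stack of slices still has to be folded into separate time and depth axes. A minimal sketch of that step, assuming the files sort in time-major order (all z-slices of the first time point, then the second, and so on). A synthetic in-memory array stands in for the real TIFF stack, and `n_t`, `n_z`, and the slice size are made-up values:

```python
import numpy as np
import dask.array as da

# Hypothetical sizes: 3 time points, 4 z-slices each, 5x6 pixel slices.
n_t, n_z, ny, nx = 3, 4, 5, 6

# Stand-in for the lazily loaded stack of 2D slices: one chunk per slice.
frames = np.arange(n_t * n_z * ny * nx, dtype=np.int64).reshape(n_t * n_z, ny, nx)
stack = da.from_array(frames, chunks=(1, ny, nx))

# Fold the leading file axis into (time, z). Nothing is computed yet.
movie = stack.reshape(n_t, n_z, ny, nx)

# Index like a 4D NumPy array; compute() evaluates just this selection.
plane = movie[1, 2].compute()
```

Here `movie[1, 2]` is the third z-slice of the second time point, which corresponds to the seventh file in sorted order.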
Older answer that stores images into HDF5

Data ingest is often the trickiest of problems. Dask.array doesn't have any automatic integration with image files (though this is quite doable if there's sufficient interest). Fortunately, moving data to h5py is easy because h5py supports the NumPy slicing syntax. In the following example we'll create an empty h5py Dataset and then store four tiny TIFF files into that dataset in a for loop.

First we get filenames for our images (please forgive the toy dataset; I don't have anything realistic lying around).
```
In [1]: from glob import glob
In [2]: filenames = sorted(glob('foo.*.tiff'))
In [3]: filenames
Out[3]: ['foo.1.tiff', 'foo.2.tiff', 'foo.3.tiff', 'foo.4.tiff']
```
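One caveat with `sorted(glob(...))`: the sort is lexicographic, so once there are ten or more frames, `foo.10.tiff` sorts before `foo.2.tiff` unless the numbers are zero-padded. A small sketch of a numeric-aware sort key; the helper name `natural_key` and the filenames are hypothetical:

```python
import re

def natural_key(name):
    """Split a filename into text and integer parts so numbers compare numerically."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

# Hypothetical filenames with more than nine frames.
names = ['foo.10.tiff', 'foo.2.tiff', 'foo.1.tiff']

lexicographic = sorted(names)               # plain string order
numeric = sorted(names, key=natural_key)    # frame-number order

print(lexicographic)
print(numeric)
```

Passing `key=natural_key` to the `sorted` call keeps the frames in acquisition order, which matters because the position in the list becomes the index along the first axis of the dataset.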
Load in and inspect a sample image:
```
In [4]: from skimage.io import imread
In [5]: im = imread(filenames[0])  # a sample image
In [6]: im.shape  # tiny image
Out[6]: (5, 5, 3)
In [7]: im.dtype
Out[7]: dtype('int8')
```
Now we'll make an HDF5 file and an HDF5 dataset called '/x' within that file.
```
In [8]: import h5py
In [9]: f = h5py.File('myfile.hdf5', 'a')  # open read/write, creating the file if needed
In [10]: out = f.require_dataset('/x', shape=(len(filenames),) + im.shape, dtype=im.dtype)
```
Great, now we can insert our images one at a time into the HDF5 dataset.
```
In [11]: for i, fn in enumerate(filenames):
   ....:     im = imread(fn)
   ....:     out[i, :, :, :] = im
```
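For a stack much larger than this toy one, it may help to close the file deterministically and to choose an HDF5 chunk shape that matches how the data will be read. A sketch under those assumptions, writing synthetic images to a temporary file. The file location, the dataset name `x`, and the one-image-per-chunk layout are illustrative choices, not requirements:

```python
import os
import tempfile

import h5py
import numpy as np

# Synthetic stand-ins for the four 5x5x3 TIFF images.
images = [np.full((5, 5, 3), i, dtype=np.int8) for i in range(4)]

path = os.path.join(tempfile.mkdtemp(), 'stack.hdf5')  # hypothetical location

# The context manager closes the file even if the loop raises.
with h5py.File(path, 'w') as f:
    out = f.create_dataset(
        'x',
        shape=(len(images),) + images[0].shape,
        dtype=images[0].dtype,
        chunks=(1,) + images[0].shape,  # one image per HDF5 chunk
    )
    for i, im in enumerate(images):
        out[i] = im  # write one image at a time

# Reopen read-only and check what was stored.
with h5py.File(path, 'r') as f:
    stored_shape = f['x'].shape
    stored_chunks = f['x'].chunks
    third_image_value = int(f['x'][2, 0, 0, 0])
```

With real data the loop body would call `imread(fn)` as in the session above, so only one image needs to be in memory at any moment.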
At this point dask.array can happily wrap `out`:
```
In [12]: import dask.array as da
In [13]: x = da.from_array(out, chunks=(1, 5, 5, 3))  # treat each image as a single chunk
In [14]: x[::2, :, :, 0].mean()  # lazy; call .compute() to get the number
Out[14]: dask.array
```
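The `Out[14]` result is a lazy dask array rather than a number. A short sketch of evaluating it, using an in-memory NumPy array as a stand-in for the HDF5 dataset `out`:

```python
import numpy as np
import dask.array as da

# Stand-in for the h5py dataset: four 5x5x3 images.
data = np.arange(4 * 5 * 5 * 3, dtype=np.float64).reshape(4, 5, 5, 3)
x = da.from_array(data, chunks=(1, 5, 5, 3))

lazy_mean = x[::2, :, :, 0].mean()  # builds a task graph, computes nothing yet
result = lazy_mean.compute()        # runs the graph and returns the mean

print(result)
```

Because each image is its own chunk, the `[::2]` selection only needs the chunks for images 0 and 2.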
If you'd like to see more native support for stacks of images then I encourage you to raise an issue. It would be pretty easy to use dask.array off of your stack of tiff files directly without going through HDF5.