How to set cache settings while using h5py high level interface?

忘掉有多难 2020-12-17 04:36

I'm trying to increase the cache size for my HDF5 files, but it doesn't seem to be working. This is what I have:

import h5py

with h5py.File("test.h5", 'w'


        
3 Answers
  • 2020-12-17 04:56

    The h5py-cache project might be helpful, although I haven't used it:

    import h5py_cache
    # The mode must come before keyword arguments
    with h5py_cache.File('test.h5', 'a', chunk_cache_mem_size=1024**3) as f:
        f.create_dataset(...)
    
  • 2020-12-17 05:02

    As of h5py version 2.9.0, this behavior is available directly through the main h5py.File interface. Three parameters control the "raw data chunk cache" (rdcc_nbytes, rdcc_w0, and rdcc_nslots), and they are documented in the h5py File object documentation. The OP was trying to adjust the rdcc_nbytes setting, which can now simply be done as

    import h5py
    
    with h5py.File("test.h5", "w", rdcc_nbytes=5242880) as fid:
        pass  # Use fid for something here
    

    The only difference is that you have to know how much space you actually need, rather than just multiplying by 5 as the OP wanted. The current default values are the same as the OP found. Of course, if you really wanted to do this programmatically, you could open the file once, read the current cache settings, close it, and then reopen it with the desired parameters.
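The open, read, close, reopen idea could be sketched roughly as follows. This is only an illustration, assuming h5py 2.9.0 or newer; the scratch file path is hypothetical, and the settings are read through `f.id.get_access_plist().get_cache()`, the same call used elsewhere in this thread.

```python
import os
import tempfile

import h5py

# Hypothetical scratch file, used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "test.h5")

# First pass: create the file and read the cache settings currently in effect.
# get_cache() returns (mdc_nelmts, rdcc_nslots, rdcc_nbytes, rdcc_w0).
with h5py.File(path, "w") as f:
    _, _, nbytes, _ = f.id.get_access_plist().get_cache()

# Second pass: reopen with five times the original chunk cache size
with h5py.File(path, "a", rdcc_nbytes=5 * nbytes) as f:
    new_nbytes = f.id.get_access_plist().get_cache()[2]

print(nbytes, new_nbytes)
```

The multiplier of 5 mirrors what the OP attempted; in practice a size derived from the dataset's chunk shape is probably a better choice.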

  • 2020-12-17 05:18

    If you are using h5py version 2.9.0 or newer, see Mike's answer, which passes rdcc_nbytes directly to h5py.File.


    According to the docs, get_access_plist() returns a copy of the file access property list. So it is not surprising that modifying the copy does not affect the original.
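The copy semantics can be checked with a small experiment. The sketch below is only an illustration: it uses an in-memory file (the `core` driver with `backing_store=False`) under a hypothetical name, changes the cache size on the property list returned by `get_access_plist()`, and then compares what the open file reports before and after.

```python
import h5py

# In-memory scratch file; the name is hypothetical and nothing is written to disk
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    before = f.id.get_access_plist().get_cache()

    # Modify the property list returned by get_access_plist()
    plist = f.id.get_access_plist()
    settings = list(plist.get_cache())
    settings[2] *= 5
    plist.set_cache(*settings)
    copy_nbytes = plist.get_cache()[2]

    # Ask the open file for its settings again
    after = f.id.get_access_plist().get_cache()

# If the list is a copy, the copy changes while the file's settings do not
print(copy_nbytes, before == after)
```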

    It appears the high-level interface does not provide a way to change the cache settings.

    Here is how you could do it using the low-level interface.

    import h5py

    # Create a file access property list and read its default cache settings
    propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    settings = list(propfaid.get_cache())
    print(settings)
    # [0, 521, 1048576, 0.75]
    
    settings[2] *= 5
    propfaid.set_cache(*settings)
    settings = propfaid.get_cache()
    print(settings)
    # (0, 521, 5242880, 0.75)
    

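    The above creates a PropFAID (a file access property list). We can then open the file and get a FileID this way (on Python 3, the low-level API expects filename to be a bytes object):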

    import contextlib
    with contextlib.closing(h5py.h5f.open(
                            filename, flags=h5py.h5f.ACC_RDWR, fapl=propfaid)) as fid:
        # <h5py.h5f.FileID object at 0x9abc694>
        settings = list(fid.get_access_plist().get_cache())
        print(settings)
        # [0, 521, 5242880, 0.75]
    

    And we can use the fid to open the file with the high-level interface by passing fid to h5py.File:

        f = h5py.File(fid)
        print(f.id.get_access_plist().get_cache())
        # (0, 521, 5242880, 0.75)
    

    Thus, you can still use the high-level interface, but it takes some fiddling to get there. On the other hand, if you distill it to just the essentials, perhaps it isn't so bad:

    import h5py
    import contextlib
    
    # On Python 3, the low-level API expects the file name as bytes
    filename = b'/tmp/foo.hdf5'
    propfaid = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
    settings = list(propfaid.get_cache())
    settings[2] *= 5
    propfaid.set_cache(*settings)
    with contextlib.closing(h5py.h5f.open(filename, fapl=propfaid)) as fid:
        f = h5py.File(fid)
    