How is HDF5 different from a folder with files?

予麋鹿 2021-01-29 22:43

I'm working on an open source project that deals with adding metadata to folders. The provided (Python) API lets you browse and access metadata as if it were just another folder. Be

9 answers
  •  面向向阳花
    2021-01-29 22:56

    HDF5 is, ultimately, a format for storing numbers, optimised for large datasets. Its main strengths are support for compression (which can make reading and writing data faster in many circumstances) and fast in-kernel queries (retrieval of the data that fulfils certain conditions, for example, all the pressure values recorded while the temperature was over 30 C).
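
    As a rough illustration of the two strengths above, here is a minimal sketch using PyTables, the Python library that provides in-kernel queries on top of HDF5. The table name `readings`, the column names, and the compression settings are assumptions made up for this example, not something from the question.

```python
import numpy as np
import tables


def write_readings(path, temperature, pressure):
    """Store two columns in one zlib-compressed HDF5 table (hypothetical layout)."""
    rows = np.empty(len(temperature), dtype=[("temperature", "f8"), ("pressure", "f8")])
    rows["temperature"] = temperature
    rows["pressure"] = pressure
    # Compression is applied per chunk, transparently on read and write.
    filters = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readings", description=rows.dtype, filters=filters)
        table.append(rows)


def pressures_when_hot(path, threshold=30.0):
    """Return the pressure values of rows whose temperature exceeds `threshold`."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.readings
        # The condition string is evaluated inside PyTables, chunk by chunk,
        # instead of row by row in a Python loop.
        return table.read_where(f"temperature > {threshold!r}", field="pressure")
```

    Calling `write_readings("weather.h5", temps, pressures)` followed by `pressures_when_hot("weather.h5")` should return only the pressures recorded above 30 C; whether this beats plain NumPy masking depends on the data size and on whether the table fits in memory.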

    The fact that you can combine several datasets in the same file is just a convenience. For example, you could have several groups corresponding to different weather stations, with each group consisting of several tables of data. Each group would have a set of attributes describing the details of the instruments, and each table would have attributes for its individual settings. You could instead keep one h5 file for each block of data, with an attribute in the corresponding place, and it would give you the same functionality. What a single HDF5 file adds is that you can repack it for optimized querying, compress the whole thing slightly, and retrieve your information at blazing speed. If you have several files, each one is compressed individually and the OS decides the layout on disk, which may not be optimal.
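
    The weather-station layout described above could be sketched with h5py roughly as follows. The nested dictionary shape, the station and table names, and the attribute keys are all hypothetical; the only point is that groups behave like folders, datasets like files, and attributes like metadata attached to either.

```python
import h5py
import numpy as np


def write_stations(path, stations):
    """Write one group per station and one compressed dataset per table."""
    with h5py.File(path, "w") as f:
        for station_name, station in stations.items():
            group = f.create_group(station_name)
            # Group attributes: details of the instruments at this station.
            for key, value in station["instrument"].items():
                group.attrs[key] = value
            for table_name, table in station["tables"].items():
                values = np.asarray(table["values"], dtype="f8")
                dset = group.create_dataset(table_name, data=values, compression="gzip")
                # Dataset attributes: settings specific to this table.
                for key, value in table["settings"].items():
                    dset.attrs[key] = value


def read_settings(path, station_name, table_name):
    """Return (instrument attributes, table settings) as plain dicts."""
    with h5py.File(path, "r") as f:
        group = f[station_name]
        return dict(group.attrs), dict(group[table_name].attrs)
```

    The same information could live in a directory tree with sidecar metadata files; the difference is that here everything sits in one container whose internal layout and compression are controlled by the library rather than by the filesystem.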

    One last thing HDF5 lets you do is load a file (or a piece of it) into memory while exposing the same API as on disk. So, for example, you could use one backend or the other depending on the size of the data and the available RAM. In your case, that would be equivalent to copying the relevant information to /dev/shm on Linux, and you would be responsible for committing any modifications back to disk.
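
    A minimal sketch of the in-memory idea, assuming h5py and its `core` file driver: the same file is opened either from disk or as a RAM image, and the code that touches the data does not change. The dataset name `temperature` and the helper names are made up for the example.

```python
import h5py
import numpy as np


def create_file(path):
    """Create a small on-disk file to work with (hypothetical contents)."""
    with h5py.File(path, "w") as f:
        f.create_dataset("temperature", data=np.arange(5.0))


def open_store(path, in_memory=False):
    """Open the file for read/write, either on disk or as an in-memory image."""
    if in_memory:
        # driver="core" loads the file into RAM; with backing_store=False,
        # changes are discarded on close unless you write them back yourself.
        return h5py.File(path, "r+", driver="core", backing_store=False)
    return h5py.File(path, "r+")


def bump_first_and_sum(f):
    """Identical code path for both backends."""
    dset = f["temperature"]
    dset[0] = 100.0
    return float(dset[...].sum())
```

    With `in_memory=True`, the modification made by `bump_first_and_sum` lives only in RAM, which mirrors the /dev/shm comparison above: persisting it is the caller's job.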
