Determining signed state for HDF5 variables in NetCDF

Question

My team has been given HDF5 files to read. They contain structured data with unsigned variables. My team and I were overjoyed to find the NetCDF library, which allows pure-Java reading of HDF5 files, albeit using the NetCDF data model.

No problem, we thought: we'd just translate from the NetCDF data model to whatever model we wanted, as long as we could get the data out. Then we tried to read an unsigned 32-bit integer from the HDF5 file. We can load the file in HDFView 2.9 and see that the variable is an unsigned 32-bit integer. But... it turns out that NetCDF-3 doesn't support unsigned values!

To add insult to injury, NetCDF-3 recommends that you "widen the data type" or use an _Unsigned = "true" attribute (I am not making this up) to indicate that the 32 bits should be treated as an unsigned value.
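To make the "widen the data type" advice concrete: a 32-bit unsigned value held in a Java `int` reads as negative once it exceeds `Integer.MAX_VALUE`, and widening it to `long` recovers the intended value. A minimal sketch using only the JDK (Java 8+); the class and method names are hypothetical and not part of NetCDF.

```java
public class WidenUint32 {
    /** Widens raw 32-bit patterns (stored in Java ints) to their unsigned values as longs. */
    static long[] widenUnsigned(int[] raw) {
        long[] out = new long[raw.length];
        for (int i = 0; i < raw.length; i++) {
            // Integer.toUnsignedLong zero-extends instead of sign-extending.
            out[i] = Integer.toUnsignedLong(raw[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int raw = 0xFFFFFFFF;                                   // all 32 bits set
        System.out.println(raw);                                // -1 when read as signed
        System.out.println(widenUnsigned(new int[] {raw})[0]); // 4294967295 when read as unsigned
    }
}
```

The cost of this approach is that every 32-bit value now occupies 64 bits, which is why a library might prefer not to do it automatically.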

Well, maybe those kludges would be effective if I were creating NetCDF data from scratch, but how can I use NetCDF to detect that a 32-bit value in an existing HDF5 file should be interpreted as unsigned?

Update: Apparently NetCDF-4 does support unsigned data types. So this raises the question: how can I determine whether a value is signed or unsigned from the NetCDF Java library? I don't see any unsigned types in ucar.ma2.DataType.

Answer 1:

Yes, you can look for the _Unsigned = "true" attribute, or you can call Variable.isUnsigned().

Because Java doesn't support unsigned types, this was a difficult design decision. Ultimately we decided not to widen the type automatically, for efficiency. So the application must check and do the right thing. Look at the ucar.ma2.DataType.unsignedXXX() helper methods.
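The "check, then convert" pattern can be sketched as below. The `unsigned` flag stands in for whatever `Variable.isUnsigned()` (or an `_Unsigned` attribute check) returns; the conversions use the JDK 8 methods `Byte.toUnsignedInt`, `Short.toUnsignedInt` and `Integer.toUnsignedLong` rather than the netCDF-Java `unsignedXXX()` helpers, whose exact signatures are not reproduced here. Class and method names are hypothetical.

```java
public class CheckThenWiden {
    // Each overload returns the value in the next-wider Java type, so both the
    // signed and the unsigned interpretation fit without overflow.
    static int widen(byte raw, boolean unsigned) {
        return unsigned ? Byte.toUnsignedInt(raw) : raw;
    }

    static int widen(short raw, boolean unsigned) {
        return unsigned ? Short.toUnsignedInt(raw) : raw;
    }

    static long widen(int raw, boolean unsigned) {
        return unsigned ? Integer.toUnsignedLong(raw) : raw;
    }

    public static void main(String[] args) {
        // In real code the flag would come from the variable's metadata.
        System.out.println(widen((byte) 0xFF, true));  // 255
        System.out.println(widen((byte) 0xFF, false)); // -1
        System.out.println(widen(0xFFFFFFFF, true));   // 4294967295
    }
}
```

Keeping the raw Java type and widening only when the flag says so matches the trade-off described above: no extra memory unless the caller asks for it.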

When you read the data, you get an Array object, and you can call Array.isUnsigned(). The extractors such as Array.getDouble() will also convert unsigned values correctly.

The netCDF-Java library supports an extended data model called the "Common Data Model" (CDM) to abstract out differences in file formats. So we are not stuck with the limits of the netCDF-3 file format or data model. But we are still limited by Java.

John

Answer 2:

Given that Java doesn't have unsigned types, I think the only options are to 1) automatically widen unsigned data (turn bytes into shorts, shorts into ints, ints into longs), or 2) represent both signed and unsigned integers with the available Java data types, and let the user decide if and when they should be widened.

Arguably the main use for unsigned data is to represent bits, and in that case conversion would be a waste, since you will just mask and test the bits.
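As an illustration of the bit-flag case: testing bits in a Java `byte` gives the same answer whether the stored value is read as signed or unsigned, so no widening is needed. A minimal sketch; the class name and flag layout are made up for the example.

```java
public class BitFlags {
    /** Returns true if bit {@code bit} (0 = least significant, 0..7) is set in a flag byte. */
    static boolean isSet(byte flags, int bit) {
        // The byte is promoted to int with sign extension, but that only
        // affects bits 8..31, so masking bits 0..7 gives the correct answer.
        return (flags & (1 << bit)) != 0;
    }

    public static void main(String[] args) {
        byte flags = (byte) 0b1000_0001;     // bits 7 and 0 set
        System.out.println(flags);           // -127 when printed as a signed byte
        System.out.println(isSet(flags, 7)); // true
        System.out.println(isSet(flags, 0)); // true
        System.out.println(isSet(flags, 3)); // false
    }
}
```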

The other main use is for, e.g., satellite data, which often uses unsigned bytes, and there again I think not automatically widening is the right choice. What you end up doing is just widening right at the point you use it.
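Widening "right at the point you use it" typically means masking each byte as it is consumed. A sketch computing the mean of unsigned 8-bit samples (e.g., pixel counts) stored in a `byte[]`; the class name and the sample values are hypothetical, not read from a real file.

```java
public class PixelMean {
    /** Mean of unsigned 8-bit samples; each byte is widened only when it is read. */
    static double meanUnsigned(byte[] pixels) {
        if (pixels.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (byte p : pixels) {
            sum += p & 0xFF; // yields 0..255 instead of -128..127
        }
        return (double) sum / pixels.length;
    }

    public static void main(String[] args) {
        byte[] pixels = {(byte) 200, (byte) 250, 10, 20};
        System.out.println(meanUnsigned(pixels)); // 120.0
    }
}
```

Without the `& 0xFF` mask the same loop would sum -56, -6, 10 and 20 and report a mean of -8.0, which is the kind of silent error the unsigned flag exists to prevent.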

Answer 3:

It seems that when the CDM data types are mapped to Java, NetCDF will automatically add the attribute _Unsigned = "true" to the variable. So I assume that if I check for that attribute, it will indicate if the value is unsigned or not. This may be exactly what I was looking for; I'll verify tomorrow that it works.

Update: I tried this and it works; moreover, as John Caron indicated in the accepted answer, a NetCDF array has an isUnsigned() method which checks for the _Unsigned attribute.
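The attribute check described here amounts to comparing the value of `_Unsigned` with "true". The sketch below models a variable's attributes as a plain `Map<String, String>`, a hypothetical stand-in that does not call the netCDF-Java attribute API; with the library itself, `Variable.isUnsigned()` or `Array.isUnsigned()` is the more direct route. The case-insensitive, whitespace-tolerant comparison is an assumption about how the value may be written.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UnsignedAttribute {
    static final String UNSIGNED_ATTR = "_Unsigned";

    /** True if the attribute map marks the variable unsigned via _Unsigned = "true". */
    static boolean isUnsigned(Map<String, String> attributes) {
        String value = attributes.get(UNSIGNED_ATTR);
        // A missing attribute means the data is treated as signed.
        return value != null && value.trim().equalsIgnoreCase("true");
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put(UNSIGNED_ATTR, "true");
        System.out.println(isUnsigned(attrs));                  // true
        System.out.println(isUnsigned(Collections.emptyMap())); // false
    }
}
```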

Source: https://stackoverflow.com/questions/16309991/determining-signed-state-for-hdf5-variables-in-netcdf

Tags

unsigned

hdf5

netcdf