numpy custom dtype challenge

Question

I have an array of a custom dtype, my_type, which I successfully read from a binary file. Each element of the custom dtype starts with a header section, followed by the data. The data part consists of np.int16 numbers, so each element looks like this:

header, imaginary, real, imaginary, real,  ..., imaginary, real

Now I am looking for a smart way to use NumPy's view to get an array of np.complex64 containing only the data, without copying or looping, given the following requirements:

the header part should be ignored
the order should be corrected (i.e. real first, then imaginary)
the resulting array should be complex64 not complex32!

That is, from an array of custom dtype:

[my_type, my_type, ..., my_type]

I would like to get a much larger array containing:

[complex64, complex64, ..., complex64]

Is it possible to do this in one go using Numpy's view?

UPDATE:

So the solution is to copy in memory. Many thanks for the answers below. But because the annoying header appears before every data frame, it seems that even with a copy in memory, a loop over all data frames is still necessary. Schematically, I have:

a = np.arange(10, dtype=np.float16)
skip_annoying_header = 2
r = np.zeros(a.size - skip_annoying_header, np.float16)
r[0::2], r[1::2] = a[skip_annoying_header + 1::2], a[skip_annoying_header::2]
r = r.astype(np.float32)
r = r.view(np.complex64)

I do this in a for loop for every data frame, and at the end of each iteration I copy the content of r into a big array.

Can this looping be somehow eliminated?

Answer 1:

All 3 requirements conflict with a view.

Ignoring the header field requires selecting the other fields. Selecting a single field is clearly a view, but the state of multiple fields is in flux. When I try anything besides simply viewing the values I get a warning:

In [497]: dt=np.dtype('U10,f,f,f,f')
In [498]: x=np.zeros((5,),dt)

In [505]: x[['f1','f3']].__array_interface__
/usr/bin/ipython3:1: FutureWarning: Numpy has detected that you (may be) writing to an array returned
by numpy.diagonal or by selecting multiple fields in a record
array. This code will likely break in a future numpy release --
see numpy.diagonal or arrays.indexing reference docs for details.
The quick fix is to make an explicit copy (e.g., do
arr.diagonal().copy() or arr[['f0','f1']].copy()).

Remember, the data is laid out element by element, with the dtype tuple values in compact blocks, essentially a compact version of the display. Ignoring the header requires skipping that set of bytes. view can handle skips produced by strides, but not these dtype field skips.

In [533]: x
Out[533]: 
array([('header', 0.0, 5.0, 1.0, 10.0), ('header', 1.0, 4.0, 1.0, 10.0),
       ('header', 2.0, 3.0, 1.0, 10.0), ('header', 3.0, 2.0, 1.0, 10.0),
       ('header', 4.0, 1.0, 1.0, 10.0)], 
      dtype=[('f0', '<U10'), ('f1', '<f4'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<f4')])
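As a quick check of the single-field case mentioned above, here is a minimal sketch. The frame layout (a 4-byte header followed by four int16 values stored as one subarray field) is an assumption for illustration, not the asker's real format. When all of the data sits in one field, selecting that field is a view whose row stride steps over the header bytes:

```python
import numpy as np

# Hypothetical frame layout: 4-byte header, then four int16 samples.
frame_dtype = np.dtype([('header', 'S4'), ('data', '<i2', (4,))])
frames = np.zeros(3, dtype=frame_dtype)

data = frames['data']                     # single-field selection: a view, no copy
shares = np.shares_memory(data, frames)   # True when both use the same buffer
strides = data.strides                    # row stride equals the full frame size

data[0, 0] = 7                            # writing through the view changes the parent
print(shares, strides, frames['data'][0, 0])
```

This only removes the header; swapping the pair order and widening int16 to float32 still need a copy.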

To explore reordering the complex fields, let's try a 2d array:

In [509]: y=np.arange(10.).reshape(5,2)  # 2 column float
In [510]: y.view(complex)    # can be viewed as complex
Out[510]: 
array([[ 0.+1.j],
       [ 2.+3.j],
       [ 4.+5.j],
       [ 6.+7.j],
       [ 8.+9.j]])
In [511]: y[:,::-1].view(complex)
...
ValueError: new type not compatible with array.

To switch the real/imaginary columns I have to make a copy. complex requires that the 2 floats be contiguous and in order.

In [512]: y[:,::-1].copy().view(complex)
Out[512]: 
array([[ 1.+0.j],
       [ 3.+2.j],
       [ 5.+4.j],
       [ 7.+6.j],
       [ 9.+8.j]])

Changing the element size (float32 to float64 here, or int16 to float32 in the question) is clearly not a view change. One uses 4 bytes per number, the other 8. You can't 'view' 4 bytes as 8 without copying.
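Putting the three points together, the copy cannot be avoided, but the Python-level loop over frames can. The following is a sketch under assumptions: the function name frames_to_complex64, the field name 'data', and the frame layout (an 8-byte header followed by (imag, real) int16 pairs stored as one subarray field) are all hypothetical. It makes a single copy that skips the header, swaps the order, and converts to float32 at the same time:

```python
import numpy as np

def frames_to_complex64(frames, field='data'):
    """Convert (imag, real) int16 pairs from every frame into one complex64 array."""
    pairs = frames[field]                    # view, shape (n_frames, n_samples, 2)
    out = np.empty(pairs.shape[:-1], dtype=np.complex64)
    out.imag = pairs[..., 0]                 # imaginary part is stored first
    out.real = pairs[..., 1]                 # real part is stored second
    return out.ravel()                       # out is contiguous, so this is a view

# Hypothetical layout: 8-byte header, then 3 (imag, real) int16 pairs per frame.
frame_dtype = np.dtype([('header', 'V8'), ('data', '<i2', (3, 2))])
frames = np.zeros(2, dtype=frame_dtype)
frames['data'] = np.arange(12, dtype=np.int16).reshape(2, 3, 2)

result = frames_to_complex64(frames)
print(result.dtype, result)
```

Assigning to out.real and out.imag lets NumPy do the int16 to float32 conversion during the copy, so no intermediate float array is needed.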

Answer 2:

@hpaulj is absolutely correct that this conflicts with a view.

However, you may be asking the wrong question.

numpy can certainly do what you want, but you'll need to make a temporary copy in memory.

Overall, you're probably better served by rethinking the "read the entire file into memory and then view it" approach. Instead, seek past (or read in) the header, then read in the data portion with fromfile. After that, it's relatively straightforward to manipulate things into what you want, as long as you don't mind making a copy to go from float32's to float64's.

To start out with, let's generate a file similar to yours:

import numpy as np

reals = np.arange(100).astype(np.float32)
imag = -9999.0 * np.ones(100).astype(np.float32)

data = np.empty(reals.size + imag.size, dtype=np.float32)
data[::2], data[1::2] = imag, reals

with open('temp.dat', 'wb') as outfile:
    # Write a 1 KB header (of literal "x" bytes, in this case)
    outfile.write(1024 * b'x')
    # Append the raw float32 values after the header
    data.tofile(outfile)

Now we'll read it in.

The key to ignoring the header is to seek past it before reading the data in with fromfile.

Then, we can de-interleave the data and convert to 64-bit floats at the same time.

Finally, you can view the resulting float64 array of length 2N as a complex128 array of length N. (Note: complex128 stores its real and imaginary parts as two 64-bit floats; complex64 stores them as two 32-bit floats.)

For example:

import numpy as np

with open('temp.dat', 'rb') as infile:
    # Seek past header
    infile.seek(1024)

    # Read in rest of file as float32's
    data = np.fromfile(infile, dtype=np.float32)

result = np.empty(data.size, np.float64)

# De-interleave imag & real back into expected real & imag, converting to 64-bit
result[::2], result[1::2] = data[1::2], data[::2]

# View the result as complex128's (i.e. 64-bit complex numbers)
result = result.view(np.complex128)
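If complex64 is what is wanted, as in the question, the same de-interleaving can stay in float32. This is a minimal sketch in which a small in-memory array stands in for the values read with fromfile; the names swapped and result32 are made up for illustration:

```python
import numpy as np

# Stand-in for the float32 data read with fromfile: imag, real, imag, real, ...
data = np.array([-1, 0, -2, 1, -3, 2], dtype=np.float32)

swapped = np.empty_like(data)               # one float32 copy, same size as data
swapped[::2], swapped[1::2] = data[1::2], data[::2]   # real first, then imag

result32 = swapped.view(np.complex64)       # reinterpret float32 pairs, no extra copy
print(result32)
```

The view works here because each (real, imag) pair is contiguous and in order after the swap, which is the condition described in the first answer.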

Source: https://stackoverflow.com/questions/32764185/numpy-custom-dtype-challenge

Tags: python, arrays, numpy, binary, complex-numbers