Automatic string length in recarray

愿得一人 2021-01-13 23:06

If I create a recarray in this way:

In [29]: np.rec.fromrecords([(1, 'hello'), (2, 'world')], names=['a', 'b'])

The result looks fine:

慢半拍i (OP)

2021-01-14 00:06

If you don't need to manipulate the strings as bytes, you may use the object data-type to represent them. This essentially stores a pointer instead of the actual bytes:

In [38]: np.array(data, dtype=[('a', np.uint8), ('b', object)])
Out[38]: 
array([(1, 'hello'), (2, 'world')], 
      dtype=[('a', '|u1'), ('b', '|O8')])
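To see what "stores a pointer" means in practice, here is a small sketch. The sample data is the two records from the question; the variable names are illustrative. It compares an object field with a fixed-width '<U5' field when a longer string is assigned afterwards:

```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]

# Object field: each element is a reference to a Python str, so any length fits.
obj = np.array(data, dtype=[('a', np.uint8), ('b', object)])

# Fixed-width field: 'U5' reserves room for exactly 5 characters per element.
fixed = np.array(data, dtype=[('a', np.uint8), ('b', 'U5')])

obj['b'][0] = 'a much longer string'
fixed['b'][0] = 'a much longer string'

print(obj['b'][0])    # the full string is kept
print(fixed['b'][0])  # silently truncated to the first 5 characters
```

The trade-off is that object fields lose the compact, contiguous byte layout, so vectorised string operations and binary I/O (e.g. writing the array's raw buffer) no longer work on that field.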

Alternatively, Alex's idea would work well:

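import numpy as np

data = [(1, 'hello'), (2, 'world')]
dt = np.rec.fromrecords(data).dtype  # assumption: dt is the dtype NumPy infers from data

new_dt = []

# For each field, in declaration order, look up its type (dt.fields maps
# name -> (dtype, byte offset)).  If the field is an integer, represent it as a byte.
for f in dt.names:
    T = dt.fields[f][0]
    if np.issubdtype(T, np.integer):
        new_dt.append((f, np.uint8))  # only safe for values in 0..255
    else:
        new_dt.append((f, T))

new_dt = np.dtype(new_dt)
np.array(data, dtype=new_dt)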

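which should yield (this output comes from Python 2; on Python 3 the inferred string field is '<U5' rather than '|S5')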

array([(1, 'hello'), (2, 'world')], 
      dtype=[('f0', '|u1'), ('f1', '|S5')])
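Hard-coding np.uint8 assumes every integer value fits in 0..255. A hedged variant of the same idea narrows each integer field to the smallest type that holds that column's actual minimum and maximum, using np.min_scalar_type and np.result_type. The helper name shrink_int_fields is hypothetical, not a NumPy function; string widths are still inferred automatically by np.rec.fromrecords:

```python
import numpy as np

def shrink_int_fields(records):
    """Hypothetical helper: infer a record dtype, then narrow each integer field."""
    rec = np.rec.fromrecords(records)  # lets NumPy infer string widths
    new_dt = []
    for name in rec.dtype.names:
        T = rec.dtype.fields[name][0]  # fields maps name -> (dtype, offset)
        if np.issubdtype(T, np.integer):
            col = rec[name]
            # Smallest type that holds both extremes of this column.
            small = np.result_type(np.min_scalar_type(int(col.min())),
                                   np.min_scalar_type(int(col.max())))
            new_dt.append((name, small))
        else:
            new_dt.append((name, T))
    return np.array(records, dtype=new_dt)

arr = shrink_int_fields([(1, 'hello'), (2, 'world')])
print(arr.dtype)
```

For the question's data this gives a one-byte unsigned field, as in the answer above, but a column containing, say, 300 and -1 is widened to a signed type instead of overflowing.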
