Automatic string length in recarray

[愿得一人] 2021-01-13 23:06

If I create a recarray in this way:

In [29]: np.rec.fromrecords([(1, 'hello'), (2, 'world')], names=['a', 'b'])

The result looks fine:

  •  慢半拍i (OP)
    2021-01-14 00:06

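    If you don't need to manipulate the strings as raw bytes, you can use the object data type to represent them. This stores a pointer to each Python string instead of the string's bytes, so no length has to be chosen up front. Here `data` is the list of records from the question: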

    In [38]: np.array(data, dtype=[('a', np.uint8), ('b', object)])
    Out[38]: 
    array([(1, 'hello'), (2, 'world')], 
          dtype=[('a', '|u1'), ('b', '|O8')])
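    A self-contained version of the object-dtype approach, assuming the two records from the question. It uses the builtin `object` rather than `np.object` (that alias was deprecated in NumPy 1.20 and removed in 1.24). It also shows the trade-off: the field holds references to Python strings, so a string of any length fits, and `astype(str)` can turn the column back into a fixed-width string array later if one is needed.

```python
import numpy as np

# The records from the question.
data = [(1, 'hello'), (2, 'world')]

# 'a' gets an explicit small integer type; 'b' uses the builtin `object`,
# so each element is a reference to an ordinary Python str and no string
# length has to be chosen when the array is created.
arr = np.array(data, dtype=[('a', np.uint8), ('b', object)])

# Because only references are stored, a longer string fits without truncation.
arr['b'][0] = 'a much longer greeting'
print(arr['b'].tolist())

# If a fixed-width string column is needed later, astype(str) on the object
# column lets NumPy size the result to hold the longest string present.
fixed = arr['b'].astype(str)
print(fixed.dtype)
```

    The cost of the object dtype is that the strings live outside the array buffer, so the array cannot be written to disk or shared as raw bytes the way a fixed-width field can.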
    

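    Alternatively, Alex's idea works well: let NumPy infer the record dtype (including the string length), then replace only the integer fields with the type you want: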

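    import numpy as np

    data = [(1, 'hello'), (2, 'world')]

    # Let NumPy infer every field type, including the string length.
    # Assumption: dt is this auto-detected dtype (default names f0, f1, ...).
    dt = np.rec.fromrecords(data).dtype

    new_dt = []

    # For each field, in declaration order, determine whether its type is
    # an integer. If so, represent it as a single unsigned byte.
    for f in dt.names:
        T = dt.fields[f][0]  # fields maps name -> (dtype, offset[, title])
        if np.issubdtype(T, np.integer):
            new_dt.append((f, np.uint8))
        else:
            new_dt.append((f, T))

    new_dt = np.dtype(new_dt)
    np.array(data, dtype=new_dt)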
    

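    which should yield (this output is from Python 2; under Python 3 the inferred string field is Unicode, `<U5`, rather than `|S5`):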

    array([(1, 'hello'), (2, 'world')], 
          dtype=[('f0', '|u1'), ('f1', '|S5')])
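    The same idea can be packaged as a reusable function that keeps automatic string-length detection while letting you choose the numeric types. This is a minimal sketch: `fromrecords_with_types` and its `overrides` argument are hypothetical names, not part of NumPy. It relies only on `np.rec.fromrecords`, called once to infer the dtype and once to build the final array.

```python
import numpy as np


def fromrecords_with_types(records, names, overrides):
    """Hypothetical helper (not a NumPy API): build a record array whose
    field types are inferred by NumPy, except for the fields listed in
    `overrides`, which get the dtype given there."""
    # First pass: let NumPy infer every field, including string lengths.
    inferred = np.rec.fromrecords(records, names=names).dtype
    # Keep the inferred dtype unless the caller asked for a specific one.
    fields = [(name, overrides.get(name, inferred[name])) for name in inferred.names]
    # Second pass: build the final record array with the merged dtype.
    return np.rec.fromrecords(records, dtype=fields)


rec = fromrecords_with_types(
    [(1, 'hello'), (2, 'world')], names=['a', 'b'], overrides={'a': np.int8}
)
print(rec.dtype)
```

    Building the array twice is the simplest way to reuse NumPy's own inference. For large inputs, computing the longest string per column yourself (for example with `max(len(s) for s in column)`) and passing an explicit width avoids the first pass.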
    
