Automatic string length in recarray

[愿得一人] 2021-01-13 23:06

If I create a recarray in this way:

In [29]: np.rec.fromrecords([(1,'hello'),(2,'world')],names=['a','b'])

The result looks fine:

2 Answers
  • 2021-01-13 23:42

    I don't know of a way to ask numpy to infer some aspects of a dtype but not others, but couldn't you compute the string length yourself, e.g.:

    import numpy as np

    data = [(1,'hello'),(2,'world')]
    # Width of the string field = length of the longest string in the data.
    dlen = max(len(s) for i, s in data)
    st = '|S%d' % dlen
    np.rec.fromrecords(data, dtype=[('a',np.int8), ('b',st)])
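
A minimal sketch of the same idea for Python 3, assuming the text column holds `str` values and a fixed-width unicode field (`'U<n>'`) is acceptable in place of `'|S<n>'`. The sample data and the names `width`, `dt` and `rec` are only illustrative:

```python
import numpy as np

# Illustrative data; the second string is longer on purpose.
data = [(1, 'hello'), (2, 'worlds!')]

# The longest string in the second column decides the field width.
width = max(len(s) for _, s in data)

# 'U<n>' is a fixed-width unicode field; int8 keeps the first column at one byte.
dt = np.dtype([('a', np.int8), ('b', 'U%d' % width)])
rec = np.rec.fromrecords(data, dtype=dt)
```

Only the string width is computed by hand here; the integer type is still chosen explicitly.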
    
  • 2021-01-14 00:06

    If you don't need to manipulate the strings as bytes, you may use the object data-type to represent them. This essentially stores a pointer instead of the actual bytes:

    In [38]: np.array(data, dtype=[('a', np.uint8), ('b', object)])
    Out[38]: 
    array([(1, 'hello'), (2, 'world')], 
          dtype=[('a', '|u1'), ('b', '|O8')])
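
To illustrate the practical difference, here is a small sketch (not part of the original session) comparing an object field with a fixed-width one. The names `arr` and `fixed` and the `'U5'` width are assumptions chosen for the example:

```python
import numpy as np

data = [(1, 'hello'), (2, 'world')]

# Object field: each element is a reference to a Python string.
arr = np.array(data, dtype=[('a', np.uint8), ('b', object)])
arr['b'][0] = 'a much longer string'   # stored in full

# Fixed-width field: assignments are cut to the declared length.
fixed = np.array(data, dtype=[('a', np.uint8), ('b', 'U5')])
fixed['b'][0] = 'a much longer string'  # silently truncated to 5 characters
```

So the object field sidesteps the length question entirely, at the cost of storing Python objects rather than raw bytes in the array.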
    

    Alternatively, Alex's idea would work well:

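    import numpy as np

    data = [(1, 'hello'), (2, 'world')]

    # Assumption: dt is the dtype numpy infers on its own for the data.
    dt = np.rec.fromrecords(data).dtype

    new_dt = []

    # For each field, check whether numpy inferred an integer type.
    # If so, represent it as a byte; otherwise keep the inferred type.
    for f in dt.names:
        T = dt.fields[f][0]  # fields maps name -> (dtype, byte offset)
        if np.issubdtype(T, np.integer):
            new_dt.append((f, np.uint8))
        else:
            new_dt.append((f, T))

    new_dt = np.dtype(new_dt)
    np.array(data, dtype=new_dt)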
    

    which should yield (on Python 3 the string field is reported as '<U5' rather than '|S5')

    array([(1, 'hello'), (2, 'world')], 
          dtype=[('f0', '|u1'), ('f1', '|S5')])
    