Converting a 2D numpy array to a structured array

故里飘歌 2020-11-29 01:08

I'm trying to convert a two-dimensional array into a structured array with named fields. I want each row in the 2D array to be a new record in the structured array. Unfortu

5 answers
  • 2020-11-29 01:35

    There's a lot of confusion here between "record array" and "structured array". Here's my short solution for a structured array.

    import numpy as np

    dtype = np.dtype([("Col1","S8"),("Col2","f8"),("Col3","i8")])
    # myarray is the plain 2D (string) array we start from, so it gets no dtype here
    myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
    np.array(np.rec.fromarrays(myarray.transpose(), names=dtype.names).astype(dtype=dtype).tolist(), dtype=dtype)
    

    So, with the assumption that dtype is defined, this is a one-liner.
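
    To make the distinction between the two terms concrete, here is a minimal sketch (the variable names are illustrative, not taken from the answer). A structured array is a plain ndarray with a compound dtype; a record array is the np.recarray subclass, which additionally allows attribute access to the fields.

```python
import numpy as np

dt = np.dtype([("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")])
rows = [("Hello", 2.5, 3), ("World", 3.6, 2)]

structured = np.array(rows, dtype=dt)   # plain ndarray with a structured dtype
record = structured.view(np.recarray)   # same data, viewed as a record array

print(type(structured).__name__)  # class name of the structured array
print(structured["Col2"])         # field access by name works on both kinds
print(record.Col2)                # attribute access is the recarray extra
```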

  • 2020-11-29 01:36

    Okay, I have been struggling with this for a while now but I have found a way to do this that doesn't take too much effort. I apologise if this code is "dirty"....

    Let's start with a 2D array:

    import numpy

    mydata = numpy.array([['text1', 1, 'longertext1', 0.1111],
                         ['text2', 2, 'longertext2', 0.2222],
                         ['text3', 3, 'longertext3', 0.3333],
                         ['text4', 4, 'longertext4', 0.4444],
                         ['text5', 5, 'longertext5', 0.5555]])
    

    So we end up with a 2D array with 4 columns and 5 rows:

    mydata.shape
    Out[30]: (5L, 4L)
    

    To use numpy.core.records.array (also exposed as numpy.rec.array), we need to supply the input argument as a list of arrays, so:

    tuple(mydata)
    Out[31]: 
    (array(['text1', '1', 'longertext1', '0.1111'], 
          dtype='|S11'),
     array(['text2', '2', 'longertext2', '0.2222'], 
          dtype='|S11'),
     array(['text3', '3', 'longertext3', '0.3333'], 
          dtype='|S11'),
     array(['text4', '4', 'longertext4', '0.4444'], 
          dtype='|S11'),
     array(['text5', '5', 'longertext5', '0.5555'], 
          dtype='|S11'))
    

    This produces one array per row of data, but we need one array per column, so what we need is:

    tuple(mydata.transpose())
    Out[32]: 
    (array(['text1', 'text2', 'text3', 'text4', 'text5'], 
          dtype='|S11'),
     array(['1', '2', '3', '4', '5'], 
          dtype='|S11'),
     array(['longertext1', 'longertext2', 'longertext3', 'longertext4',
           'longertext5'], 
          dtype='|S11'),
     array(['0.1111', '0.2222', '0.3333', '0.4444', '0.5555'], 
          dtype='|S11'))
    

    Finally it needs to be a list of arrays, not a tuple, so we wrap the above in list() as below:

    list(tuple(mydata.transpose()))
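
    As a small aside, iterating over a 2D array already yields its rows, so list(mydata.transpose()) should be enough to get one array per column without the intermediate tuple(). A sketch with a shortened copy of the data (written as strings here, an assumption made for brevity that matches what the outputs above show NumPy storing):

```python
import numpy as np

# Shortened, all-string copy of the example data (assumption for brevity).
mydata = np.array([["text1", "1", "longertext1", "0.1111"],
                   ["text2", "2", "longertext2", "0.2222"]])

columns = list(mydata.transpose())  # one 1D array per column

print(len(columns))         # number of columns
print(columns[1].tolist())  # values of the second column, still strings
```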
    

    That is our data input argument sorted.... next is the dtype:

    mydtype = numpy.dtype([('My short text Column', 'S5'),
                           ('My integer Column', numpy.int16),
                           ('My long text Column', 'S11'),
                           ('My float Column', numpy.float32)])
    mydtype
    Out[37]: dtype([('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])
    

    Okay, so now we can pass that to the numpy.core.records.array():

    myRecord = numpy.core.records.array(list(tuple(mydata.transpose())), dtype=mydtype)
    

    ... and fingers crossed:

    myRecord
    Out[36]: 
    rec.array([('text1', 1, 'longertext1', 0.11110000312328339),
           ('text2', 2, 'longertext2', 0.22220000624656677),
           ('text3', 3, 'longertext3', 0.33329999446868896),
           ('text4', 4, 'longertext4', 0.44440001249313354),
           ('text5', 5, 'longertext5', 0.5554999709129333)], 
          dtype=[('My short text Column', '|S5'), ('My integer Column', '<i2'), ('My long text Column', '|S11'), ('My float Column', '<f4')])
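
    The same call is also reachable through numpy.rec, the public alias of the records module, which may be preferable to going through numpy.core on recent NumPy versions. A minimal sketch, assuming a shortened all-string copy of the data and the same dtype as above:

```python
import numpy as np

# Shortened, all-string copy of the example data (assumption for brevity).
mydata = np.array([["text1", "1", "longertext1", "0.1111"],
                   ["text2", "2", "longertext2", "0.2222"]])

mydtype = np.dtype([("My short text Column", "S5"),
                    ("My integer Column", np.int16),
                    ("My long text Column", "S11"),
                    ("My float Column", np.float32)])

# One array per column goes in; each column is cast to its field's type.
my_record = np.rec.array(list(mydata.transpose()), dtype=mydtype)

print(my_record["My integer Column"])
print(my_record.dtype)
```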
    

    Voila! You can index by column name as in:

    myRecord['My float Column']
    Out[39]: array([ 0.1111    ,  0.22220001,  0.33329999,  0.44440001,  0.55549997], dtype=float32)
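
    The long tails of digits in the outputs above come from storing the values as 32-bit floats. A quick sketch of the effect, independent of structured arrays:

```python
import numpy as np

x32 = np.float32(0.1111)  # nearest 32-bit float to 0.1111
x64 = np.float64(0.1111)  # nearest 64-bit float to 0.1111

print(float(x32))  # extra digits show up once widened to a Python float
print(float(x64))  # round-trips to the literal that was typed
print(abs(float(x32) - 0.1111) < 1e-7)  # the error is small but not zero
```

    Declaring the field as 'f8' (numpy.float64) instead keeps the values as they would print in plain Python, at the cost of four extra bytes per value.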
    

    I hope this helps as I wasted so much time with numpy.asarray and mydata.astype etc trying to get this to work before finally working out this method.

  • 2020-11-29 01:43

    I guess

    import numpy as np

    new_array = np.core.records.fromrecords([("Hello",2.5,3),("World",3.6,2)],
                                            names='Col1,Col2,Col3',
                                            formats='S8,f8,i8')
    

    is what you want.
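
    If the data is already a 2D array rather than a list of tuples, the rows can be turned into tuples first. A hedged sketch, where arr stands in for the question's 2D array and np.rec is used as the public alias of np.core.records:

```python
import numpy as np

# Stand-in for the 2D (string) array from the question.
arr = np.array([["Hello", "2.5", "3"],
                ["World", "3.6", "2"]])

# fromrecords wants one tuple per record, so convert each row.
records = [tuple(row) for row in arr]

new_array = np.rec.fromrecords(records,
                               names='Col1,Col2,Col3',
                               formats='S8,f8,i8')

print(new_array.Col2)
print(new_array.dtype)
```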

  • 2020-11-29 01:53

    If the data starts as a list of tuples, then creating a structured array is straightforward:

    In [228]: alist = [("Hello",2.5,3),("World",3.6,2)]
    In [229]: dt = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
    In [230]: np.array(alist, dtype=dt)
    Out[230]: 
    array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
          dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
    

    The complication here is that the list of tuples has been turned into a 2d string array:

    In [231]: arr = np.array(alist)
    In [232]: arr
    Out[232]: 
    array([['Hello', '2.5', '3'],
           ['World', '3.6', '2']], 
          dtype='<U5')
    

    We could use the well-known zip(*...) idiom to 'transpose' this array; actually we want a double transpose:

    In [234]: list(zip(*arr.T))
    Out[234]: [('Hello', '2.5', '3'), ('World', '3.6', '2')]
    

    zip has conveniently given us a list of tuples. Now we can recreate the array with desired dtype:

    In [235]: np.array(_, dtype=dt)
    Out[235]: 
    array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
          dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
    

    The accepted answer uses fromarrays:

    In [236]: np.rec.fromarrays(arr.T, dtype=dt)
    Out[236]: 
    rec.array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
              dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
    

    Internally, fromarrays takes a common recfunctions approach: create target array, and copy values by field name. Effectively it does:

    In [237]: newarr = np.empty(arr.shape[0], dtype=dt)
    In [238]: for n, v in zip(newarr.dtype.names, arr.T):
         ...:     newarr[n] = v
         ...:     
    In [239]: newarr
    Out[239]: 
    array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
          dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])
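
    As a sanity check, the three routes shown in this answer can be run side by side. This is only a sketch (the variable names via_tuples, via_fromarrays and via_loop are made up for illustration), comparing the results field by field:

```python
import numpy as np

arr = np.array([["Hello", "2.5", "3"],
                ["World", "3.6", "2"]])
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]

# Route 1: rows as tuples, then a normal array construction.
via_tuples = np.array([tuple(row) for row in arr], dtype=dt)

# Route 2: one array per column through fromarrays.
via_fromarrays = np.rec.fromarrays(arr.T, dtype=dt)

# Route 3: allocate the target and copy column by column.
via_loop = np.empty(arr.shape[0], dtype=dt)
for name, column in zip(via_loop.dtype.names, arr.T):
    via_loop[name] = column

same = all(
    via_tuples[n].tolist() == via_fromarrays[n].tolist() == via_loop[n].tolist()
    for n in via_tuples.dtype.names
)
print(same)
```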
    
  • 2020-11-29 01:54

    You can "create a record array from a (flat) list of arrays" using numpy.core.records.fromarrays as follows:

    >>> import numpy as np
    >>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
    >>> print(myarray)
    [['Hello' '2.5' '3']
     ['World' '3.6' '2']]
    
    
    >>> newrecarray = np.core.records.fromarrays(myarray.transpose(), 
                                                 names='col1, col2, col3',
                                                 formats = 'S8, f8, i8')
    
    >>> print(newrecarray)
    [(b'Hello', 2.5, 3) (b'World', 3.6, 2)]
    

    I was trying to do something similar. I found that when numpy created a structured array from an existing 2D array (using np.core.records.fromarrays), it considered each column (instead of each row) in the 2-D array as a record. So you have to transpose it. This behavior of numpy does not seem very intuitive, but perhaps there is a good reason for it.
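
    One way to see why the transpose is needed: fromarrays expects one array per field, so it walks the first axis of its input. Below is a sketch of what I would expect when the transpose is left out. The exact exception type is my reading of the current implementation, so treat it as an assumption.

```python
import numpy as np

myarray = np.array([["Hello", "2.5", "3"],
                    ["World", "3.6", "2"]])

# Transposed: three column arrays for three fields.
ok = np.rec.fromarrays(myarray.transpose(),
                       names='col1,col2,col3',
                       formats='S8,f8,i8')

# Not transposed: two row arrays offered for three fields.
try:
    np.rec.fromarrays(myarray,
                      names='col1,col2,col3',
                      formats='S8,f8,i8')
    rejected = False
except ValueError:
    rejected = True

print(ok.col2)
print(rejected)
```

    With a square input (as many rows as fields) there is no count mismatch to catch, so the transpose is worth double-checking in that case.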
