numpy, named columns

天涯浪人 2020-12-31 02:06

Simple question about numpy:

I load 100 values to a vector a. From this vector, I want to create an array A with 2 columns, where

2 Answers
  • 2020-12-31 02:23

    I know this is an old question, but a more recently available option would be to try using pandas. The DataFrame type is designed for structured data like this, where columns are named and can be of different types.
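
    As a rough sketch of the pandas route: the question text is cut off, so the layout here is an assumption borrowed from the other answer (consecutive values paired into two columns), and the column names C1 and C2 are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.arange(100)

# Pair consecutive values into rows of two, then name the columns.
df = pd.DataFrame(a.reshape(-1, 2), columns=["C1", "C2"])

# Each column can carry its own dtype.
df = df.astype({"C1": "int32", "C2": "int64"})

print(df["C1"].head(3).tolist())  # [0, 2, 4]
print(df.dtypes)
```

    Columns are then selected by name with df["C1"], and df.to_numpy() gives back a plain 2D array if one is needed later.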

  • 2020-12-31 02:45

    NumPy structured arrays have named columns:

    import numpy as np

    a = range(100)
    # zip(*[iter(a)] * 2) pairs consecutive values: (0, 1), (2, 3), ...
    A = np.array(list(zip(*[iter(a)] * 2)), dtype=[('C1', 'int32'), ('C2', 'int64')])
    print(A.dtype)
    # [('C1', '<i4'), ('C2', '<i8')]
    

    You can access the columns by name like this:

    print(A['C1'])
    # [ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48
    #  50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98]
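
    A small illustration of what name-based access gives you, using a tiny made-up array rather than the one above: indexing by field name returns a view, so writing to it changes the structured array as well.

```python
import numpy as np

# Hypothetical 3-row array with the same field layout as above.
A = np.zeros(3, dtype=[('C1', 'int32'), ('C2', 'int64')])

col = A['C1']            # a view on the field, not a copy
col[:] = [10, 20, 30]    # writes through to A

print(A.dtype.names)     # ('C1', 'C2')
print(A['C1'].tolist())  # [10, 20, 30]
```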
    

    Note that using np.array with zip causes NumPy to build an array from a temporary list of tuples. Python lists of tuples use a lot more memory than equivalent NumPy arrays. So if your array is very large you may not want to use zip.
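
    One way to skip the list of tuples, sketched here under the assumption that the values are already in a NumPy array, is to preallocate the structured array and fill each field from a strided slice:

```python
import numpy as np

a = np.arange(100)

# Allocate the structured array once, with one record per pair of values.
A = np.empty(a.size // 2, dtype=[('C1', 'int32'), ('C2', 'int64')])

# Fill each named column directly; no intermediate Python objects are built.
A['C1'] = a[0::2]   # even positions
A['C2'] = a[1::2]   # odd positions

print(A['C1'][:3].tolist())  # [0, 2, 4]
print(A['C2'][:3].tolist())  # [1, 3, 5]
```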

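    To avoid the temporary list, you could start from a 2D NumPy array A, use ravel() to flatten it to 1D, use view to reinterpret each pair of values as one structured record, and then use astype to convert the columns to the desired types. Note that view only works if the array's element size matches the field sizes given to it (here, 8-byte integers):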

    a = range(100)
    # Request int64 explicitly: the view below assumes 8-byte elements, and the
    # default integer type can be 32-bit on some platforms.
    A = np.array(a, dtype='i8').reshape(len(a) // 2, 2)
    A = A.ravel().view([('col1', 'i8'), ('col2', 'i8')]).astype([('col1', 'i4'), ('col2', 'i8')])
    print(A[:5])
    # [(0, 1) (2, 3) (4, 5) (6, 7) (8, 9)]

    print(A.dtype)
    # [('col1', '<i4'), ('col2', '<i8')]
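
    If you would rather not reason about byte layouts, the helper unstructured_to_structured in numpy.lib.recfunctions does the same conversion in one call. This is a sketch that assumes NumPy 1.16 or later, where I believe the helper was introduced:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.arange(100, dtype='i8')
pairs = a.reshape(-1, 2)  # 50 rows, 2 plain columns

# The last axis of `pairs` becomes the fields of the target dtype.
target = np.dtype([('col1', 'i4'), ('col2', 'i8')])
A = rfn.unstructured_to_structured(pairs, dtype=target)

print(A.shape)  # (50,)
print(A[:3])    # [(0, 1) (2, 3) (4, 5)]
```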
    