python - create a pivot table

耶瑟儿~ 2021-01-01 03:46

I'm trying to create a pivot table from a NumPy array in Python. I've done a lot of research, but I cannot find a straightforward solution. I know you can do it with Pan

1 Answer
  • 2021-01-01 04:26

    I think this is what you want:

    import numpy as np
    
    data = np.array([[ 4057,     8,  1374],
                     [ 4057,     9,   759],
                     [ 4057,    11,    96],
                     [89205,    16,   146],
                     [89205,    17,   154],
                     [89205,    18,   244]])
    
    rows, row_pos = np.unique(data[:, 0], return_inverse=True)
    cols, col_pos = np.unique(data[:, 1], return_inverse=True)
    
    pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
    pivot_table[row_pos, col_pos] = data[:, 2]
    
    >>> pivot_table
    array([[1374,  759,   96,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])
    >>> rows
    array([ 4057, 89205])
    >>> cols
    array([ 8,  9, 11, 16, 17, 18])
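
To see why the indexed assignment above fills the right cells, it may help to look at what `return_inverse=True` returns. The following is only a minimal sketch that reuses the data and variable names from the answer: for each input row it prints the position of that row's key within the sorted unique keys.

```python
import numpy as np

data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244]])

# rows holds the sorted unique keys; row_pos[i] is the index in rows
# of the key stored in data[i, 0], so rows[row_pos] rebuilds that column.
rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

print(row_pos)  # [0 0 0 1 1 1]
print(col_pos)  # [0 1 2 3 4 5]
```

Each pair `(row_pos[i], col_pos[i])` is therefore the cell of the pivot table that receives `data[i, 2]`.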
    

    This approach has some limitations. The main one is that repeated entries for the same row/column combination are not added together: only one of them (possibly the last) is kept. If you want them summed instead, you could, although it is a little convoluted, abuse SciPy's sparse module:

    data = np.array([[ 4057,     8,  1374],
                     [ 4057,     9,   759],
                     [ 4057,    11,    96],
                     [89205,    16,   146],
                     [89205,    17,   154],
                     [89205,    18,   244],
                     [ 4057,    11,     4]])
    
    rows, row_pos = np.unique(data[:, 0], return_inverse=True)
    cols, col_pos = np.unique(data[:, 1], return_inverse=True)
    
    pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
    pivot_table[row_pos, col_pos] = data[:, 2]
    >>> pivot_table # the element at [0, 2] should be 100!!!
    array([[1374,  759,    4,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])
    
    import scipy.sparse as sps
    # .toarray() densifies the matrix; duplicate (row, col) entries are summed
    pivot_table = sps.coo_matrix((data[:, 2], (row_pos, col_pos)),
                                 shape=(len(rows), len(cols))).toarray()
    >>> pivot_table # now repeated elements are added together
    array([[1374,  759,  100,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])
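
If pulling in SciPy is not desirable, one possible alternative (not part of the original answer) is NumPy's `np.add.at`, which performs an unbuffered in-place addition so that repeated index pairs accumulate. A minimal sketch, assuming the same data and variable names as above:

```python
import numpy as np

data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244],
                 [ 4057, 11,    4]])

rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
# Unbuffered add: every (row, col) pair contributes, including repeats.
np.add.at(pivot_table, (row_pos, col_pos), data[:, 2])

print(pivot_table[0, 2])  # 100
```

Note that `pivot_table[row_pos, col_pos] += data[:, 2]` would not give the same result, because a buffered fancy-indexed `+=` applies only one update per repeated index.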
    