Reorganizing an MxN 2D array of datapoints into an N-dimensional array

野性不改 2021-01-16 06:43

I've got a series of measurements in a 2D array, such as:

T    mu1  mu2  mu3  a    b    c    d    e
0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
0.0  0.0  0.0          


        
1 answer
  • 2021-01-16 07:13

    First, let's make some fake data:

    import numpy as np
    
    # an N x 5 array (N = 2*3*4*5*6 = 720) containing a regular mesh of the stimulus params
    stim_params = np.mgrid[:2, :3, :4, :5, :6].reshape(5, -1).T
    
    # an N x 3 array representing the output values for each simulation run
    output_vals = np.arange(720 * 3).reshape(720, 3)
    
    # shuffle the rows for a bit of added realism
    shuf = np.random.permutation(stim_params.shape[0])
    stim_params = stim_params[shuf]
    output_vals = output_vals[shuf]
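To see what the `np.mgrid[...].reshape(n, -1).T` idiom produces, here is the same construction on a much smaller 2 x 3 mesh of two parameters (a toy illustration, not part of the original answer). Each row is one combination of parameter values, and the last column varies fastest:

```python
import numpy as np

# same idiom as above, on a small 2 x 3 mesh of two parameters
small = np.mgrid[:2, :3].reshape(2, -1).T
print(small)
# [[0 0]
#  [0 1]
#  [0 2]
#  [1 0]
#  [1 1]
#  [1 2]]
print(small.shape)
# (6, 2)
```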
    

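    Now you can use np.lexsort to get the set of indices that sorts the rows of your 2D array of simulation parameters lexicographically: by the first column, then by the second column within ties, and so on, so that the last column varies fastest. That is exactly the C order that reshape uses by default, so once you apply these indices to the rows of both the parameters and the simulation outputs, each can be reshaped into an array with one axis per parameter. This relies on the parameters forming a complete, regular grid, i.e. every combination of parameter values appears exactly once.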

    # get the number of unique values for each stimulus parameter
    params_shape = tuple(np.unique(col).shape[0] for col in stim_params.T)
    
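    # get the set of row indices that will sort the stimulus parameters lexicographically.
    # np.lexsort treats its *last* key as the primary one, so reverse the columns to make
    # the first parameter the primary key and the last parameter vary fastest
    idx = np.lexsort(stim_params[:, ::-1].T)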
    
    # sort and reshape the stimulus parameters:
    sorted_params = stim_params[idx].T.reshape((5,) + params_shape)
    
    # sort and reshape the output values
    sorted_output = output_vals[idx].T.reshape((3,) + params_shape)
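The column reversal matters because `np.lexsort` uses the *last* key it is given as the primary sort key. A minimal check on a tiny hand-made array (values chosen only for illustration) shows that reversing the columns reproduces ordinary row-wise lexicographic order, the same order Python's `sorted` gives for lists:

```python
import numpy as np

# a tiny 4 x 2 array of parameter rows, in no particular order
rows = np.array([[1, 0],
                 [0, 2],
                 [1, 1],
                 [0, 0]])

# without reversal: the LAST column is the primary key
print(rows[np.lexsort(rows.T)])
# [[0 0]
#  [1 0]
#  [1 1]
#  [0 2]]

# with reversal: the FIRST column is the primary key, the last varies fastest
idx = np.lexsort(rows[:, ::-1].T)
print(rows[idx])
# [[0 0]
#  [0 2]
#  [1 0]
#  [1 1]]

# same order as sorting the rows as Python lists
print(rows[idx].tolist() == sorted(rows.tolist()))
# True
```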
    

    I find that the hardest part is often just trying to wrap your head around what all the different dimensions of the outputs correspond to:

    # array of stimulus parameters, with dimensions (n_params, p1, p2, p3, p4, p5)
    print(sorted_params.shape)
    # (5, 2, 3, 4, 5, 6)
    
    # to check that the sorting worked as expected, we can look at the values of the 
    # 5th parameter when all the others are held constant at 0:
    print(sorted_params[4, 0, 0, 0, 0, :])
    # [0 1 2 3 4 5]
    
    # ... and the 1st parameter when we hold all the others constant:
    print(sorted_params[0, :, 0, 0, 0, 0])
    # [0 1]
    
    # ... now let the 1st and 2nd parameters covary:
    print(sorted_params[:2, :, :, 0, 0, 0])
    # [[[0 0 0]
    #   [1 1 1]]
    
    #  [[0 1 2]
    #   [0 1 2]]]
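Rather than eyeballing individual slices, one way to check the whole result at once is to compare `sorted_params` with `np.indices`. This sketch relies on the fake data being an integer mesh starting at 0, so each parameter's value equals its index along its own axis; real measurements would not pass this particular check:

```python
import numpy as np

# rebuild the fake data from above (integer mesh, so value == grid index)
stim_params = np.mgrid[:2, :3, :4, :5, :6].reshape(5, -1).T
output_vals = np.arange(720 * 3).reshape(720, 3)
shuf = np.random.permutation(stim_params.shape[0])
stim_params, output_vals = stim_params[shuf], output_vals[shuf]

params_shape = tuple(np.unique(col).shape[0] for col in stim_params.T)
idx = np.lexsort(stim_params[:, ::-1].T)
sorted_params = stim_params[idx].T.reshape((5,) + params_shape)
sorted_output = output_vals[idx].T.reshape((3,) + params_shape)

# np.indices(shape)[k] holds the index along axis k at every grid point,
# which for this mesh should equal the value of parameter k
print(np.array_equal(sorted_params, np.indices(params_shape)))
# True

# the mesh was generated in this same order, so sorting just undoes the shuffle
unshuffled = np.arange(720 * 3).reshape(720, 3).T.reshape(sorted_output.shape)
print(np.array_equal(sorted_output, unshuffled))
# True
```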
    

    Hopefully you get the idea. The same indexing logic applies to the sorted simulation outputs:

    # array of outputs, with dimensions (n_outputs, p1, p2, p3, p4, p5)
    print(sorted_output.shape)
    # (3, 2, 3, 4, 5, 6)
    
    # the first output variable whilst holding the first 4 simulation parameters
    # constant at 0:
    print(sorted_output[0, 0, 0, 0, 0, :])
    # [ 0  3  6  9 12 15]
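Putting the steps together, the sketch below wraps them in a helper (`rows_to_grid` and `lookup` are hypothetical names, not NumPy functions) that takes the parameter columns and output columns separately and also keeps each parameter's sorted unique values, so outputs can be looked up by parameter *value* via `np.searchsorted`. Like the approach above, it assumes every parameter combination appears exactly once; the small float-valued table standing in for columns such as the question's `T` and `mu1` is made up for illustration, and the lookup requires exact matches of the stored float values:

```python
import numpy as np

def rows_to_grid(params, outputs):
    """Reorganize (M, n_params) parameter rows and (M, n_outputs) output rows
    into an array of shape (n_outputs, n_values_0, ..., n_values_last).
    Assumes the parameter rows form a complete regular grid."""
    axes = [np.unique(col) for col in params.T]    # sorted unique values per parameter
    shape = tuple(len(a) for a in axes)
    if np.prod(shape) != params.shape[0]:
        raise ValueError("parameter rows do not form a complete grid")
    idx = np.lexsort(params[:, ::-1].T)           # first column = primary key
    grid = outputs[idx].T.reshape((outputs.shape[1],) + shape)
    return axes, grid

def lookup(axes, grid, *values):
    """Return all outputs at the grid point whose parameters equal `values`."""
    pos = []
    for a, v in zip(axes, values):
        i = int(np.searchsorted(a, v))            # position of v in the sorted axis
        if i == len(a) or a[i] != v:              # exact match required
            raise KeyError(f"{v!r} is not a grid value")
        pos.append(i)
    return grid[(slice(None),) + tuple(pos)]

# toy data: two parameters (think T, mu1) and two outputs (think a, b)
T, mu1 = np.meshgrid([0.0, 0.5], [0.0, 0.1, 0.2], indexing="ij")
params = np.column_stack([T.ravel(), mu1.ravel()])
outputs = np.column_stack([params[:, 0] + params[:, 1], params[:, 0] * 10])

rng = np.random.default_rng(0)
perm = rng.permutation(len(params))               # shuffle rows, as in the answer
axes, grid = rows_to_grid(params[perm], outputs[perm])

print(grid.shape)
# (2, 2, 3)
print(lookup(axes, grid, 0.5, 0.1).tolist())
# [0.6, 5.0]
```

Keeping the unique values alongside the grid means you never have to remember which integer index corresponds to which temperature or chemical potential.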
    