Python numpy: create 2d array of values based on coordinates

前端 未结 5 799
太阳男子
太阳男子 2020-12-24 15:34

I have a file containing 3 columns, where the first two are coordinates (x,y) and the third is a value (z) corresponding to that position. Here\'s a short example:



        
相关标签:
5条回答
  • 2020-12-24 15:57

    If you have scipy installed, you could take advantage of its sparse matrix module. Get the values from the text file with genfromtxt, and plug those 'columns' directly into a sparse matrix creator.

    In [545]: txt=b"""x y z
    0 1 14
    0 2 17
    1 0 15
    1 1 16
    2 1 18
    2 2 13
    """
    
    In [546]: xyz=np.genfromtxt(txt.splitlines(),names=True,dtype=int)
    
    In [547]: sparse.coo_matrix((xyz['z'],(xyz['y'],xyz['x']))).A     
    Out[547]: 
    array([[ 0, 15,  0],
           [14, 16, 18],
           [17,  0, 13]])
    

    But Joe's z_array=np.zeros((3,3),int); z_array[xyz['y'],xyz['x']]=xyz['z'] is considerably faster.

    0 讨论(0)
  • 2020-12-24 15:57

    Nice answers by others. Thought this might be a useful snippet for someone else who might need this.

    def make_grid(x, y, z):
        '''
        Takes x, y, z values as lists and returns a 2D numpy array
        '''
        dx = abs(np.sort(list(set(x)))[1] - np.sort(list(set(x)))[0])
        dy = abs(np.sort(list(set(y)))[1] - np.sort(list(set(y)))[0])
        i = ((x - min(x)) / dx).astype(int) # Longitudes
        j = ((y - max(y)) / dy).astype(int) # Latitudes
        grid = np.nan * np.empty((len(set(j)),len(set(i))))
        grid[-j, i] = z # if using latitude and longitude (for WGS/West)
        return grid
    
    0 讨论(0)
  • 2020-12-24 15:59

    Assuming the x and y values in your file directly correspond to indices (as they do in your example), you can do something similar to this:

    import numpy as np
    
    x = [0, 0, 1, 1, 2, 2]
    y = [1, 2, 0, 1, 1, 2]
    z = [14, 17, 15, 16, 18, 13]
    
    z_array = np.nan * np.empty((3,3))
    z_array[y, x] = z
    
    print z_array
    

    Which yields:

    [[ nan  15.  nan]
     [ 14.  16.  18.]
     [ 17.  nan  13.]]
    

    For large arrays, this will be much faster than the explicit loop over the coordinates.


    Dealing with non-uniform x & y input

    If you have regularly sampled x & y points, then you can convert them to grid indices by subtracting the "corner" of your grid (i.e. x0 and y0), dividing by the cell spacing, and casting as ints. You can then use the method above or in any of the other answers.

    As a general example:

    i = ((y - y0) / dy).astype(int)
    j = ((x - x0) / dx).astype(int)
    
    grid[i,j] = z
    

    However, there are a couple of tricks you can use if your data is not regularly spaced.

    Let's say that we have the following data:

    import numpy as np
    import matplotlib.pyplot as plt
    
    np.random.seed(1977)
    x, y, z = np.random.random((3, 10))
    
    fig, ax = plt.subplots()
    scat = ax.scatter(x, y, c=z, s=200)
    fig.colorbar(scat)
    ax.margins(0.05)
    

    enter image description here

    That we want to put into a regular 10x10 grid:

    enter image description here

    We can actually use/abuse np.histogram2d for this. Instead of counts, we'll have it add the value of each point that falls into a cell. It's easiest to do this through specifying weights=z, normed=False.

    import numpy as np
    import matplotlib.pyplot as plt
    
    np.random.seed(1977)
    x, y, z = np.random.random((3, 10))
    
    # Bin the data onto a 10x10 grid
    # Have to reverse x & y due to row-first indexing
    zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, normed=False)
    zi = np.ma.masked_equal(zi, 0)
    
    fig, ax = plt.subplots()
    ax.pcolormesh(xi, yi, zi, edgecolors='black')
    scat = ax.scatter(x, y, c=z, s=200)
    fig.colorbar(scat)
    ax.margins(0.05)
    
    plt.show()
    

    enter image description here

    However, if we have a large number of points, some bins will have more than one point. The weights argument to np.histogram simply adds the values. That's probably not what you want in this case. Nonetheless, we can get the mean of the points that fall in each cell by dividing by the counts.

    So, for example, let's say we have 50 points:

    import numpy as np
    import matplotlib.pyplot as plt
    
    np.random.seed(1977)
    x, y, z = np.random.random((3, 50))
    
    # Bin the data onto a 10x10 grid
    # Have to reverse x & y due to row-first indexing
    zi, yi, xi = np.histogram2d(y, x, bins=(10,10), weights=z, normed=False)
    counts, _, _ = np.histogram2d(y, x, bins=(10,10))
    
    zi = zi / counts
    zi = np.ma.masked_invalid(zi)
    
    fig, ax = plt.subplots()
    ax.pcolormesh(xi, yi, zi, edgecolors='black')
    scat = ax.scatter(x, y, c=z, s=200)
    fig.colorbar(scat)
    ax.margins(0.05)
    
    plt.show()
    

    enter image description here

    With very large numbers of points, this exact method will become slow (and can be sped up easily), but it's sufficient for anything less than ~1e6 points.

    0 讨论(0)
  • 2020-12-24 16:07

    You could try something like:

    import numpy as np
    
    x = [0, 0, 1, 1, 2, 2]
    y = [1, 2, 0, 1, 1, 2]
    z = [14, 17, 15, 16, 18, 13]
    
    arr = np.zeros((3,3))
    yx = zip(y,x)
    
    for i, coord in enumerate(yx):
        arr[coord] = z[i]
    
    print arr
    >>> [[  0.  15.   0.]
         [ 14.  16.  18.]
         [ 17.   0.  13.]]
    
    0 讨论(0)
  • 2020-12-24 16:15

    Kezzos beat me to it but I had a similar approach,

    x = np.array([0,0,1,1,2,2])
    y = np.array([1,2,0,1,1,2])
    z = np.array([14,17,15,16,18,13])
    Z = np.zeros((3,3))
    for i,j in enumerate(zip(x,y)): 
        Z[j] = z[i]
    
    Z[np.where(Z==0)] = np.nan
    
    0 讨论(0)
提交回复
热议问题