Calculating a 3D gradient with unevenly spaced points

问题

I currently have a volume spanned by a few million every unevenly spaced particles and each particle has an attribute (potential, for those who are curious) that I want to calculate the local force (acceleration) for.

np.gradient only works with evenly spaced data and I looked here: Second order gradient in numpy where interpolation is necessary but I could not find a 3D spline implementation in Numpy.

Some code that will produce representative data:

import numpy as np    
from scipy.spatial import cKDTree

x = np.random.uniform(-10, 10, 10000)
y = np.random.uniform(-10, 10, 10000)
z = np.random.uniform(-10, 10, 10000)
phi = np.random.uniform(-10**9, 0, 10000)

kdtree = cKDTree(np.c_[x,y,z])
_, index = kdtree.query([0,0,0], 32) #find 32 nearest particles to the origin
#find the gradient at (0,0,0) by considering the 32 nearest particles?

(My problem is very similar to Function to compute 3D gradient with unevenly spaced sample locations but there seemed to have been no solution there, so I thought I'd ask again.)

Any help would be appreciated.

回答1:

Here is a Julia implementation that does what you ask

using NearestNeighbors

n = 3;
k = 32; # for stability use  k > n*(n+3)/2

# Take a point near the center of cube
point = 0.5 + rand(n)*1e-3;
data = rand(n, 10^4);
kdtree = KDTree(data);
idxs, dists = knn(kdtree, point, k, true);

# Coords of the k-Nearest Neighbors
X = data[:,idxs];

# Least-squares recipe for coefficients
 C = point * ones(1,k); # central node
dX = X - C;  # diffs from central node
 G = dX' * dX;
 F =  G .* G;
 v = diag(G);
 N = pinv(G) * G;
 N = eye(N) - N;
 a =  N * pinv(F*N) * v;  # ...these are the coeffs

# Use a temperature distribution of  T = 25.4 * r^2
# whose analytical gradient is   gradT = 25.4 * 2*x
X2 = X .* X;
C2 = C .* C;
T  = 25.4 * n * mean(X2, 1)';
Tc = 25.4 * n * mean(C2, 1)'; # central node
dT = T - Tc;       # diffs from central node

y = dX * (a .* dT);   # Estimated gradient
g = 2 * 25.4 * point; # Analytical

# print results
@printf "Estimated  Grad  = %s\n" string(y')
@printf "Analytical Grad  = %s\n" string(g')
@printf "Relative Error   = %.8f\n" vecnorm(g-y)/vecnorm(g)

This method has about a 1% relative error. Here are the results from a few runs...

Estimated  Grad  = [25.51670916224472 25.421038632006926 25.6711949674633]
Analytical Grad  = [25.41499027802736 25.44913042322385  25.448202594123806]
Relative Error   = 0.00559934

Estimated  Grad  = [25.310574056859014 25.549736360607493 25.368056350800604]
Analytical Grad  = [25.43200914200516  25.43243178887198  25.45061497749628]
Relative Error   = 0.00426558

Update
I don't know Python very well, but here is a translation that seems to be working

import numpy as np
from scipy.spatial import KDTree

n = 3;
k = 32;

# fill the cube with random points
data = np.random.rand(10000,n)
kdtree = KDTree(data)

# pick a point (at the center of the cube)
point = 0.5 * np.ones((1,n))

# Coords of k-Nearest Neighbors
dists, idxs = kdtree.query(point, k)
idxs = idxs[0]
X = data[idxs,:]

# Calculate coefficients
C = (np.dot(point.T, np.ones((1,k)))).T # central node
dX= X - C                    # diffs from central node
G = np.dot(dX, dX.T)
F = np.multiply(G, G)
v = np.diag(G);
N = np.dot(np.linalg.pinv(G), G)
N = np.eye(k) - N;
a = np.dot(np.dot(N, np.linalg.pinv(np.dot(F,N))), v)  # these are the coeffs

#  Temperature distribution is  T = 25.4 * r^2
X2 = np.multiply(X, X)
C2 = np.multiply(C, C)
T  = 25.4 * n * np.mean(X2, 1).T
Tc = 25.4 * n * np.mean(C2, 1).T # central node
dT = T - Tc;       # diffs from central node

# Analytical gradient ==>  gradT = 2*25.4* x
g = 2 * 25.4 * point;
print( "g[]: %s" % (g) )

# Estimated gradient
y = np.dot(dX.T, np.multiply(a, dT))
print( "y[]: %s,   Relative Error = %.8f" % (y, np.linalg.norm(g-y)/np.linalg.norm(g)) )

Update #2
I think I can write something comprehensible using formatted ASCII instead of LaTeX...

`Given a set of M vectors in n-dimensions (call them b_k), find a set of
`coeffs (call them a_k) which yields the best estimate of the identity
`matrix and the zero vector
`
`                                 M
` (1) min ||E - I||,  where  E = sum  a_k b_k b_k
`     a_k                        k=1
`
`                                 M
` (2) min ||z - 0||,  where  z = sum  a_k b_k
`     a_k                        k=1
`
`
`Note that the basis vectors {b_k} are not required
`to be normalized, orthogonal, or even linearly independent.
`
`First, define the following quantities:
`
`  B             ==> matrix whose columns are the b_k
`  G = B'.B      ==> transpose of B times B
`  F = G @ G     ==> @ represents the hadamard product
`  v = diag(G)   ==> vector composed of diag elements of G
`
`The above minimizations are equivalent to this linearly constrained problem
`
`  Solve  F.a = v
`  s.t.   G.a = 0
`
`Let {X} denote the Moore-Penrose inverse of X.
`Then the solution of the linear problem can be written:
`
`  N = I - {G}.G       ==> projector into nullspace of G
`  a = N . {F.N} . v
`
`The utility of these coeffs is that they allow you to write
`very simple expressions for the derivatives of a tensor field.
`
`
`Let D be the del (or nabla) operator
`and d be the difference operator wrt the central (aka 0th) node,
`so that, for any scalar/vector/tensor quantity Y, we have:
`  dY = Y - Y_0
`
`Let x_k be the position vector of the kth node.
`And for our basis vectors, take
`  b_k = dx_k  =  x_k - x_0.
`
`Assume that each node has a field value associated with it
` (e.g. temperature), and assume a quadratic model [about x = x_0]
` for the field [g=gradient, H=hessian, ":" is the double-dot product]
`
`     Y = Y_0 + (x-x_0).g + (x-x_0)(x-x_0):H/2
`    dY = dx.g + dxdx:H/2
`   D2Y = I:H            ==> Laplacian of Y
`
`
`Evaluate the model at the kth node 
`
`    dY_k = dx_k.g  +  dx_k dx_k:H/2
`
`Multiply by a_k and sum
`
`     M               M                  M
`    sum a_k dY_k =  sum a_k dx_k.g  +  sum a_k dx_k dx_k:H/2
`    k=1             k=1                k=1
`
`                 =  0.g   +  I:H/2
`                 =  D2Y / 2
`
`Thus, we have a second order estimate of the Laplacian
`
`                M
`   Lap(Y_0) =  sum  2 a_k dY_k
`               k=1
`
`
`Now play the same game with a linear model
`    dY_k = dx_k.g
`
`But this time multiply by (a_k dx_k) and sum
`
`     M                    M
`    sum a_k dx_k dY_k =  sum a_k dx_k dx_k.g
`    k=1                  k=1
`
`                      =  I.g
`                      =  g
`
`
`In general, the derivatives at the central node can be estimated as
`
`           M
`    D#Y = sum  a_k dx_k#dY_k
`          k=1
`
`           M
`    D2Y = sum  2 a_k dY_k
`          k=1
`
` where
`   # stands for the {dot, cross, or tensor} product
`       yielding the {div, curl,  or grad} of Y
` and
`   D2Y stands for the Laplacian of Y
`   D2Y = D.DY = Lap(Y)

回答2:

Intuitively, for the derivate wrt one datapoint, I would do the following

Take a slice of the surrounding data: data=phi[x_id-1:x_id+1, y_id-1:y_id+1, z_id-1:z_id+1]. The approach with the kdTre looks very nice, of course you can use that for a subset of the data, too.
Fit a 3D polynomial, you might want to look at polyvander3D. Define the point in the middle of the slice as the center. Calculate the offsets to the other points. Pass them as coordinates to the polyfit.
Derive the polynomial at your position.

This would be a simple solution to your problem. However it would probably be very slow.

EDIT:

In fact this seems to be the usual method: https://scicomp.stackexchange.com/questions/480/how-can-i-numerically-differentiate-an-unevenly-sampled-function

The accepted answer talks about deriving an interpolating polynomial. Although apparently that polynomial should cover all the data (Vandermonde matrix). For you that is impossible, too much data. Taking a local subset seems very reasonable.

回答3:

A lot depends on the signal/noise ratio of your potential data. Your example is all noise, so "fitting" anything to it will always be "over-fitting." The degree of noise will determine the degree to which you want to be poly-fitting (as with lhk's answer) and how much you want to be Kriging (using pyKriging or otherwise)

I'd suggest using query(x,distance_upper_bound) instead of query(x,k), as this will probably prevent some instabilities due to clustering
I'm not a mathematician, but I'd expect that fitting polynomials to a distance-dependent subset of data would be spatially unstable, especially as the polynomial order increases. This would make your resulting gradient field discontinuous.

回答4:

I will give my two cents late. In the case where space is evenly spanned and large, it is common to extract only local information for each particle.

As you may notice, there are different ways to extract local information:

N nearest neighbor, using KD tree for example. This defines locality dynamically, which may or may not be a good idea.
Randomly partition the space with planes to group particles. Basically testing for N inequality to cut the space N times.

Once locality is defined, you can interpolate a polynomial, which is differentiate analytically. I encourage more thinking in different locality definitions. (may give interesting difference)

来源：https://stackoverflow.com/questions/40338386/calculating-a-3d-gradient-with-unevenly-spaced-points

标签

python

numpy

scipy

numerical-methods

derivative