I have a catalogue of data that I want to use in my MCMC code. Execution speed is crucial, because I want to avoid slowing down my Markov chain Monte Carlo sampling.
The problem:
In the catalogue, the first and second columns hold two parameters called ra and dec, which are sky coordinates:
import numpy as np

data = np.loadtxt('Final.Cluster.Shear.NegligibleShotNoise.Redshift.cat')
ra = data[:, 0]   # first column: right ascension
dec = data[:, 1]  # second column: declination
The seventh and eighth columns hold the X and Y positions, i.e. the grid coordinates (points in a grid space):
Xpos=data[:,6]
Ypos=data[:,7]
The function I have written needs to be called about a million times. I pass one pair of Xcenter and Ycenter positions (for example Xcenter=200.6, Ycenter=310.9) as inputs, and I want to find the corresponding point in the ra and dec columns. However, the inputs may not match any catalogue entry exactly. When there is no matching X and Y entry, I want to interpolate and obtain coordinates based on the real ra and dec entries in the catalogue.
This is a perfect case for the scipy.spatial.cKDTree() class, which can find the nearest catalogue point for all the query points at once:
from scipy.spatial import cKDTree
k = cKDTree(data[:, 6:8])  # build the KD-tree from the Xpos and Ypos columns
xyCenters = np.array([[200.6, 310.9],
                      [300, 300],
                      [400, 400]])
print(k.query(xyCenters))
# (array([ 1.59740195, 1.56033234, 0.56352196]),
# array([ 2662, 22789, 5932]))
where the first array holds the distances to the nearest catalogue points and [2662, 22789, 5932] are the indices of the catalogue points closest to each of the three points given in xyCenters. You can use these indices to get your ra and dec values very efficiently using np.take():
dists, indices = k.query(xyCenters)
myra = np.take(ra, indices)
mydec = np.take(dec, indices)
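Note that the query above returns the nearest catalogue point rather than an interpolated value. If you do need interpolation, one option (not the only one) is to ask the tree for the k nearest neighbours and take an inverse-distance-weighted average of their ra and dec. The following is a minimal sketch under some assumptions: the helper name make_radec_lookup, the choice k=4, and the synthetic grid standing in for the catalogue are all made up for illustration. The tree is built once, so only the query runs inside the MCMC loop.

```python
import numpy as np
from scipy.spatial import cKDTree


def make_radec_lookup(xpos, ypos, ra, dec, k=4, eps=1e-12):
    """Build the KD-tree once and return a function (x, y) -> (ra, dec).

    Hypothetical helper: uses inverse-distance weighting over the k nearest
    catalogue points, which is one possible interpolation scheme.
    """
    tree = cKDTree(np.column_stack([xpos, ypos]))
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)

    def lookup(centers):
        centers = np.atleast_2d(np.asarray(centers, dtype=float))
        dists, idx = tree.query(centers, k=k)
        # query() drops the last axis when k == 1, so force shape (n, k)
        dists = dists.reshape(len(centers), -1)
        idx = idx.reshape(len(centers), -1)
        # Inverse-distance weights; eps avoids division by zero on an exact
        # match, which then dominates the average
        weights = 1.0 / np.maximum(dists, eps)
        weights /= weights.sum(axis=1, keepdims=True)
        return (weights * ra[idx]).sum(axis=1), (weights * dec[idx]).sum(axis=1)

    return lookup


# Synthetic stand-in for the catalogue: a 10 x 10 grid with ra and dec
# varying linearly across it (assumed values, not real data)
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
xpos, ypos = gx.ravel(), gy.ravel()
ra = 10.0 + 0.01 * xpos
dec = -5.0 + 0.02 * ypos

lookup = make_radec_lookup(xpos, ypos, ra, dec)       # done once, before sampling
myra, mydec = lookup([[3.0, 4.0], [3.5, 4.5]])        # done at every MCMC step
```

The first query point coincides with a grid point, so it gets that point's ra and dec back; the second sits in the middle of a grid cell and gets the average of the four corners. One caveat: a plain weighted average of ra is not valid for neighbours that straddle the 0/360 degree wrap, so a catalogue crossing that boundary would need extra handling.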
Source: https://stackoverflow.com/questions/25550813/finding-the-correspondence-of-data-from-one-data-set-in-the-other