astropy.io.fits: efficient element access of a large table

Question

I am trying to extract data from a binary table in a FITS file using Python and astropy.io. The table contains an events array with over 2 million events. I want to store the TIME values of selected events in an array so that I can analyse that array afterwards. The problem is that in Fortran (using FITSIO) the same operation takes a couple of seconds on a much slower processor, while the equivalent operation in Python with astropy.io takes several minutes. I would like to know where exactly the bottleneck is, and whether there is a more efficient way to access the individual elements when deciding whether to store each time value in the new array. Here is the code I have so far:

from astropy.io import fits

minenergy=0.3
maxenergy=0.4
xcen=20000
ycen=20000
radius=50

datafile=fits.open('datafile.fits')
events=datafile['EVENTS'].data


datafile.close()

times=[]

for i in range(len(events)):
    energy=events['PI'][i]
    if energy<maxenergy*1000:
        if energy>minenergy*1000:
            x=events['X'][i]
            y=events['Y'][i]
            radius2=(x-xcen)*(x-xcen)+(y-ycen)*(y-ycen)
            if radius2<=radius*radius:
                times.append(events['TIME'][i])

print(times)

Any help would be appreciated. I am an OK programmer in other languages, but I have not had to worry about efficiency in Python before. I chose Python for this because I had been using Fortran with FITSIO and PGPLOT, plus some routines from Numerical Recipes, but the newer Fortran compiler on this machine cannot be persuaded to produce a properly working program (there are 32- vs. 64-bit issues, among others). Python seems to have all the functionality I need (FITS I/O, plotting, etc.), but if it takes forever to access individual elements in a list, I will have to find another solution.

Thanks very much.

Answer 1:

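You need to do this with NumPy vector operations. Without special tools like Numba, large element-by-element loops like yours will always be slow in Python, because it is an interpreted language: every iteration goes through the interpreter, and every events['PI'][i] inside the loop looks the column up again and wraps a single value in a Python object. Your program should look more like: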

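# Energy cut on whole columns at once (PI / 1000 matches the question's energy*1000 comparison)
energy = events['PI'] / 1000.
e_ok = (energy > minenergy) & (energy < maxenergy)
# Squared distance from (xcen, ycen), only for events that passed the energy cut
rad2 = (events['X'][e_ok] - xcen)**2 + (events['Y'][e_ok] - ycen)**2
r_ok = rad2 <= radius**2
# r_ok refers to the energy-filtered rows, so apply e_ok first, then r_ok
times = events['TIME'][e_ok][r_ok]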
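A self-contained sketch for checking that the vectorized selection picks exactly the same events as the original loop, and for timing the two on your own machine. It assumes NumPy and uses a synthetic NumPy structured array as a stand-in for events (astropy's FITS_rec is a NumPy record-array subclass, so field access by name behaves the same way); the column values and the helper names make_fake_events, loop_select and vectorized_select are hypothetical, chosen only for illustration. With a real file you would pass datafile['EVENTS'].data instead of the fake table.

```python
import time

import numpy as np


def make_fake_events(n, seed=0):
    """Hypothetical stand-in for datafile['EVENTS'].data: a structured array
    with the PI, X, Y and TIME columns used in the question (synthetic values)."""
    rng = np.random.default_rng(seed)
    events = np.zeros(n, dtype=[('PI', 'i4'), ('X', 'i8'), ('Y', 'i8'), ('TIME', 'f8')])
    events['PI'] = rng.integers(0, 1000, n)
    events['X'] = np.rint(rng.normal(20000, 100, n)).astype(np.int64)
    events['Y'] = np.rint(rng.normal(20000, 100, n)).astype(np.int64)
    events['TIME'] = np.sort(rng.uniform(0.0, 1e4, n))
    return events


def loop_select(events, minenergy, maxenergy, xcen, ycen, radius):
    """The question's element-by-element loop (one interpreter pass per row)."""
    times = []
    for i in range(len(events)):
        energy = events['PI'][i]
        if minenergy * 1000 < energy < maxenergy * 1000:
            x = events['X'][i]
            y = events['Y'][i]
            if (x - xcen) * (x - xcen) + (y - ycen) * (y - ycen) <= radius * radius:
                times.append(events['TIME'][i])
    return np.array(times, dtype=float)


def vectorized_select(events, minenergy, maxenergy, xcen, ycen, radius):
    """The answer's whole-column version of the same selection."""
    energy = events['PI'] / 1000.
    e_ok = (energy > minenergy) & (energy < maxenergy)
    rad2 = (events['X'][e_ok] - xcen)**2 + (events['Y'][e_ok] - ycen)**2
    r_ok = rad2 <= radius**2
    return events['TIME'][e_ok][r_ok]


if __name__ == "__main__":
    events = make_fake_events(200_000)
    params = (0.3, 0.4, 20000, 20000, 50)
    for select in (loop_select, vectorized_select):
        t0 = time.perf_counter()
        times = select(events, *params)
        elapsed = time.perf_counter() - t0
        print(f"{select.__name__}: {len(times)} events selected in {elapsed:.3f} s")
```

The fake X and Y columns are integers here so that both versions do exact arithmetic and their outputs can be compared element for element; the timings printed depend entirely on your hardware.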

This vectorized version should have performance comparable to Fortran. You can also filter the entire event table, for instance:

events_filt = events[e_ok][r_ok]
times = events_filt['TIME']
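The chained indexing events[e_ok][r_ok] works because r_ok is computed on the rows that already passed the energy cut. An alternative, sketched below under the same assumptions (NumPy, a small structured array standing in for FITS_rec, and a hypothetical helper name select_rows), evaluates the radius test on every row and combines both cuts into one boolean mask over the full table. The same mask can then be reused for any column, and np.flatnonzero gives the original row numbers if you need them.

```python
import numpy as np


def select_rows(events, minenergy, maxenergy, xcen, ycen, radius):
    """Hypothetical helper: one boolean mask covering every row of the table."""
    energy = events['PI'] / 1000.
    e_ok = (energy > minenergy) & (energy < maxenergy)
    # Radius test on all rows, so both masks have the same length and can be combined
    r_ok = (events['X'] - xcen)**2 + (events['Y'] - ycen)**2 <= radius**2
    return e_ok & r_ok


if __name__ == "__main__":
    # Tiny hand-made table standing in for datafile['EVENTS'].data
    events = np.array(
        [(350, 20000, 20000, 1.0),   # in band, at the centre    -> kept
         (350, 20100, 20000, 2.0),   # in band, 100 px away      -> rejected
         (500, 20000, 20000, 3.0),   # out of the energy band    -> rejected
         (399, 20030, 20040, 4.0)],  # in band, exactly 50 px    -> kept
        dtype=[('PI', 'i4'), ('X', 'i8'), ('Y', 'i8'), ('TIME', 'f8')])
    mask = select_rows(events, 0.3, 0.4, 20000, 20000, 50)
    events_filt = events[mask]
    print(events_filt['TIME'])    # [1. 4.]
    print(np.flatnonzero(mask))   # [0 3]
```

The trade-off is a little extra arithmetic on rows outside the energy band in exchange for a single mask that lines up with the original table.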

Source: https://stackoverflow.com/questions/31315325/astropy-io-fits-efficient-element-access-of-a-large-table

Tags: python, arrays, fits, astropy