问题
Let say we have TRMM precipitation data, each file represents data for each month. For example, the files in the folder are:
3B42.1998.01.01.7A.nc,
3B42.1998.02.01.7A.nc,
3B42.1998.03.01.7A.nc,
3B42.1998.04.01.7A.nc,
3B42.1998.05.01.7A.nc,
......
......
3B42.2010.11.01.7A.nc,
3B42.2010.12.01.7A.nc.
These files having a dimension as follows : Xsize=1440, Ysize=400, Zsize=1,Tsize=1. Longitude set to 0 to 360 and Latitude set to -50 to 50.
I want to calculate the amount of precipitation over a certain region, let say in between lon=98.5, lon=100 and lat=4, lat=6.5
. This means, to read the variables only in this region -:
--------------------
|lon:98.5 lat:6.5|
| |
|lat:4 lon:100 |
---------------------
I used to do this in GrADS (Grid Analysis and Display System). In GrADS, this can be done like: (simplified version)
yy=1998
while yr < 2011
'sdfopen f:\data\trmm\3B42.'yy'.12.01.7A.nc'
'd aave(pcp,lon=98.5,lon=100.0,lat=4.0,lat=6.5)'
res=subwrd(result,4)
rec=write('d:\precip.sp.TRMM3B42.1.'yy'.csv',res,append)
yy = yy+1
endwhile
I tried to do the same thing in Python,but something went wrong. After a few suggestions,here I am now:
import csv
import netCDF4 as nc
import numpy as np
#calculating december only
f = nc.MFDataset('d:/data/trmm/3B43.????.12.01.7A.nc')#maybe I shouldn't do MFDataset?
pcpt = f.variables['pcp']
lon = f.variables['longitude']
lat = f.variables['latitude']
# Determine which longitudes
latidx1 = (lat >=4.0 ) & (lat <=6.5 )
lonidx1 = (lon >=98.5 ) & (lon <=100.0 )
rainf1 = pcpt[:]
rainf1 = rainf1[:, latidx1][..., lonidx1]
rainf_1 = rainf1
with open('d:/trmmtest.csv', 'wb') as fp:
a = csv.writer(fp)
for i in rainf_1:
a.writerow([i])
This script produces a list for (in my case) 15 values in the CSV file. But when I try to get the values for another region, and adjust which I think necessary,let say:
latidx2 = (lat >=1.0 ) & (lat <=1.5 )
lonidx2 = (lon >=102.75 ) & (lon <=103.25 )
rainf2 = pcpt[:]
rainf2 = rainf2[:, latidx2][..., lonidx2]
rainf_2 = rainf2
I get the same values as the first one.
firstarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613 0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]
secondarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613,0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]
I did test on separate scripts, it still give me the same values. I did check in the map (constructed earlier), the values are different on those 2 regions (for December average).
Any idea why? Is there any other elegant way writing this? Thx.
回答1:
After awhile, I managed to look at this problem again, and apparently the method above is almost correct. After a few adjustment, tested on a single data file, and cross-checked with GrADS solution, I got something like this:
f = nc.Dataset('~/data/TRMM3H/3B42.19980101.12.7A.nc')
pcpt = f.variables['pcp'][:]
lon = f.variables['longitude'][:]
lat = f.variables['latitude'][:]
#select two regions
latidx1 = (lat >=4. ) & (lat <=6.5 )
lonidx1 = (lon >=100.5 ) & (lon <=101.5 )
latidx2 = (lat >=2.5 ) & (lat <=5.0 )
lonidx2 = (lon >=101. ) & (lon <=102. )
rainf = pcpt[:]
#these basically listing the values in an array (2 in this case)
rainf1 = rainf[:, latidx1][..., lonidx1]
rainf2 = rainf[:, latidx2][..., lonidx2]
rainf_1 = rainf1
rainf_2 = rainf2
#time to get the mean values
print np.mean(rainf_1)
print "............."
print np.mean(rainf_2)
print "............."
this gave me these results:
>>> execfile('find_percentile.py')
0.7830327034
.............
1.56235361099
.............
The results are the same when calculated with GrADS.
Edited after suggestion:
f = nc.Dataset('~/data/TRMM3H/3B42.19980101.12.7A.nc')
pcpt = f.variables['pcp'][:]
lon = f.variables['longitude'][:]
lat = f.variables['latitude'][:]
#select two regions
latidx1 = (lat >=4. ) & (lat <=6.5 )
lonidx1 = (lon >=100.5 ) & (lon <=101.5 )
latidx2 = (lat >=2.5 ) & (lat <=5.0 )
lonidx2 = (lon >=101. ) & (lon <=102. )
#these basically listing the values in an array (2 in this case)
rainf1 = pcpt[:, latidx1][..., lonidx1]
rainf2 = pcpt[:, latidx2][..., lonidx2]
rainf_1 = rainf1
rainf_2 = rainf2
#time to get the mean values
print np.mean(rainf_1)
print "............."
print np.mean(rainf_2)
print "............."
Back to the original question, doing this in multiple files and printing it in txt/csv file is still under construction (and test).
回答2:
I just want to point out that the solution of Fir Nor is incorrect, as you can't simply use the arithmetic mean (np.mean) when working on spatial data on a regular lat/lon grid as is the case here since the grid cell size changes as you move towards the poles!
Far better not to worry about this and do the operation with CDO:
cdo fldmean -sellonlatbox,98.5,100,4.5,6 3B42.1998.05.01.7A.nc boxav.nc
来源:https://stackoverflow.com/questions/22427954/calculate-variables-mean-in-a-selective-area-in-gridded-netcdf-file