Calculate a variable's mean over a selected area in a gridded netCDF file

Posted by 核能气质少年 on 2019-12-23 04:04:13

Question


Let's say we have TRMM precipitation data, with each file holding the data for one month. For example, the files in the folder are:

     3B42.1998.01.01.7A.nc,
     3B42.1998.02.01.7A.nc, 
     3B42.1998.03.01.7A.nc, 
     3B42.1998.04.01.7A.nc, 
     3B42.1998.05.01.7A.nc, 
     ......
     ......
     3B42.2010.11.01.7A.nc,         
     3B42.2010.12.01.7A.nc.

These files have the following dimensions: Xsize=1440, Ysize=400, Zsize=1, Tsize=1. Longitude runs from 0 to 360 and latitude from -50 to 50. I want to calculate the amount of precipitation over a certain region, say between lon=98.5 and lon=100 and between lat=4 and lat=6.5. This means reading the variable only in this region:

      --------------------
     |lon:98.5    lat:6.5 |
     |                    |
     |lat:4       lon:100 |
      --------------------
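As an illustration of what "reading only this region" means on this grid, here is a minimal Python 3 sketch using boolean masks on synthetic data. The 0.25-degree cell centers (0.125, 0.375, ...) and the (time, lat, lon) dimension order are assumptions, not something stated in the files; the real coordinates should be read from the file's own coordinate variables.

```python
import numpy as np

# Assumed grid (hypothetical): 0.25-degree cells with centers at 0.125, 0.375, ...
lon = np.arange(1440) * 0.25 + 0.125   # 0.125 .. 359.875
lat = np.arange(400) * 0.25 - 49.875   # -49.875 .. 49.875

# Boolean masks marking the grid columns/rows that fall inside the box
lon_mask = (lon >= 98.5) & (lon <= 100.0)
lat_mask = (lat >= 4.0) & (lat <= 6.5)

# Synthetic field standing in for the data; assumed order is (time, lat, lon)
field = np.arange(400 * 1440, dtype=float).reshape(1, 400, 1440)

# np.ix_ turns the two masks into an open mesh, so both axes are
# subset in a single indexing step
box = field[0][np.ix_(lat_mask, lon_mask)]
print(box.shape)  # (10, 6) under the grid assumed above
```

Printing `lat[lat_mask]` and `lon[lon_mask]` is a quick way to confirm that the selected cells really lie inside the intended box.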

I used to do this in GrADS (Grid Analysis and Display System), where it can be done like this (simplified version):

      yy=1998
      while (yy < 2011)
        'sdfopen f:\data\trmm\3B42.'yy'.12.01.7A.nc'
        'd aave(pcp,lon=98.5,lon=100.0,lat=4.0,lat=6.5)'
         res=subwrd(result,4)
         rec=write('d:\precip.sp.TRMM3B42.1.'yy'.csv',res,append)   
         yy = yy+1
      endwhile

I tried to do the same thing in Python, but something went wrong. After a few suggestions, here is where I am now:

     import csv
     import netCDF4 as nc 
     import numpy as np

     #calculating december only
     f = nc.MFDataset('d:/data/trmm/3B43.????.12.01.7A.nc')#maybe I shouldn't do MFDataset?
     pcpt = f.variables['pcp']
     lon = f.variables['longitude']
     lat = f.variables['latitude']
     # Determine which longitudes
     latidx1 = (lat >=4.0 ) & (lat <=6.5 ) 
     lonidx1 = (lon >=98.5 ) & (lon <=100.0 ) 

     rainf1 = pcpt[:]
     rainf1 = rainf1[:, latidx1][..., lonidx1]
     rainf_1 = rainf1

     with open('d:/trmmtest.csv', 'wb') as fp:
          a = csv.writer(fp)
          for i in rainf_1:
              a.writerow([i])

This script writes a list of (in my case) 15 values to the CSV file. But when I try to get the values for another region and adjust what I think is necessary, say:

     latidx2 = (lat >=1.0 ) & (lat <=1.5 ) 
     lonidx2 = (lon >=102.75 ) & (lon <=103.25 ) 

     rainf2 = pcpt[:]
     rainf2 = rainf2[:, latidx2][..., lonidx2]
     rainf_2 = rainf2

I get the same values as the first one.

firstarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613,0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]

secondarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613,0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]

I tested this in separate scripts, and it still gives me the same values. I checked on the map (constructed earlier), and the values differ between those two regions (for the December average).

Any idea why? Is there a more elegant way of writing this? Thanks.


Answer 1:


After a while, I managed to look at this problem again, and apparently the method above is almost correct. After a few adjustments, testing on a single data file, and cross-checking with the GrADS solution, I got something like this:

    import os
    import netCDF4 as nc
    import numpy as np

    # expand '~' explicitly rather than relying on the library to do it
    f = nc.Dataset(os.path.expanduser('~/data/TRMM3H/3B42.19980101.12.7A.nc'))
    pcpt = f.variables['pcp'][:]
    lon = f.variables['longitude'][:]
    lat = f.variables['latitude'][:]

    # select two regions with boolean masks on the coordinate arrays
    latidx1 = (lat >= 4.) & (lat <= 6.5)
    lonidx1 = (lon >= 100.5) & (lon <= 101.5)
    latidx2 = (lat >= 2.5) & (lat <= 5.0)
    lonidx2 = (lon >= 101.) & (lon <= 102.)

    rainf = pcpt[:]
    # subset each region: first along latitude, then along longitude
    rainf1 = rainf[:, latidx1][..., lonidx1]
    rainf2 = rainf[:, latidx2][..., lonidx2]
    rainf_1 = rainf1
    rainf_2 = rainf2

    # time to get the (unweighted) mean values
    print(np.mean(rainf_1))
    print(".............")
    print(np.mean(rainf_2))
    print(".............")

This gave me these results:

    >>> execfile('find_percentile.py')
    0.7830327034
    .............
    1.56235361099
    .............

The results are the same when calculated with GrADS.

Edited after suggestion:

    import os
    import netCDF4 as nc
    import numpy as np

    # expand '~' explicitly rather than relying on the library to do it
    f = nc.Dataset(os.path.expanduser('~/data/TRMM3H/3B42.19980101.12.7A.nc'))
    pcpt = f.variables['pcp'][:]
    lon = f.variables['longitude'][:]
    lat = f.variables['latitude'][:]

    # select two regions with boolean masks on the coordinate arrays
    latidx1 = (lat >= 4.) & (lat <= 6.5)
    lonidx1 = (lon >= 100.5) & (lon <= 101.5)
    latidx2 = (lat >= 2.5) & (lat <= 5.0)
    lonidx2 = (lon >= 101.) & (lon <= 102.)

    # subset each region: first along latitude, then along longitude
    rainf1 = pcpt[:, latidx1][..., lonidx1]
    rainf2 = pcpt[:, latidx2][..., lonidx2]
    rainf_1 = rainf1
    rainf_2 = rainf2

    # time to get the (unweighted) mean values
    print(np.mean(rainf_1))
    print(".............")
    print(np.mean(rainf_2))
    print(".............")

Back to the original question: doing this for multiple files and writing the results to a txt/csv file is still under construction (and being tested).
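One possible shape for the multi-file step is sketched below in Python 3. It is not the answerer's finished solution. The helper names (`box_mean`, `read_field`, `write_means`) and the CSV column names are hypothetical, the variable names `pcp`, `latitude` and `longitude` are taken from the code above, and the data are assumed to end in (lat, lon) dimensions once singleton axes are dropped.

```python
import csv
import numpy as np

def box_mean(field, lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Plain (unweighted) mean of a 2-D (lat, lon) field over a lat/lon box."""
    lat_mask = (lat >= lat_min) & (lat <= lat_max)
    lon_mask = (lon >= lon_min) & (lon <= lon_max)
    return float(np.mean(field[np.ix_(lat_mask, lon_mask)]))

def read_field(path):
    """Return (field, lat, lon) from one file, using the post's variable names."""
    import netCDF4 as nc  # imported lazily so box_mean works without netCDF4
    with nc.Dataset(path) as ds:
        pcp = ds.variables['pcp'][:]
        lat = ds.variables['latitude'][:]
        lon = ds.variables['longitude'][:]
    # Assumption: dimensions end in (lat, lon); singleton axes are dropped
    return np.squeeze(pcp), lat, lon

def write_means(paths, out_csv, box):
    """Write one 'file,mean' row per input file; box = (lat_min, lat_max, lon_min, lon_max)."""
    with open(out_csv, 'w', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(['file', 'mean_pcp'])
        for path in sorted(paths):
            field, lat, lon = read_field(path)
            writer.writerow([path, box_mean(field, lat, lon, *box)])
```

With the December files from the question, a call might look like `write_means(glob.glob('d:/data/trmm/3B42.????.12.01.7A.nc'), 'd:/trmmtest.csv', (4.0, 6.5, 98.5, 100.0))` after `import glob`. Opening each file separately keeps one row per file, which sidesteps the question of whether `MFDataset` aggregates these files as intended.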




Answer 2:


I just want to point out that Fir Nor's solution is incorrect: you can't simply use the arithmetic mean (np.mean) for spatial data on a regular lat/lon grid, as is the case here, because the grid cell area shrinks as you move towards the poles.

It is far easier not to worry about this and do the operation with CDO, whose fldmean weights by grid-cell area:

    cdo fldmean -sellonlatbox,98.5,100,4.5,6 3B42.1998.05.01.7A.nc boxav.nc
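For readers who prefer to stay in Python, the weighting can be approximated by hand. The sketch below is not what CDO does internally; it assumes a regular lat/lon grid, where cell area is proportional to the cosine of latitude, and a 2-D (lat, lon) field without missing values. The function name `weighted_box_mean` is hypothetical.

```python
import numpy as np

def weighted_box_mean(field, lat, lon, lat_min, lat_max, lon_min, lon_max):
    """Cos(latitude)-weighted mean of a 2-D (lat, lon) field over a box."""
    lat_mask = (lat >= lat_min) & (lat <= lat_max)
    lon_mask = (lon >= lon_min) & (lon <= lon_max)
    box = field[np.ix_(lat_mask, lon_mask)]
    # One weight per latitude row, repeated across the longitudes of the box
    row_weights = np.cos(np.deg2rad(lat[lat_mask]))
    weights = np.broadcast_to(row_weights[:, np.newaxis], box.shape)
    return float(np.average(box, weights=weights))

# Tiny synthetic example: the row at 60N counts half as much as the equator row
lat = np.array([0.0, 60.0])
lon = np.array([0.0, 1.0])
field = np.array([[1.0, 1.0], [3.0, 3.0]])
print(weighted_box_mean(field, lat, lon, 0, 60, 0, 1))  # about 1.667
print(field.mean())                                     # 2.0, the unweighted mean
```

For a small box near the equator the two means differ only slightly, but the gap grows with latitude and with the box's north-south extent. Data read through netCDF4 often arrive as masked arrays; in that case `np.ma.average` is the analogous call, since `np.average` does not honor the mask.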


Source: https://stackoverflow.com/questions/22427954/calculate-variables-mean-in-a-selective-area-in-gridded-netcdf-file
