
Question:

I am trying to convert a .csv file to netCDF4 in Python, but I am having trouble figuring out how to store information from a .csv table in a netCDF file. My main question is how to declare the columns as variables in a workable netCDF4 format. Everything I have found goes the other way, extracting information from netCDF4 to .csv or ASCII. I have provided the sample data, my sample code, and the error I get when declaring the arrays. Any help would be much appreciated.

The sample table is below:

Station Name  Country  Code   Lat    Lon     mn.yr   temp1  temp2  temp3  hpa
Somewhere     US       12340  35.52  23.358  1.19    -8.3   -13.1  -5     69.5
Somewhere     US       12340                 2.1971  -10.7  -13.9  -7.9   27.9
Somewhere     US       12340                 3.1971  -8.4   -13    -4.3   90.8

My sample code is:

#!/usr/bin/env python

import scipy
import numpy
import netCDF4
import csv
from numpy import arange, dtype

#Declare empty arrays

v1 = []
v2 = []
v3 = []
v4 = []

# Open csv file and declare variable for arrays for each heading

f = open('station_data.csv', 'r').readlines()

for line in f[1:]:
    fields = line.split(',')
    v1.append(fields[0])  # station
    v2.append(fields[1])  # country
    v3.append(int(fields[2]))  # code
    v4.append(float(fields[3]))  # lat
    v5.append(float(fields[3]))  # lon
# more variables included but this is just an abridged list
print v1
print v2
print v3
print v4

#convert to netcdf4 framework that works as a netcdf

ncout = netCDF4.Dataset('station_data.nc','w')

# latitudes and longitudes. Include NaN for missing numbers

lats_out = -25.0 + 5.0*arange(v4,dtype='float32')
lons_out = -125.0 + 5.0*arange(v5,dtype='float32')

# output data.

press_out = 900. + arange(v4*v5,dtype='float32')  # 1d array
press_out.shape = (v4,v5)  # reshape to 2d array
temp_out = 9. + 0.25*arange(v4*v5,dtype='float32')  # 1d array
temp_out.shape = (v4,v5)  # reshape to 2d array

# create the lat and lon dimensions.

ncout.createDimension('latitude',v4)
ncout.createDimension('longitude',v5)

# Define the coordinate variables. They will hold the coordinate information

lats = ncout.createVariable('latitude',dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',dtype('float32').char,('longitude',))

# Assign units attributes to coordinate var data. This attaches a text attribute to each of the coordinate variables, containing the units.

lats.units = 'degrees_north'
lons.units = 'degrees_east'

# write data to coordinate vars.

lats[:] = lats_out
lons[:] = lons_out

# create the pressure and temperature variables

press = ncout.createVariable('pressure',dtype('float32').char,('latitude','longitude'))
temp = ncout.createVariable('temperature',dtype('float32').char,('latitude','longitude'))

# set the units attribute.

press.units = 'hPa'
temp.units = 'celsius'

# write data to variables.

press[:] = press_out
temp[:] = temp_out

ncout.close()
f.close()

Error:

Traceback (most recent call last):
  File "station_data.py", line 33, in <module>
    v4.append(float(fields[3]))#lat
ValueError: could not convert string to float:

Answer 1:

If you look at your input file, the second row has no value in the Lat column. When you read the csv file, that value (fields[3]) is stored as an empty string "", and float("") raises the ValueError you are seeing. Instead of calling float directly, you can define a function that handles this case:

def str_to_float(text):
    # 'text' rather than 'str', so the built-in str type is not shadowed
    try:
        number = float(text)
    except ValueError:
        number = 0.0  # assign whatever placeholder value suits your requirement
    return number

You can then use this function in place of the built-in float:

v4.append(str_to_float(fields[3]))
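As a variation on the helper above, here is a minimal sketch that returns NaN instead of 0.0 for blank fields, since 0.0 is itself a valid latitude and the question's own comment asks for NaN for missing numbers. The `missing` parameter and the two-row in-memory sample are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io
import math


def str_to_float(text, missing=float("nan")):
    """Convert text to float; blank or malformed fields become `missing`."""
    try:
        return float(text)
    except ValueError:
        return missing


# Hypothetical two-row sample in the layout of the question's table.
sample = io.StringIO(
    "Station Name,Country,Code,Lat,Lon\n"
    "Somewhere,US,12340,35.52,23.358\n"
    "Somewhere,US,12340,,\n"
)

lats, lons = [], []
# DictReader looks fields up by header name, which avoids index mix-ups.
for row in csv.DictReader(sample):
    lats.append(str_to_float(row["Lat"]))
    lons.append(str_to_float(row["Lon"]))

print(lats)
print(lons)
```

NaN compares unequal to everything, including itself, so test for it with math.isnan rather than ==.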

Answer 2:

This is a perfect job for xarray, a Python package whose Dataset object represents the netCDF common data model. Here's an example you can try:

import pandas as pd
import xarray as xr

url = 'http://www.cpc.ncep.noaa.gov/products/precip/CWlink/'

ao_file = url + 'daily_ao_index/monthly.ao.index.b50.current.ascii'
nao_file = url + 'pna/norm.nao.monthly.b5001.current.ascii'

# columns are separated by runs of whitespace; r'\s+' avoids matching the empty string
# note: the squeeze keyword was removed in pandas 2.0; on newer versions
# drop it and call .squeeze('columns') on the result instead
kw = dict(sep=r'\s+', parse_dates={'dates': [0, 1]},
          header=None, index_col=0, squeeze=True, engine='python')

# read into Pandas Series
s1 = pd.read_csv(ao_file, **kw)
s2 = pd.read_csv(nao_file, **kw)

s1.name = 'AO'
s2.name = 'NAO'

# concatenate two Pandas Series into a Pandas DataFrame
df = pd.concat([s1, s2], axis=1)

# create xarray Dataset from Pandas DataFrame
xds = xr.Dataset.from_dataframe(df)

# add variable attribute metadata
xds['AO'].attrs = {'units': '1', 'long_name': 'Arctic Oscillation'}
xds['NAO'].attrs = {'units': '1', 'long_name': 'North Atlantic Oscillation'}

# add global attribute metadata
xds.attrs = {'Conventions': 'CF-1.0', 'title': 'AO and NAO',
             'summary': 'Arctic and North Atlantic Oscillation Indices'}

# save to netCDF
xds.to_netcdf('/usgs/data2/notebook/data/ao_and_nao.nc')

Then running ncdump -h ao_and_nao.nc produces:

netcdf ao_and_nao {
dimensions:
        dates = 782 ;
variables:
        double dates(dates) ;
                dates:units = "days since 1950-01-06 00:00:00" ;
                dates:calendar = "proleptic_gregorian" ;
        double NAO(dates) ;
                NAO:units = "1" ;
                NAO:long_name = "North Atlantic Oscillation" ;
        double AO(dates) ;
                AO:units = "1" ;
                AO:long_name = "Arctic Oscillation" ;

// global attributes:
                :title = "AO and NAO" ;
                :summary = "Arctic and North Atlantic Oscillation Indices" ;
                :Conventions = "CF-1.0" ;
}

Note that you can install xarray using pip, but if you are using the Anaconda Python Distribution, you can install it from the conda-forge channel with:

conda install -c conda-forge xarray
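The same approach can be sketched for the station table in the question. This is only an illustration under stated assumptions: the CSV text is a comma-separated rendering of the sample table, the function names `stations_to_dataset` and `write_netcdf`, the renamed columns, and the `obs` dimension name are all hypothetical, and the units are taken from the question's own code.

```python
import io

import pandas as pd
import xarray as xr

# Hypothetical CSV text mirroring the sample table in the question.
CSV_TEXT = """Station Name,Country,Code,Lat,Lon,mn.yr,temp1,temp2,temp3,hpa
Somewhere,US,12340,35.52,23.358,1.19,-8.3,-13.1,-5,69.5
Somewhere,US,12340,,,2.1971,-10.7,-13.9,-7.9,27.9
Somewhere,US,12340,,,3.1971,-8.4,-13,-4.3,90.8
"""


def stations_to_dataset(csv_text):
    """Build an xarray Dataset with one variable per CSV column."""
    # pandas reads blank cells as NaN, so no manual float() handling is needed.
    df = pd.read_csv(io.StringIO(csv_text))
    # Assumed renames, chosen only to give simpler variable names.
    df = df.rename(columns={"Station Name": "station", "mn.yr": "mn_yr"})
    # The row index becomes the single dimension of the dataset.
    df.index.name = "obs"
    ds = xr.Dataset.from_dataframe(df)
    # Units as used in the question's code (celsius, hPa).
    for name in ("temp1", "temp2", "temp3"):
        ds[name].attrs = {"units": "celsius"}
    ds["hpa"].attrs = {"units": "hPa"}
    ds["Lat"].attrs = {"units": "degrees_north"}
    ds["Lon"].attrs = {"units": "degrees_east"}
    return ds


def write_netcdf(ds, path):
    """Write the dataset to disk; needs a netCDF backend such as netCDF4."""
    ds.to_netcdf(path)


ds = stations_to_dataset(CSV_TEXT)
print(ds)
```

Calling write_netcdf(ds, 'station_data.nc') would then produce the file. The missing Lat and Lon values stay as NaN here; whether they should instead repeat the station's first coordinates depends on what the blanks mean in the real data.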

Answer 3:

xarray is a good candidate, but I think iris is better because it helps you with the CF-conventions by raising errors when you make mistakes.

The notebook below re-implements the AO/NAO example:

http://nbviewer.ipython.org/gist/ocefpaf/c66a7d0b967664ee4f5c

(See the last cells on the bottom for the "CF-conventions advantages" of iris.)

Source: convert csv to netcdf
