Question
I have a folder with a bunch of dbf files that I would like to convert to csv. I tried using code that just renames the extension from .dbf to .csv. These files open fine in Excel, but when I open them in pandas they look like this:
s\t�
0 NaN
1 1 176 1.58400000000e+005-3.385...
This is not what I want, and those characters don't appear in the real file.
How should I read in the dbf file correctly?
Answer 1:
Looking online, there are a few options:
- https://gist.github.com/ryanhill29/f90b1c68f60d12baea81
- http://pandaproject.net/docs/importing-dbf-files.html
- https://geodacenter.asu.edu/blog/2012/01/17/dbf-files-and-p
- https://pypi.python.org/pypi/simpledbf
With simpledbf:
from simpledbf import Dbf5
dbf = Dbf5('fake_file_name.dbf')
df = dbf.to_dataframe()
Tweaked from the gist:
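import pandas as pd
import pysal as ps  # older pysal; newer releases appear to expose this reader as libpysal.io.open
def dbf2DF(dbfile, upper=True):
    "Read dbf file and return pandas DataFrame"
    with ps.open(dbfile) as db:  # pysal's reader, not the built-in open
        df = pd.DataFrame({col: db.by_col(col) for col in db.header})
        if upper:
            # Build a list: under Python 3, map() returns an iterator
            df.columns = [col.upper() for col in db.header]
        return df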
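The garbled pandas output in the question comes from the file format itself: a .dbf file is binary (a fixed header, a list of field descriptors, then fixed-width records), so renaming it to .csv does not make it comma-separated text. As a rough illustration only, here is a minimal sketch that reads just the header with the standard library, assuming the common dBase III layout (other variants differ in detail). The function name `read_dbf_header` and the in-memory sample are made up for this example; for real files, use one of the libraries above.

```python
import struct

def read_dbf_header(data):
    """Parse the file header and field descriptors of a dBase III-style file."""
    # Bytes 4-7: record count; 8-9: header length; 10-11: record length (little-endian).
    num_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    fields = []
    offset = 32  # field descriptors start right after the 32-byte file header
    while data[offset] != 0x0D:  # a 0x0D byte terminates the descriptor list
        name = data[offset:offset + 11].split(b"\x00", 1)[0].decode("ascii")
        field_type = chr(data[offset + 11])
        length = data[offset + 16]
        fields.append((name, field_type, length))
        offset += 32  # each descriptor is 32 bytes long
    return num_records, header_len, record_len, fields

# Hypothetical one-field, one-record file built in memory for the demo.
sample = (
    struct.pack("<BBBBIHH20x", 3, 120, 1, 1, 1, 65, 11)  # 32-byte file header
    + b"COUNTRY".ljust(11, b"\x00") + b"C" + bytes(4) + bytes([10, 0]) + bytes(14)
    + b"\r"                        # end of field descriptors
    + b" " + b"CHILE".ljust(10)    # deletion flag + fixed-width field value
    + b"\x1a"                      # end-of-file marker
)
print(read_dbf_header(sample))
```

None of these bytes are delimiters or line breaks in the CSV sense, which is why a CSV parser produces nonsense when pointed at the renamed file.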
Answer 2:
Using my dbf library you could do something like:
import sys
import dbf
for arg in sys.argv[1:]:
    dbf.export(arg)
which will create a .csv file of the same name as each dbf file. If you put that code into a script named dbf2csv.py you could then call it as
python dbf2csv.py dbfname dbf2name dbf3name ...
Answer 3:
EDIT#2:
It's possible to read a dbf file line by line, without converting it to csv, using dbfread (install it with pip install dbfread):
>>> # Python 2 session; under Python 3 the strings print without the u prefix
>>> from dbfread import DBF
>>> for row in DBF('southamerica_adm0.dbf'):
...     print(row)
...
OrderedDict([(u'COUNTRY', u'ARGENTINA')])
OrderedDict([(u'COUNTRY', u'BOLIVIA')])
OrderedDict([(u'COUNTRY', u'BRASIL')])
OrderedDict([(u'COUNTRY', u'CHILE')])
OrderedDict([(u'COUNTRY', u'COLOMBIA')])
OrderedDict([(u'COUNTRY', u'ECUADOR')])
OrderedDict([(u'COUNTRY', u'GUYANA')])
OrderedDict([(u'COUNTRY', u'GUYANE')])
OrderedDict([(u'COUNTRY', u'PARAGUAY')])
OrderedDict([(u'COUNTRY', u'PERU')])
OrderedDict([(u'COUNTRY', u'SURINAME')])
OrderedDict([(u'COUNTRY', u'U.K.')])
OrderedDict([(u'COUNTRY', u'URUGUAY')])
OrderedDict([(u'COUNTRY', u'VENEZUELA')])
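Because each row comes back as a mapping from field name to value, the rows can be fed straight to `csv.DictWriter` if a csv file is what you want in the end. The following is only a sketch: `rows_to_csv` is a hypothetical helper, and a plain list of `OrderedDict` objects stands in for `DBF('southamerica_adm0.dbf')` so that the example runs without the file.

```python
import csv
import io
from collections import OrderedDict

def rows_to_csv(rows, out_file):
    """Write an iterable of dict-like rows to out_file as CSV; return the row count."""
    writer = None
    count = 0
    for row in rows:
        if writer is None:
            # Take the column order from the keys of the first row.
            writer = csv.DictWriter(out_file, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count

# Stand-in for DBF('southamerica_adm0.dbf'): any iterable of mappings works.
rows = [
    OrderedDict([("COUNTRY", "ARGENTINA")]),
    OrderedDict([("COUNTRY", "BOLIVIA")]),
]
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

With a real file you would pass the DBF object as `rows` and a file opened with `open(name, 'w', newline='')` as `out_file`.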
My updated references:
- pandas, official project site: http://pandas.pydata.org
- pandas, official documentation: http://pandas-docs.github.io/pandas-docs-travis/
- dbfread: https://pypi.python.org/pypi/dbfread/2.0.6
- geopandas: http://geopandas.org/
- shp and dbf with geopandas: https://gis.stackexchange.com/questions/129414/only-read-specific-attribute-columns-of-a-shapefile-with-geopandas-fiona
Answer 4:
Here is my solution that I've been using for years. I have a solution for Python 2.7 and one for Python 3.5 (probably also 3.6).
Python 2.7:
import csv
from dbfpy import dbf
def dbf_to_csv(out_table):  # Input a dbf, output a csv
    csv_fn = out_table[:-4] + ".csv"  # Set the table as .csv format
    with open(csv_fn, 'wb') as csvfile:  # Create a csv file and write contents from dbf
        in_db = dbf.Dbf(out_table)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:  # Write headers
            names.append(field.name)
        out_csv.writerow(names)
        for rec in in_db:  # Write records
            out_csv.writerow(rec.fieldData)
        in_db.close()
    return csv_fn
Python 3.5:
import csv
from dbfread import DBF
def dbf_to_csv(dbf_table_pth):  # Input a dbf, output a csv, same name, same path, except extension
    csv_fn = dbf_table_pth[:-4] + ".csv"  # Set the csv file name
    table = DBF(dbf_table_pth)  # table variable is a DBF object
    with open(csv_fn, 'w', newline='') as f:  # create a csv file, fill it with dbf content
        writer = csv.writer(f)
        writer.writerow(table.field_names)  # write the column names
        for record in table:  # write the rows
            writer.writerow(list(record.values()))
    return csv_fn  # return the csv name
You can install dbfpy and dbfread with pip.
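Since the question is about a whole folder of dbf files, a per-file converter such as `dbf_to_csv` can be applied in a loop. Here is a minimal sketch of that idea; `convert_folder` and its `convert_one` parameter are hypothetical names, and the converter is passed in as an argument so that the loop does not depend on any particular dbf library.

```python
from pathlib import Path

def convert_folder(folder, convert_one):
    """Call convert_one(path_string) for every .dbf file in folder; return the results."""
    results = []
    for path in sorted(Path(folder).iterdir()):
        # Match the extension case-insensitively so that .DBF files are included too.
        if path.is_file() and path.suffix.lower() == ".dbf":
            results.append(convert_one(str(path)))
    return results

# Example call, assuming a dbf_to_csv(path) function like the ones above:
# csv_files = convert_folder("path/to/folder", dbf_to_csv)
```

The function returns whatever the converter returns for each file, so with `dbf_to_csv` you get back the list of csv file names that were written.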
Source: https://stackoverflow.com/questions/32772447/way-to-convert-dbf-to-csv-in-python