DBF - encoding cp1250

扶醉桌前 提交于 2020-01-03 10:23:12

问题


I have dbf database encoded in cp1250 and I am reading this database using folowing code:

import csv
from dbfpy import dbf
import os
import sys

filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            out_csv.writerow(rec.fieldData)
        in_db.close()
        print "Done..."
else:
  print "Filename does not end with .dbf"

Problem is, that final csv file is wrong. Encoding of the file is ANSI and some characters are corrupted. I would like to ask you, if you can help me how to read dbf file correctly.

EDIT 1

I tried different code from https://pypi.python.org/pypi/simpledbf/0.2.4, there is some error.

Source 2:

from simpledbf import Dbf5
import os
import sys

dbf = Dbf5('test.dbf', codec='cp1250');
dbf.to_csv('junk.csv');

Output:

python program2.py
Traceback (most recent call last):
  File "program2.py", line 5, in <module>
    dbf = Dbf5('test.dbf', codec='cp1250');
  File "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py",      line 557, in __init__
    assert terminator == b'\r'

AssertionError

I really don't know how to solve this problem.


回答1:


Try using my dbf library:

import dbf
with dbf.Table('test.dbf') as table:
    dbf.export(table, 'junk.csv')



回答2:


I wrote simpledbf. The line that is causing you problems was from some testing I was doing when developing the module. First of all, you might want to update your installation, as 0.2.6 is the most recent. Then you can try removing that particular line (#557) from the file "D:\ProgramFiles\Anaconda\lib\site-packages\simpledbf\simpledbf.py". If that doesn't work, you can ping me at the GitHub repo for simpledbf, or you could try Ethan's suggestion for the dbf module.




回答3:


You can decode and encode as necessary. dbfpy assumes strings are utf8 encoded, so you can decode as it isn't that encoding and then encode again with the right encoding.

import csv
from dbfpy import dbf
import os
import sys

filename = sys.argv[1]
if filename.endswith('.dbf'):
    print "Converting %s to csv" % filename
    csv_fn = filename[:-4]+ ".csv"
    with open(csv_fn,'wb') as csvfile:
        in_db = dbf.Dbf(filename)
        out_csv = csv.writer(csvfile)
        names = []
        for field in in_db.header.fields:
            names.append(field.name)
        #out_csv.writerow(names)
        for rec in in_db:
            row = [i.decode('utf8').encode('cp1250') if isinstance(i, str) else i for i in rec.fieldData]
            out_csv.writerow(rec.fieldData)
        in_db.close()
        print "Done..."
else:
  print "Filename does not end with .dbf"


来源:https://stackoverflow.com/questions/31270429/dbf-encoding-cp1250

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!