Encoding problems with dBase III .dbf files on different machines

Submitted by 本秂侑毒 on 2019-11-29 12:59:03

If you are still having a problem with these files, I may be able to help you.

What is in the "codepage byte" aka "language driver id" (LDID) at offset 29 (decimal) in the file?
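If you want to check that byte yourself, here is a minimal Python sketch. The function names `read_ldid` and `codec_for` are made up for this illustration, and the lookup table is deliberately partial: it holds only a handful of widely documented LDID values, so a missing entry means "not in this table", not "invalid".

```python
# Partial LDID -> Python codec table (illustrative, far from complete).
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0xC8: "cp1250",  # Eastern European Windows
    0xC9: "cp1251",  # Russian Windows
}


def read_ldid(path):
    """Return the language driver id stored at offset 29 of a DBF header."""
    with open(path, "rb") as f:
        header = f.read(32)  # the fixed part of the header is 32 bytes
    if len(header) < 32:
        raise ValueError("file too short to be a DBF")
    return header[29]


def codec_for(ldid):
    """Map an LDID to a codec name, or None if it is not in the partial table."""
    return LDID_TO_CODEC.get(ldid)
```

For example, `codec_for(read_ldid("TABLE.DBF"))` gives either a codec name or None. A value of 0x00 generally means that no codepage was recorded in the file.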

I have a Python-based DBF reader which can read just about any field data type and just about any codepage -- it has a long list, compiled from various sources, of mappings from codepage byte to codepage number. The options are: (1) believe the LDID and deliver Unicode; (2) ignore the LDID and deliver undecoded bytes; (3) override the LDID and decode with a specific codepage into Unicode. The Unicode can of course then be encoded into UTF-8.
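To make the three options concrete, here is a rough sketch of the idea. It is not the reader's actual interface: the function `deliver` and its parameters are hypothetical, invented for this illustration.

```python
def deliver(raw, ldid_codec, mode="believe", override=None):
    """Return field data according to one of the three options.

    raw        -- undecoded field bytes from the file
    ldid_codec -- codec implied by the LDID (e.g. "cp850"), may be None
    mode       -- "believe", "ignore" or "override"
    override   -- codec to use when mode == "override"
    """
    if mode == "ignore":
        return raw                      # (2) undecoded bytes
    if mode == "override":
        if override is None:
            raise ValueError("override mode needs a codec name")
        return raw.decode(override)     # (3) caller knows better than the LDID
    if mode == "believe":
        if ldid_codec is None:
            raise ValueError("LDID is missing or unrecognised")
        return raw.decode(ldid_codec)   # (1) trust the header
    raise ValueError("unknown mode: %r" % (mode,))
```

Whichever option produces the Unicode string, getting UTF-8 out of it is then just `deliver(raw, "cp850").encode("utf-8")`.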

The DBF reader also does a whole lot of reasonableness cross-checks, which may help in investigating why VFP thinks the file is corrupt.

How do you know that it's using IBM850? I also have a prototype encoding detector in Python. Unlike detectors such as 'chardet', which are derived from Mozilla code, it is not web-centric and can happily recognise most old DOS codepages, so it may help.

An observation: the Greek lowercase letter sigma (σ) is 0xE5 in codepage 437, which was succeeded by codepage 850 -- "pc2" seems a little outdated ...
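A quick way to see how one byte reads under different DOS codepages is to decode it with each candidate. This small Python sketch uses only the standard codecs; the helper name `decode_under` is made up for the example.

```python
def decode_under(byte_value, codecs=("cp437", "cp850")):
    """Decode a single byte under each candidate codec and return the results."""
    raw = bytes([byte_value])
    return {codec: raw.decode(codec) for codec in codecs}


for codec, char in decode_under(0xE5).items():
    print(codec, char)
# cp437 σ
# cp850 Õ
```

If your data shows a sigma where an accented capital O was expected (or the other way round), that points to a 437/850 mix-up.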

If you think I can be of any help, feel free to e-mail me at insert_punctuation("sjmachin", "lexicon", "net")

Tural

Try this code.

// Requires: using System.Text;
// dbPath is assumed to be the folder that holds TABLE.DBF, ending with a path separator.
var oConn = new System.Data.Odbc.OdbcConnection();
oConn.ConnectionString = "Driver={Microsoft Visual FoxPro Driver};SourceType=DBF;SourceDB=" + dbPath;
oConn.Open();
var oCmd = oConn.CreateCommand();
oCmd.CommandText = "SELECT name FROM " + dbPath + "TABLE.DBF";
var reader = oCmd.ExecuteReader();
reader.Read();
// Get the raw bytes back by re-encoding with the system default code page the driver decoded with.
byte[] A = Encoding.Default.GetBytes(reader.GetString(0));
// Reinterpret those bytes as code page 850.
string p = Encoding.Unicode.GetString(Encoding.Convert(Encoding.GetEncoding(850), Encoding.Unicode, A));

When you read a dbf file, you need to take three encodings into account:

1. The encoding in which the database provider reads the file. It depends on the provider and on the current operating system. Use this encoding to get the byte array back from the string the provider returns. For example, on my PC:

  • when I use the connection string "Data Source={0}; Provider=Microsoft.JET.OLEDB.4.0;Extended Properties=DBase IV;User ID=;Password=;", strings are read using code page 866 (Russian MS-DOS)

  • when I use the connection string "Data Source={0}; Provider=vfpoledb.1;Exclusive=No;Collating Sequence=Machine", strings are read using Encoding.Default (code page 1251)

2. The encoding in which the strings were written to the dbf file. It can be read from byte 29 of the file, but in practice it does not matter how the file is marked; you simply need to know which encoding was actually used. Use this as the source encoding for the string conversion.

3. The encoding to which the strings should be converted. This is usually UTF-8.

So string conversion should look like this:

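// codePage1: the code page the provider used when reading; re-encoding recovers the raw bytes.
byte[] bytes = Encoding.GetEncoding(codePage1).GetBytes(reader.GetString(0));

// codePage2: the code page the strings were actually written in; convert from it to UTF-8.
string result = Encoding.UTF8.GetString(Encoding.Convert(Encoding.GetEncoding(codePage2), Encoding.UTF8, bytes));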
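The same round trip can be tried out in Python with a worked example. This is only a sketch: the function name `repair` and the sample word are invented for the illustration, and it simulates a file written in code page 866 that a provider has read as 1251.

```python
def repair(text, provider_codec, file_codec):
    """Undo a provider's wrong decoding and decode again with the real code page."""
    raw = text.encode(provider_codec)   # step 1: recover the bytes as stored in the file
    return raw.decode(file_codec)       # step 2: decode with the encoding actually used


# Simulate a file written in cp866 but read by a provider that assumed cp1251.
stored = "Привет".encode("cp866")
misread = stored.decode("cp1251")       # what the provider would hand back (mojibake)
fixed = repair(misread, "cp1251", "cp866")
utf8_bytes = fixed.encode("utf-8")      # step 3: encode to the target encoding
```

Note that this only works when the provider's decoding was lossless. If it has already replaced unmappable bytes with "?" or a similar placeholder, the original bytes cannot be recovered from the string.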

Have you tried using the Visual FoxPro OLE DB provider ("VFPOleDb") instead?
