How can I detect the encoding/codepage of a text file

梦如初夏 2020-11-21 22:42

In our application, we receive text files (.txt, .csv, etc.) from diverse sources. When reading, these files sometimes contain garbage, because the

20 Answers
  •  栀梦 (original poster)
    2020-11-21 23:16

    I know this is a very late answer, and the solution won't appeal to everyone (it has an English-centric bias and has had no statistical or empirical testing), but it has worked very well for me, especially for processing uploaded CSV data:

    http://www.architectshack.com/TextFileEncodingDetector.ashx

    Advantages:

    • BOM detection built-in
    • Default/fallback encoding customizable
    • Pretty reliable (in my experience) for Western European files containing some exotic data (e.g. French names), with a mixture of UTF-8 and Latin-1-style files, which covers the bulk of US and Western European environments.

    Note: I'm the one who wrote this class, so obviously take it with a grain of salt! :)
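
For readers who want to see the general idea, here is a minimal Python sketch of the approach described above: check for a BOM, then test whether the bytes are valid UTF-8, then fall back to a configurable default. It is not the linked class and makes no claim to match its behaviour; the names `guess_encoding` and `read_text` and the `latin-1` default are assumptions made for illustration only.

```python
import codecs

# BOM signatures. UTF-32 is listed before UTF-16 because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Return a Python codec name for `data` (hypothetical helper)."""
    # 1. A byte order mark is the strongest signal available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # 2. No BOM: accept UTF-8 only if the whole buffer decodes strictly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 3. Otherwise assume the caller-supplied single-byte fallback.
        return fallback


def read_text(path: str, fallback: str = "latin-1") -> str:
    """Read a file as text using the guessed encoding (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    return data.decode(guess_encoding(data, fallback))
```

The strict UTF-8 check does most of the work here: Latin-1-style text containing accented characters is very rarely valid UTF-8, so a failed decode is a reasonable hint to use the fallback. Like the class above, a sketch of this kind only suits environments where the non-UTF-8 files share one known legacy encoding.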
