Best way to decode unknown Unicode encoding in Python 2.5 [duplicate]


清酒与你 2021-02-01 10:44

3 answers

伪装坚强ぢ (original poster)

2021-02-01 11:11

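I've tackled the same problem and found that there's no reliable way to determine the encoding of a piece of content without metadata about that content. That's why I ended up with the same approach you're trying here: attempt each candidate encoding in turn and keep the first one that decodes without error.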

My only additional advice is that, rather than ordering the list of possible encodings from most likely to least likely, you should order it by specificity, with the most restrictive encodings first. I've found that certain character sets are subsets of others, so if you check utf_8 as your second choice, it will succeed before any encoding that is a subset of utf_8 gets a chance, and you'll never detect those (I think one of the Korean character sets uses the same number space as utf, but I'm not certain).
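As a rough illustration of ordering by specificity, here is a minimal sketch. The function name `decode_by_specificity` and the default candidate list are my own assumptions for the example, not something from the question. It is written for Python 3, but it relies only on the strict `decode` method, which Python 2 `str` objects also have. The candidates run from most restrictive to least: ascii is a strict subset of utf_8, and latin_1 accepts every possible byte sequence, so it has to go last.

```python
def decode_by_specificity(data, encodings=("ascii", "utf_8", "latin_1")):
    """Try each candidate codec in order and return (text, codec_name).

    The candidate list is an illustrative assumption: put the most
    restrictive codecs first and catch-all codecs such as latin_1 last.
    """
    for name in encodings:
        try:
            # Strict decoding raises UnicodeDecodeError on invalid bytes.
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


if __name__ == "__main__":
    samples = [b"hello", "caf\u00e9".encode("utf_8"), b"caf\xe9"]
    for raw in samples:
        text, codec = decode_by_specificity(raw)
        print(repr(raw), "->", codec)
```

If latin_1 were listed first, every input would be reported as latin_1, including plain ASCII and valid UTF-8, which is the failure mode the ordering is meant to avoid. Note that a successful decode only shows that the bytes are valid in that codec, not that the codec is the one the author actually used.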
