mb_detect_encoding doesn't work properly with Windows-1250 (CP1250)

Submitted by 不问归期 on 2019-12-11 03:28:31

Question


I have a problem detecting CP1250 with mb_detect_encoding(). In my case I want to detect 3 encodings:

mb_detect_encoding($string, 'UTF-8,ISO-8859-2,Windows-1250')

But Windows-1250 isn't among the supported encodings. Is there any solution?


Answer 1:


mb_detect_encoding always "detects" single-byte encodings. You can read about this in the documentation for mb_detect_order:

mbstring currently implements the following encoding detection filters. If there is an invalid byte sequence for the following encodings, encoding detection will fail.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-X, mbstring always detects as ISO-8859-X.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection will fail always.

Conclusions:

  1. It's meaningless to ask for detection of ISO-8859-2; it will always tell you "yes, that's it" (unless of course it detects UTF-8 first).
  2. Windows-1250 is not supported, but even if it were it would work exactly like ISO-8859-2.
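
To make the first conclusion concrete, here is a minimal sketch using mb_check_encoding and mb_convert_encoding. The sample bytes are my own choice for illustration: they spell "šć" in ISO-8859-2, and the same two bytes would read "ąć" in Windows-1250, so nothing in the data itself says which reading was intended.

```php
<?php
// The same two bytes: "šć" in ISO-8859-2, but "ąć" if read as Windows-1250.
$bytes = "\xB9\xE6";

// UTF-8 has structural rules, so this sequence can be rejected.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// ISO-8859-2 assigns a character to every byte from 0xA0 to 0xFF,
// so a validity check has nothing to reject here.
$isLatin2 = mb_check_encoding($bytes, 'ISO-8859-2');

// Decoding as ISO-8859-2 succeeds whether or not that was the real encoding.
$asLatin2 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-2');

var_dump($isUtf8, $isLatin2, $asLatin2);
```

The UTF-8 check is the only one that carries information; the ISO-8859-2 check passes regardless of where the bytes came from.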

In general, it is impossible to detect single-byte encodings with accuracy. If you find yourself needing to do that in PHP you will need to do it manually; don't expect very good results.
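
As a rough idea of what "manually" could look like, here is a heuristic sketch. The function name guess_encoding and its fallback parameter are hypothetical, not part of mbstring. It relies on one known difference: bytes 0x80-0x9F are C1 control codes in ISO-8859-2 but printable characters (for example Š, š, Ž, ž) in Windows-1250. When no such byte is present, the two encodings cannot be told apart this way, so the function just returns a caller-supplied default.

```php
<?php
// Hypothetical helper, not part of mbstring: a heuristic guess, not a detection.
function guess_encoding(string $bytes, string $fallback = 'ISO-8859-2'): string
{
    // UTF-8 has a strict byte structure, so validity is a strong signal.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return 'UTF-8';
    }

    // Bytes 0x80-0x9F are control codes in ISO-8859-2 and rarely occur in
    // real text, while Windows-1250 puts printable characters there.
    if (preg_match('/[\x80-\x9F]/', $bytes) === 1) {
        return 'Windows-1250';
    }

    // Ambiguous: both single-byte encodings accept these bytes.
    return $fallback;
}

var_dump(guess_encoding("\xC5\xBE")); // valid UTF-8
var_dump(guess_encoding("\x9E"));     // byte in the 0x80-0x9F range
var_dump(guess_encoding("\xBE"));     // ambiguous, falls back
```

Text that happens to use only characters located at the same positions in both encodings will always hit the fallback, so knowing where the data came from is usually more useful than any byte-level test.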




Answer 2:


It is not feasible to distinguish ISO-8859-2 from Windows-1250, or any single-byte encoding from another, for that matter. mb_detect_encoding simply gives you the first encoding in the list that is valid for the given string, and both are equally valid. "Detecting" encodings is by definition guesswork and cannot be done reliably.



Source: https://stackoverflow.com/questions/17104340/mb-detect-encoding-doesnt-properly-working-with-windows-1250-cp1250
