How do you know what encoding the user is inputting into the browser?

南旧 2021-01-26 11:52

I read Joel's article about character sets, so I'm taking his advice to use UTF-8 on my web page and in my database. What I can't understand is what to do with user input.

3 answers
  • 2021-01-26 12:15

    Don't try to detect the encoding; convert all user-input text to UTF-8 in your application. You can do everything within your control: configure your web server to send UTF-8 pages and UTF-8 headers, configure your application to handle all text as UTF-8, tweak your filesystem (if necessary) to handle text files as UTF-8, and configure your database. But you simply have no real control on the user's end. You can suggest the proper character encoding in your HTML forms, like the following, but it isn't really enforceable on the user's end:

    <form action="/index.php" method="post" accept-charset="UTF-8"></form>
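    A minimal PHP sketch of the server-side half of this advice, assuming a plain PHP page with no framework: send a UTF-8 `Content-Type` response header, make the mbstring functions default to UTF-8, and emit the form with `accept-charset` as an extra hint. The field name `comment` is hypothetical.

```php
<?php
// Declare UTF-8 in the HTTP response header so the browser renders the page
// (and, by default, encodes its form submissions) as UTF-8.
header('Content-Type: text/html; charset=UTF-8');

// Make mbstring functions assume UTF-8 unless told otherwise.
mb_internal_encoding('UTF-8');

// Emit the form; accept-charset is only a hint the browser may honor.
// "comment" is a hypothetical field name.
echo '<form action="/index.php" method="post" accept-charset="UTF-8">', "\n",
     '  <input type="text" name="comment">', "\n",
     '  <button type="submit">Send</button>', "\n",
     '</form>', "\n";
```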
    

    Unless detecting the encoding of the user input is the whole purpose of your application, trying to do it is a fool's errand. Assume the encoding is wrong and convert it to UTF-8 in your app, just as you should assume your user input is malicious and clean it up before you attempt to insert it into your database.

    In most languages with a proper UTF-8 implementation, ASCII characters survive conversion unchanged (ASCII is a subset of UTF-8), so don't worry about that either.
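    A minimal sketch of "assume it's wrong and convert", assuming that anything which is not valid UTF-8 arrived as ISO-8859-1 (an assumption; substitute the legacy encoding your users are most likely to send). `to_utf8` is a hypothetical helper name. Valid UTF-8, which includes plain ASCII, passes through untouched.

```php
<?php
// Hypothetical helper: normalize a submitted string to UTF-8.
// Assumption: input that is not valid UTF-8 is treated as $fallback.
function to_utf8(string $input, string $fallback = 'ISO-8859-1'): string
{
    // Valid UTF-8 (including pure ASCII) is returned unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    // Otherwise reinterpret the bytes in the fallback encoding and convert.
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";           // "café" encoded as ISO-8859-1
echo to_utf8($latin1), "\n";   // now the UTF-8 bytes 63 61 66 C3 A9
echo to_utf8("plain ascii"), "\n";
```

    Because every byte sequence is valid ISO-8859-1, this conversion never fails; the trade-off is that input in some other legacy encoding gets silently mangled rather than rejected.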

  • 2021-01-26 12:23

    If your web page uses UTF-8, the browser will convert the input to UTF-8 for you. So whatever characters the user types, ASCII or not, the form data will be submitted as UTF-8.

    However, you can never rule out a user with an itchy hand who switches the page encoding back to ISO-8859-*.

    You can use mb_detect_encoding, but it is not 100% bulletproof.

    /* $str holds the submitted text to inspect */
    /* Detect character encoding with current detect_order */
    echo mb_detect_encoding($str);
    
    /* "auto" is expanded according to mbstring.language;
       for Japanese it is "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
    echo mb_detect_encoding($str, "auto");
    
    /* Specify encoding_list character encoding by comma separated list */
    echo mb_detect_encoding($str, "JIS, eucjp-win, sjis-win");
    
    /* Use an array to specify encoding_list */
    $ary = ["ASCII", "JIS", "EUC-JP"];
    echo mb_detect_encoding($str, $ary);
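    To see why detection can't be bulletproof, note that many byte sequences are valid in more than one encoding. A small demonstration using the UTF-8 bytes for "café" (the sample string and the candidate list are assumptions). Passing `true` as the third (`strict`) argument restricts `mb_detect_encoding` to candidates the whole string is valid in, but between several valid candidates it can still only guess.

```php
<?php
// The UTF-8 bytes for "café": 63 61 66 C3 A9.
$bytes = "caf\xC3\xA9";

// The same bytes are valid in both encodings, so no detector can be sure.
var_dump(mb_check_encoding($bytes, 'UTF-8'));      // bool(true)
var_dump(mb_check_encoding($bytes, 'ISO-8859-1')); // bool(true)

// Decoding them each way yields different text: "café" vs "cafÃ©".
$asUtf8   = mb_convert_encoding($bytes, 'UTF-8', 'UTF-8');
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// strict = true: only a candidate the whole string is valid in is returned
// (or false if none fits); here both fit, so the result is a guess.
$guess = mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1'], true);
```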
    
  • 2021-01-26 12:26

    Check the HTTP request headers (the charset parameter of Content-Type, if the browser sends one) to discover the character encoding.
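    A sketch of reading that header in PHP, where the request's Content-Type is exposed as `$_SERVER['CONTENT_TYPE']`. `charset_from_content_type` is a hypothetical helper. Browsers often post forms without any charset parameter, so this treats a missing value as "unknown" rather than assuming a particular encoding.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// value, e.g. "text/plain; charset=utf-8" -> "UTF-8". Returns null if absent.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Match charset=value (optionally quoted), case-insensitively.
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// In a web request the header is available as $_SERVER['CONTENT_TYPE'];
// it is unset for requests without a body (and on the command line).
$charset = charset_from_content_type($_SERVER['CONTENT_TYPE'] ?? null);
```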
