How to handle user input of invalid UTF-8 characters?

小鲜肉 2020-11-29 17:26

I'm looking for a general strategy or advice on how to handle invalid UTF-8 input from users.

Even though my webapp uses UTF-8, somehow some users enter invalid chara

9 answers
  • 2020-11-29 18:26

    Set UTF-8 as the character set for all headers output by your PHP code

    In every PHP output header, specify UTF-8 as the encoding:

    header('Content-Type: text/html; charset=utf-8');
    
  • 2020-11-29 18:29

    Try doing what Rails does to force all browsers to always post UTF-8 data:

    <form accept-charset="UTF-8" action="#{action}" method="post">
      <div style="margin:0;padding:0;display:inline">
        <input name="utf8" type="hidden" value="&#x2713;" />
      </div>
      <!-- form fields -->
    </form>
    

    See railssnowman.info or the initial patch for an explanation.

    1. To have the browser send form-submission data in the UTF-8 encoding, just render the page with a Content-Type header of "text/html; charset=utf-8" (or use a meta http-equiv tag).
    2. To have the browser send form-submission data in UTF-8 even if the user fiddles with the page encoding (browsers let users do that), use accept-charset="UTF-8" in the form.
    3. To have the browser send form-submission data in UTF-8 even if the user fiddles with the page encoding, and even if the browser is IE and the user switched the page encoding to Korean and entered Korean characters in the form fields, add a hidden input to the form with a value such as &#x2713;, which can only come from the Unicode charset (and, in this example, not from the Korean charset).
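
For a PHP application, the three points above could be combined roughly as follows. This is only a sketch: the helper name `render_utf8_form`, the `/comments` action URL, and the `body` field are hypothetical and do not come from the Rails patch.

```php
<?php
// Hypothetical helper: builds form markup equivalent to the Rails snippet.
function render_utf8_form(string $action, string $fieldsHtml): string
{
    // Escape the action URL so it is safe inside an attribute value.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    // Point 2: accept-charset asks the browser to submit in UTF-8.
    return '<form accept-charset="UTF-8" action="' . $safeAction . '" method="post">'
        . '<div style="margin:0;padding:0;display:inline">'
        // Point 3: hidden check mark (U+2713), the value used in the answer.
        . '<input name="utf8" type="hidden" value="&#x2713;" />'
        . '</div>'
        . $fieldsHtml
        . '</form>';
}

// Point 1: declare the page encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Example call with an assumed action URL and a single assumed text field.
echo render_utf8_form('/comments', '<input name="body" type="text" />');
```

The caller is responsible for the markup passed as `$fieldsHtml`; only the action URL is escaped here.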
  • 2020-11-29 18:33

    Receiving invalid characters from your web app might have to do with the character sets assumed for HTML forms. You can specify which character set to use for forms with the accept-charset attribute:

    <form action="..." accept-charset="UTF-8">
    

    You might also want to look at similar questions on Stack Overflow for pointers on how to handle invalid characters (e.g. those in the column to the right). Still, I think signaling an error to the user is better than trying to clean up the invalid characters, since cleanup can silently lose significant data or change the user's input in unexpected ways.
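
One way to signal an error instead of cleaning up is to check each submitted value and return a message when it is not well-formed UTF-8. The sketch below assumes PHP with the mbstring extension enabled; the helper name `utf8_input_error` and the wording of the message are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns an error message when $value is not
 * well-formed UTF-8, or null when the value is acceptable.
 */
function utf8_input_error(string $field, string $value): ?string
{
    // mb_check_encoding() reports whether the bytes are valid in the given encoding.
    if (mb_check_encoding($value, 'UTF-8')) {
        return null;
    }

    // Reject the input rather than silently altering what the user typed.
    return sprintf(
        'The "%s" field contains text that is not valid UTF-8. Please re-enter it.',
        $field
    );
}

// "\xC3\xA9" is a valid two-byte sequence (e with acute accent).
var_dump(utf8_input_error('comment', "caf\xC3\xA9"));

// "\xC3\x28" is invalid: 0x28 is not a continuation byte.
var_dump(utf8_input_error('comment', "bad\xC3\x28"));
```

In a form handler, a non-null result would typically be shown next to the offending field so the user can correct the input.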
