How do I detect non-ASCII characters in a string?

面向向阳花 2020-12-01 07:56

If I have a PHP string, how can I determine if it contains at least one non-ASCII character or not, in an efficient way? And by non-ASCII character, I mean any character tha

9 Answers
  • 2020-12-01 08:02

    I benchmarked the suggested functions because I need this check for batch processing of short strings (1,000 characters at most). I ran 10k iterations over 30 different strings (empty, short, longer, ASCII, accented, Japanese, emoji, non-ASCII at the start, non-ASCII at the end, etc.). Here are the rough results:

    mb_check_encoding: 95ms average. Performance degrades much faster than preg_match and ctype_print as the strings get longer (1MB+).

    mb_check_encoding($input, 'ASCII');
    

    preg_match: 85ms average. Decently fast for 1MB+ strings (it walks the string, so it is faster if there are non-ASCII characters early in the string).

    !preg_match('/[\\x80-\\xff]/', $input);
    

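    ctype_print: 83ms average. Decently fast for 1MB+ strings (it walks the string, so it is faster if there are non-ASCII characters early in the string). Note that this is not really an ASCII check: it only accepts printable characters, so ASCII control characters such as tabs and newlines also make it return false.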

    ctype_print($input);
    

    while/ord: 500ms average. I'm still waiting for the 1MB+ strings test to finish.

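    function is_ascii($input) {
        $num = 0;
        // Walk the string byte by byte; isset() is false once we run past the end.
        while( isset( $input[$num] ) ) {
            // A set high bit (0x80) means the byte is outside the 7-bit ASCII range.
            if( ord( $input[$num] ) & 0x80 ) {
                return false;
            }
            $num++;
        }
        return true;
    }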
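
    For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the harness used for the numbers above: the function name bench_ascii_check, the sample strings, and the use of hrtime() (PHP 7.3+) are my own assumptions, and only two of the candidates are wired in. mb_check_encoding and ctype_print can be added to $candidates the same way.

    ```php
    <?php
    // Hypothetical harness: times a check over a set of sample strings.
    // $check is any callable that takes a string and returns a bool.
    function bench_ascii_check(callable $check, array $samples, int $iterations): float
    {
        $start = hrtime(true);                 // monotonic clock, in nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            foreach ($samples as $sample) {
                $check($sample);
            }
        }
        return (hrtime(true) - $start) / 1e6;  // elapsed milliseconds
    }

    // Illustrative inputs only, not the 30 strings from the benchmark above.
    $samples = [
        '',
        'plain ascii',
        "caf\u{00E9}",
        "\u{65E5}\u{672C}\u{8A9E}",
        str_repeat('a', 1000) . "\u{00E9}",
    ];

    $candidates = [
        'preg_match' => fn(string $s): bool => !preg_match('/[\x80-\xff]/', $s),
        'byte loop'  => function (string $s): bool {
            for ($i = 0, $n = strlen($s); $i < $n; $i++) {
                if (ord($s[$i]) & 0x80) {
                    return false;
                }
            }
            return true;
        },
    ];

    foreach ($candidates as $name => $check) {
        printf("%-12s %.1f ms\n", $name, bench_ascii_check($check, $samples, 10000));
    }
    ```

    Absolute timings will depend on the PHP version and the machine, so treat the output as a relative comparison only.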
    
  • 2020-12-01 08:03

    You can use mb_detect_encoding and check for ASCII:

    mb_detect_encoding($str, 'ASCII', true)
    

    This returns false if $str contains at least one non-ASCII character (byte value > 0x7F); otherwise it returns the string 'ASCII'.
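
    Wrapped in a helper, that check might look like the sketch below. The name contains_non_ascii is made up for illustration, and the code assumes the mbstring extension is loaded.

    ```php
    <?php
    // Hypothetical helper name; requires the mbstring extension.
    function contains_non_ascii(string $str): bool
    {
        // In strict mode, mb_detect_encoding() returns the encoding name
        // ('ASCII') when every byte fits it, and false otherwise.
        return mb_detect_encoding($str, 'ASCII', true) === false;
    }

    var_dump(contains_non_ascii('hello'));        // only 7-bit bytes
    var_dump(contains_non_ascii("h\u{00E9}llo")); // contains U+00E9
    ```

    Comparing against false with === keeps the return type a plain bool instead of the mixed string/false that mb_detect_encoding produces.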

  • 2020-12-01 08:07

    Try mb_detect_encoding.

  • 2020-12-01 08:11

    The function ctype_print returns true iff all characters fall into the ASCII range 32-126 (PHP unit test).
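
    Because of that range, ctype_print is stricter than a plain ASCII test. The sketch below compares it with a 7-bit check on a few inputs; is_seven_bit is a hypothetical helper name, and the sample strings are only illustrations.

    ```php
    <?php
    // Hypothetical helper: true when no byte has its high bit set.
    function is_seven_bit(string $s): bool
    {
        return !preg_match('/[\x80-\xff]/', $s);
    }

    $inputs = ["plain text", "tab\tseparated", "two\nlines", "caf\u{00E9}"];

    foreach ($inputs as $input) {
        printf(
            "%-18s ctype_print=%s seven_bit=%s\n",
            json_encode($input),                      // shows escapes for control/non-ASCII chars
            var_export(ctype_print($input), true),
            var_export(is_seven_bit($input), true)
        );
    }
    ```

    Strings containing a tab or a newline are still pure ASCII, yet ctype_print rejects them, so the two checks are not interchangeable.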

  • 2020-12-01 08:13

    I suggest you look into utf8_encode or utf8_decode in the PHP manual:

    http://www.php.net/manual/en/function.utf8-encode.php

    Look through the examples further down that page; even if they are not exactly what you are looking for, they may point you in the right direction.

  • 2020-12-01 08:13

    If you do not want to deal with regular expressions in JavaScript, you can do:

    // Returns true if s contains at least one character with a code above 127.
    function detectUf8(s) {
      var nonAscii = s.split('').filter(function(c) {
        return c.charCodeAt(0) > 127;
      });
      return nonAscii.length > 0;
    }
    