If I have a PHP string, how can I determine if it contains at least one non-ASCII character or not, in an efficient way? And by non-ASCII character, I mean any character tha
I benchmarked the suggested functions as I need this check for batch processing of shorter (1000 characters max) strings. I tested 10k iterations of 30 different strings (empty, short, longer, ascii, accents, japanese, emoji, non-ascii start, non-ascii end etc). Here are the rough results:
mb_check_encoding: 95ms average. Performance degrades way faster than preg_match and ctype as the strings get longer (1MB+).
mb_check_encoding($input, 'ASCII');
preg_match: 85ms average. Decently fast for 1MB+ strings (walks the string, so faster if there are non-ascii characters early in the string).
!preg_match('/[\\x80-\\xff]/', $input);
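ctype_print: 83ms average. Decently fast for 1MB+ strings (walks the string, so faster if there are non-ascii characters early in the string). DO NOTE that this is not really an ASCII check: it only accepts printable characters, so it also returns false for ASCII control characters such as tabs and newlines.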
ctype_print($input);
while/ord: 500ms average. I'm still waiting for the 1MB+ strings test to finish.
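function is_ascii($input) {
    $num = 0;
    // Walk the string byte by byte until we run past the end.
    while( isset( $input[$num] ) ) {
        // A set high bit (0x80) means the byte is outside 7-bit ASCII.
        if( ord( $input[$num] ) & 0x80 ) {
            return false;
        }
        $num++;
    }
    return true;
}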
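For batch use, the preg_match check from the benchmark can be wrapped in a small helper. This is only a sketch: the function name is_ascii_regex is made up for illustration, and it assumes PHP 7 or later (for the type declarations and the \u{...} escape).

```php
<?php
// Hypothetical helper name; wraps the preg_match check benchmarked above.
function is_ascii_regex(string $input): bool
{
    // Without the /u modifier PCRE matches raw bytes, so any byte in
    // 0x80-0xFF means the string is not pure 7-bit ASCII.
    return preg_match('/[\x80-\xff]/', $input) === 0;
}

var_dump(is_ascii_regex('hello'));       // bool(true)
var_dump(is_ascii_regex("h\u{e9}llo"));  // bool(false), U+00E9 is two bytes >= 0x80 in UTF-8
var_dump(is_ascii_regex(''));            // bool(true), an empty string has no non-ASCII bytes
```

Note that an empty string counts as ASCII here, which may or may not be what you want.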
You can use mb_detect_encoding and check for ASCII:
mb_detect_encoding($str, 'ASCII', true)
This will return false if $str contains at least one non-ASCII character (byte value > 0x7F).
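Because mb_detect_encoding returns either the encoding name or false, it needs a comparison to become a boolean. A minimal sketch, assuming the mbstring extension is loaded; the wrapper name is_ascii_mb is hypothetical:

```php
<?php
// Hypothetical wrapper name; requires the mbstring extension.
function is_ascii_mb(string $str): bool
{
    // In strict mode the call returns the string 'ASCII' on success
    // and false when a byte above 0x7F is found.
    return mb_detect_encoding($str, 'ASCII', true) !== false;
}

var_dump(is_ascii_mb('hello'));       // bool(true)
var_dump(is_ascii_mb("h\u{e9}llo"));  // bool(false)
```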
Try mb_detect_encoding.
The function ctype_print returns true iff all characters fall into the ASCII range 32-126 (PHP unit test).
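The difference between "printable ASCII" and "ASCII" shows up as soon as a string contains a tab or newline. The sketch below compares ctype_print with a plain byte-range check; the function name compare_checks is invented for the example, and it assumes the default "C" locale and the ctype extension.

```php
<?php
// Hypothetical helper: returns [ctype_print result, 7-bit ASCII result].
function compare_checks(string $s): array
{
    return [
        ctype_print($s),                         // true only if every byte is in 32-126
        preg_match('/[\x80-\xff]/', $s) === 0,   // true if no byte is above 0x7F
    ];
}

var_dump(compare_checks('hello world'));  // [true, true]
var_dump(compare_checks("a\tb"));         // [false, true]: the tab is ASCII but not printable
var_dump(compare_checks("h\u{e9}llo"));   // [false, false]
```

So ctype_print is the right tool only if control characters should be rejected as well.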
I suggest you look into utf8_encode or utf8_decode under PHP's manual:
http://www.php.net/manual/en/function.utf8-encode.php
If those functions are not what you are looking for, the examples further down that page may point you in the right direction.
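If you do not want to deal with regex, in JavaScript you can do:
function detectUf8(s) {
    // Keep only the characters whose code unit is above the ASCII range (0-127).
    var nonAscii = s.split('').filter(function (c) {
        return c.charCodeAt(0) > 127;
    });
    // True if at least one non-ASCII character was found.
    return nonAscii.length > 0;
}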