Question
I want to process English words and Japanese words differently in this function:
function process_word($word) {
    if ($word is english) {
        // ...
    } else if ($word is japanese) {
        // ...
    }
}
Thank you.
Answer 1:
A quick solution that doesn't need the mbstring extension:
// Note: utf8_decode() is deprecated as of PHP 8.2.
if (strlen($str) != strlen(utf8_decode($str))) {
    // $str contains multi-byte characters (so it isn't plain English)
} else {
    // $str is pure ASCII (probably English)
}
Or a modification of the solution provided by @Alexander Konstantinov:
function isKanji($str) {
    return preg_match('/[\x{4E00}-\x{9FBF}]/u', $str) > 0;
}
function isHiragana($str) {
    return preg_match('/[\x{3040}-\x{309F}]/u', $str) > 0;
}
function isKatakana($str) {
    return preg_match('/[\x{30A0}-\x{30FF}]/u', $str) > 0;
}
function isJapanese($str) {
    return isKanji($str) || isHiragana($str) || isKatakana($str);
}
Answer 2:
This function checks whether a word contains at least one Japanese character (I found the Unicode ranges for Japanese characters on Wikipedia).
function isJapanese($word) {
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    return preg_match('/[\x{4E00}-\x{9FBF}\x{3040}-\x{309F}\x{30A0}-\x{30FF}]/u', $word) === 1;
}
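To tie this back to the question, here is a minimal sketch of how such a check could drive the branching in process_word(). The names containsJapanese() and processWord(), the returned labels, and the rule used for "English" (only Latin letters, apostrophes and hyphens) are assumptions made for illustration; the sketch also assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: true if $word contains at least one hiragana,
// katakana, or kanji character (same ranges as in the answer above).
function containsJapanese(string $word): bool {
    return preg_match('/[\x{3040}-\x{309F}\x{30A0}-\x{30FF}\x{4E00}-\x{9FBF}]/u', $word) === 1;
}

// Hypothetical dispatcher mirroring process_word() from the question.
// It returns a label so that each branch is easy to see and to test.
function processWord(string $word): string {
    if (containsJapanese($word)) {
        return 'japanese';   // Japanese-specific processing would go here
    }
    // Assumption: an "English" word is made of Latin letters, apostrophes and hyphens.
    if (preg_match('/^[A-Za-z\'-]+$/', $word) === 1) {
        return 'english';    // English-specific processing would go here
    }
    return 'unknown';        // digits, other scripts, mixed input, ...
}
```

Keep in mind that the kanji range is shared with Chinese, so a word made only of kanji is not necessarily Japanese.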
Answer 3:
You could try Google's Translation API, which has a language-detection function: http://code.google.com/apis/language/translate/v2/using_rest.html#detect-language
Answer 4:
Try the mb_detect_encoding function: if the encoding is EUC-JP, UTF-8 or UTF-16, the text may be Japanese; otherwise it is probably English. It is better if you can guarantee which encoding each language uses, because the UTF encodings can be used for many languages.
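A minimal sketch of this idea, assuming the mbstring extension is loaded. The helper name guessEncoding() and the candidate list are assumptions for illustration; the result says which encoding the bytes are valid in, not which language they are written in.

```php
<?php
// Hypothetical helper: returns the first plausible encoding from a fixed
// candidate list, or 'unknown' if none of them fits.
function guessEncoding(string $bytes): string {
    // The third argument enables strict detection, so a candidate is only
    // returned if the whole string is valid in that encoding.
    $encoding = mb_detect_encoding($bytes, ['ASCII', 'UTF-8', 'EUC-JP'], true);
    return $encoding === false ? 'unknown' : $encoding;
}

// A result of 'ASCII' suggests English; any other result only means the text
// contains non-ASCII characters, which may or may not be Japanese.
$isProbablyEnglish = guessEncoding('hello') === 'ASCII';
```

Because UTF-8 can carry any language, this approach is only useful when the input encoding really does differ per language.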
Answer 5:
English text usually consists only of ASCII characters (or, better said, characters in the ASCII range).
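A short sketch of an ASCII-only test along these lines; isAscii() is a hypothetical name, and the check works on raw bytes, so it needs no extra extension.

```php
<?php
// Hypothetical helper: true if every byte of $text is in the ASCII range
// (0x00-0x7F). An empty string counts as ASCII.
function isAscii(string $text): bool {
    return preg_match('/^[\x00-\x7F]*$/D', $text) === 1;
}
```

A false result only tells you the text is not plain ASCII: isAscii('café') is false as well, so this check alone cannot confirm that a word is Japanese.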
Answer 6:
You can try to convert the charset and check whether the conversion succeeds.
Take a look at iconv: http://www.php.net/manual/en/function.iconv.php
If you can convert a string to ISO-8859-1, it might be English; if you can convert it to ISO-2022-JP, it is probably Japanese (I might be wrong about the exact charsets, so you should look them up).
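A rough sketch of the conversion test, assuming the input is UTF-8 and the iconv extension is available. The helper name convertsTo() is hypothetical, and how unconvertible characters are handled depends on the underlying iconv implementation, so treat the result as a hint only.

```php
<?php
// Hypothetical helper: true if $text (assumed to be UTF-8) can be converted
// to $charset without hitting a character that charset cannot represent.
function convertsTo(string $text, string $charset): bool {
    // iconv() returns false and raises a notice when a character cannot be
    // converted; @ silences that notice so only the return value matters.
    return @iconv('UTF-8', $charset, $text) !== false;
}

// Check ISO-8859-1 first: plain ASCII text converts to ISO-2022-JP too,
// so the order of the tests matters.
$word = 'hello';
$mightBeEnglish = convertsTo($word, 'ISO-8859-1');
```

Text in other Western European languages also converts to ISO-8859-1, which is why a successful conversion only means the word might be English.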
Source: https://stackoverflow.com/questions/2856942/how-to-check-if-the-word-is-japanese-or-english-using-php