Question
I want to sort Japanese words (kanji) the way Excel's sort feature does. I have tried many ways to sort Japanese text in PHP, but the result is never 100% the same as Excel's.
First, I tried converting the kanji to katakana using this library (https://osdn.net/projects/igo-php/), but in some cases the result differs from Excel's. I want to sort these words in ascending order:
けやきの家
高森台病院
みのりの里
My result:
けやきの家
高森台病院
みのりの里
Excel result:
けやきの家
みのりの里
高森台病院
Second, I tried another approach using this function:
mb_convert_kana($text, "KVc", "utf-8");
This sorts the words above correctly, but some cases are still wrong:
米田病院
米田病院
高森台病院
My result:
米田病院
米田病院
高森台病院
Excel result:
高森台病院
米田病院
米田病院
Do you have any idea how to do this? (Sorry for my English.) Thank you.
Answer 1:
Firstly, Japanese kanji cannot be sorted meaningfully on their own. You can sort them by their code points, but that order carries no meaning.
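As a quick illustration of code-point order, here is a small sketch using the words from the question (it assumes PHP 7.2+ with the mbstring extension, for mb_ord()). A plain byte-wise string sort puts 米田病院 before 高森台病院 only because 米 is U+7C73 and 高 is U+9AD8; how either name is read plays no part.

```php
<?php
// Sketch: a plain string sort orders UTF-8 text by code point.
// For kanji, that order has nothing to do with how the words are read.
$words = array('高森台病院', 'みのりの里', 'けやきの家', '米田病院');

$byCodePoint = $words;
// SORT_STRING compares byte by byte, which equals code-point order for UTF-8.
sort($byCodePoint, SORT_STRING);

foreach ($byCodePoint as $word) {
    // Print the code point of each word's first character next to the word.
    $first = mb_substr($word, 0, 1, 'UTF-8');
    printf("U+%04X  %s%s", mb_ord($first, 'UTF-8'), $word, PHP_EOL);
}
// U+3051  けやきの家
// U+307F  みのりの里
// U+7C73  米田病院
// U+9AD8  高森台病院
```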
Using Igo (or any other morphological analysis library) sounds like a good solution, though it cannot be perfect. Your first sort result looks fine to me. Why do you need them sorted in Excel's order?
In Excel, a cell remembers its phonetic reading if the user originally typed it with a Japanese IME (Input Method Editor), and Excel uses that reading when sorting. Not every cell is typed manually through an IME, though (for example, values that were pasted or imported), so some cells carry no information about how their kanji are read. As a result, sorting kanji in Excel can be quite unpredictable. (When sorting really matters, the usual practice is to add a separate yomigana (reading) field, in either hiragana or katakana, and sort by that column.)
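A minimal sketch of the separate-reading-column approach, assuming each record stores its reading in hiragana under a hypothetical `yomi` key. The readings for 高森台病院 and 米田病院 are guesses for illustration only; real data should take them from whoever enters the names. The helper `sortByReading()` is also hypothetical. It uses `Collator('ja_JP')` from php_intl when that class exists and otherwise falls back to `strcmp()`, which only roughly approximates kana (gojuon) order.

```php
<?php
// Sketch: sort records by a separate reading (yomigana) column
// instead of by the kanji text itself.

/**
 * Hypothetical helper: sort rows by their 'yomi' field.
 * Uses php_intl's Collator('ja_JP') when available; otherwise falls back
 * to strcmp(), i.e. code-point order, a rough approximation for hiragana.
 */
function sortByReading(array $rows)
{
    $collator = class_exists('Collator') ? new Collator('ja_JP') : null;
    usort($rows, function ($a, $b) use ($collator) {
        return $collator !== null
            ? $collator->compare($a['yomi'], $b['yomi'])
            : strcmp($a['yomi'], $b['yomi']);
    });
    return $rows;
}

// Example data; the 'yomi' values are assumed readings, not authoritative.
$rows = array(
    array('name' => '高森台病院', 'yomi' => 'たかもりだいびょういん'), // assumed reading
    array('name' => 'みのりの里', 'yomi' => 'みのりのさと'),
    array('name' => 'けやきの家', 'yomi' => 'けやきのいえ'),
    array('name' => '米田病院',   'yomi' => 'よねだびょういん'),     // assumed reading
);

$sorted = sortByReading($rows);
foreach ($sorted as $row) {
    echo $row['name'] . PHP_EOL;
}
```

With these assumed readings the order is けやきの家, 高森台病院, みのりの里, 米田病院. If a name's reading changes (say 米田 read as こめだ), only its `yomi` value needs updating; the kanji text never takes part in the comparison.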
The second method, mb_convert_kana(), misses the point entirely. That function normalizes kana: it converts between hiragana and katakana, and between full-width and half-width katakana, which coexist for historical reasons. Applied to your Japanese text, it changes only the kana portions and leaves the kanji untouched. If it happened to give the result you expected, that was a coincidence.
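To illustrate, here is a small sketch (it requires the mbstring extension) using the question's `KVc` options: a katakana spelling and a hiragana spelling of the same word normalize to the same string, while kanji-only names come back unchanged, so their relative order is still plain code-point order.

```php
<?php
// Sketch: mb_convert_kana() rewrites kana only, never kanji.
// Options used in the question:
//   "K" - half-width katakana -> full-width katakana
//   "V" - merge half-width voiced sound marks into one character (used with "K")
//   "c" - full-width katakana -> full-width hiragana
$samples = array('ケヤキの家', 'けやきの家', '高森台病院', '米田病院');

$normalized = array();
foreach ($samples as $text) {
    $normalized[$text] = mb_convert_kana($text, 'KVc', 'UTF-8');
    echo $text . ' => ' . $normalized[$text] . PHP_EOL;
}
// Both kana spellings become けやきの家; the kanji-only names are unchanged.
```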
First, you need to define exactly which Excel sort order for Japanese your customer requires. I'll be happy to help once that is clear.
[Update]
As the OP commented, mb_convert_kana() was meant for sorting mixed hiragana/katakana text. For that purpose, I suggest using the Collator class from the php_intl extension. For example:
<?php
// demo: Japanese(kana) sort by php_intl Collator
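// array() (not the PHP 5.4+ [] syntax) keeps this file parseable on PHP 5.3,
// so the version check below can actually run there.
if (version_compare(PHP_VERSION, '5.3.0', '<')) {
    exit('php_intl extension is available on PHP 5.3.0 or later.');
}
if (!class_exists('Collator')) {
    exit('You need to install php_intl extension.');
}
// ICU-based collator using Japanese locale rules
$collator = new Collator('ja_JP');
$textArray = array(
    'カキクケコ',
    '日本語',
    'アアト',
    'Alphabet',
    'アイランド',
    'はひふへほ',
    'あいうえお',
    '漢字',
    'たほいや',
    'さしみじょうゆ',
    'Roma',
    'ラリルレロ',
    'アート',
);
// Collator::sort() sorts the array in place and returns false on failure
$result = $collator->sort($textArray);
if ($result === false) {
    echo "sort failed" . PHP_EOL;
    exit(1);
}
var_dump($textArray);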
This sorts an array of mixed hiragana/katakana text. The result is:
array(13) {
[0]=>
string(8) "Alphabet"
[1]=>
string(4) "Roma"
[2]=>
string(9) "アート"
[3]=>
string(9) "アアト"
[4]=>
string(15) "あいうえお"
[5]=>
string(15) "アイランド"
[6]=>
string(15) "カキクケコ"
[7]=>
string(21) "さしみじょうゆ"
[8]=>
string(12) "たほいや"
[9]=>
string(15) "はひふへほ"
[10]=>
string(15) "ラリルレロ"
[11]=>
string(6) "漢字"
[12]=>
string(9) "日本語"
}
You don't need to normalize the text yourself. Both PHP (with the php_intl extension) and databases such as MySQL know how to sort the scripts of many languages, so there is no need to implement that yourself.
However, this does not solve the original problem of sorting kanji.
Source: https://stackoverflow.com/questions/47430480/how-to-sort-japanese-like-excel