How to sort Japanese like Excel

Submitted by 我是研究僧i on 2020-01-05 08:48:31

Question


I want to sort Japanese words (kanji) the way Excel's sort feature does. I have tried several ways to sort Japanese text in PHP, but the result is never 100% identical to Excel's.

First, I tried converting kanji to katakana with this library (https://osdn.net/projects/igo-php/), but in some cases the result differs from Excel. I want to sort these words in ascending order:

けやきの家

高森台病院

みのりの里

My result:

けやきの家

高森台病院

みのりの里

Excel result:

けやきの家

みのりの里

高森台病院

Second, I tried another approach using this function:

 mb_convert_kana($text, "KVc", "utf-8");

The sort result is correct for the words above, but some other cases come out wrong:

米田病院

米田病院

高森台病院

My result:

米田病院

米田病院

高森台病院

Excel result:

高森台病院

米田病院

米田病院

Does anyone have an idea about this? (Sorry for my English.) Thank you.


Answer 1:


Firstly, Japanese kanji are not sortable as such. You can sort them by code number, but that order has no meaning.

Using Igo (or any other morphological analysis library) sounds like a good solution, though it cannot be perfect. Your first sort result looks fine to me. Why do you want the words sorted in Excel's order?

In Excel, if a cell still holds the phonetic notation captured when the user typed the text through a Japanese IME (Input Method Editor), that phonetic data is used for sorting. Since not every cell is necessarily typed by hand through an IME, some cells may carry no information about how their kanji are read. As a result, sorting kanji in Excel can be quite unpredictable. (When sorting really matters, we usually add a separate yomigana field, in either hiragana or katakana, and sort by that column.)
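
To illustrate what sorting by code number looks like, here is a minimal sketch using PHP's built-in sort() with SORT_STRING, which compares the UTF-8 bytes and therefore follows Unicode code point order. The words are taken from the question; nothing here is specific to Excel or Igo.

```php
<?php
// Plain byte-wise sort: for UTF-8 strings this follows Unicode code point order.
$words = ['高森台病院', '米田病院', 'けやきの家'];
sort($words, SORT_STRING);

// Hiragana code points are lower than kanji code points, and among the kanji
// 米 (U+7C73) comes before 高 (U+9AD8), regardless of how either word is read.
print_r($words);
```

This yields けやきの家, 米田病院, 高森台病院. The order is stable and reproducible, but it says nothing about pronunciation.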
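
The yomigana approach can be sketched roughly as follows. The `yomi` field name and the readings themselves are assumptions supplied by hand for illustration; in practice they would come from user input or a reviewed dictionary. strcmp() is used only to keep the sketch dependency-free; it works here because the readings are plain hiragana.

```php
<?php
// Each record carries a separate reading (yomigana) in hiragana.
// The 'yomi' values are assumed readings, entered by hand for this sketch.
$rows = [
    ['name' => 'みのりの里', 'yomi' => 'みのりのさと'],
    ['name' => '高森台病院', 'yomi' => 'たかもりだいびょういん'],
    ['name' => 'けやきの家', 'yomi' => 'けやきのいえ'],
];

// Sort by the reading, not by the displayed name.
usort($rows, function ($a, $b) {
    return strcmp($a['yomi'], $b['yomi']);
});

$sortedNames = array_column($rows, 'name');
print_r($sortedNames);
```

With these readings the order is けやきの家, 高森台病院, みのりの里, which matches the first result in the question rather than Excel's. For mixed hiragana/katakana readings, Collator::compare() from php_intl would likely be a more robust comparison than strcmp().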

The second method, mb_convert_kana(), is beside the point. That function normalizes kana: it converts between hiragana and katakana, and between full-width and half-width forms, which coexist for historical reasons. Applying it to your Japanese text changes only the kana parts and leaves the kanji untouched. If it happened to produce the order you expected, that was a coincidence.

You first need to define which Excel Japanese sort order your customer requires. I will be happy to help once that is clear.
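
A small sketch of what mb_convert_kana() does and does not change, assuming the mbstring extension is loaded. The sample strings are illustrative; the option letters are the documented ones ("K" for half-width to full-width katakana, "V" for collapsing voiced sound marks, "c" for full-width katakana to hiragana).

```php
<?php
// 'c' converts full-width katakana to full-width hiragana.
$kana = mb_convert_kana('カキクケコ', 'KVc', 'UTF-8');

// 'K' converts half-width katakana to full-width katakana.
$wide = mb_convert_kana('ｶｷｸ', 'KV', 'UTF-8');

// Kanji contain no kana, so the string passes through unchanged.
$kanji = mb_convert_kana('高森台病院', 'KVc', 'UTF-8');

var_dump($kana, $wide, $kanji);
```

Because the kanji string comes back byte-for-byte identical, any sort applied afterwards still compares the kanji by code number.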

[Update]

As the OP commented, mb_convert_kana() was being used to sort mixed hiragana/katakana text. For that purpose, I suggest using the php_intl Collator. For example:

<?php

// demo: Japanese(kana) sort by php_intl Collator

if (version_compare(PHP_VERSION, '5.3.0', '<')) {
    exit ('php_intl extension is available on PHP 5.3.0 or later.');
}    
if (!class_exists('Collator')) {
    exit ('You need to install php_intl extension.');
}

$collator = new Collator('ja_JP');
$textArray = [
  'カキクケコ',
  '日本語',
  'アアト',
  'Alphabet',
  'アイランド',
  'はひふへほ',
  'あいうえお',
  '漢字',
  'たほいや',
  'さしみじょうゆ',
  'Roma',
  'ラリルレロ',
  'アート',
];

$result = $collator->sort($textArray);
if ($result === false) {
    echo "sort failed" . PHP_EOL;
    exit();
}

var_dump($textArray);

This sorts an array of mixed hiragana/katakana strings. Here is the result:

array(13) {
  [0]=>
  string(8) "Alphabet"
  [1]=>
  string(4) "Roma"
  [2]=>
  string(9) "アート"
  [3]=>
  string(9) "アアト"
  [4]=>
  string(15) "あいうえお"
  [5]=>
  string(15) "アイランド"
  [6]=>
  string(15) "カキクケコ"
  [7]=>
  string(21) "さしみじょうゆ"
  [8]=>
  string(12) "たほいや"
  [9]=>
  string(15) "はひふへほ"
  [10]=>
  string(15) "ラリルレロ"
  [11]=>
  string(6) "漢字"
  [12]=>
  string(9) "日本語"
}

You do not need to normalize the strings yourself. Both PHP (with the php_intl extension) and databases (such as MySQL) know how to collate text in many languages, so you do not have to write that logic.

Note that this still does not solve the original issue, which is sorting kanji.



Source: https://stackoverflow.com/questions/47430480/how-to-sort-japanese-like-excel
