Sort array of words - non-english letters + double character letters PHP

灰色年华 2021-01-21 15:04

I want to sort an array of words alphabetically. Unfortunately, in my language (Croatian), there are double-character letters (e.g. lj, nj, dž), and letters that are not properl

2 Answers
  • 2021-01-21 15:16

    Here is a class that can help you sort an array of strings according to a custom alphabet table:

    <?php
    
    /**
     * This class can be used to compare unicode strings.
     * It can be used for easy array sorting.
     * 
     * You can set your own alphabet characters table to be used.
     */
    class UnicodeStringComperator {
        private $alphabet = [];
    
        public function __construct() {
            // We set the default alphabet characters table to a-z.
            $this->alphabet = range('a', 'z');
        }
    
        /**
         * Set the characters table to use for sorting
         * 
         * @param array $alphabet The characters table for the sorting
         */
        public function setAlphabet($alphabet) {
            $this->alphabet = $alphabet;
        }
    
        /**
         * Split the string into an array of its characters (one element per
         * Unicode code point). Multi-character letters such as "lj" are NOT
         * kept together by this method.
         * 
         * @param string $str The string to split
         * @return array The characters in the string
         */
        public function splitter($str){
            return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
        }
    
        /**
         * Find the place of the char in the alphabet table
         * 
         * @param string $chr The character to find
         * @return int|false The position of the character in the table, or false if not found
         */
        public function place($chr) {
            return array_search($chr, $this->alphabet);
        }
    
        /**
         * Do the comparison between the 2 strings
         * 
         * @param string $str1 The first string
         * @param string $str2 The second string
         * @return int -1, 0 or 1 if $str1 < $str2, $str1 == $str2 or $str1 > $str2 respectively
         */
        public function compare($str1, $str2) {
            $chars1 = $this->splitter($str1);
            $chars2 = $this->splitter($str2);
            for ($i = 0; $i < count($chars1) && $i < count($chars2); $i++) {
                $p1 = $this->place($chars1[$i]);
                $p2 = $this->place($chars2[$i]);
                if ($p1 < $p2) {
                    return -1;
                } elseif ($p1 > $p2) {
                    return 1;
                }
            }
            // All shared positions match, so the shorter string sorts first
            // and strings of equal length are equal.
            return count($chars1) <=> count($chars2);
        }
    
        /**
         * Sort an array of strings based on the alphabet table
         * 
         * @param Array $ar The array of strings to sort
         * @return Array The sorted array.
         */
        public function sort_array($ar) {
            // compare() is an instance method, so pass $this rather than 'self'.
            usort($ar, array($this, 'compare'));
            return $ar;
        }
    }
    

    To sort with your own alphabet, pass your character table to setAlphabet():

    <?php
    $alphabet = array(
                'a', 'b', 'c',
                'č', 'ć', 'd',
                'dž', 'đ', 'e',
                'f', 'g', 'h',
                'i', 'j', 'k',
                'l', 'lj', 'm',
                'n', 'nj', 'o',
                'p', 'q', 'r',
                's', 'š', 't',
                'u', 'v', 'w',
                'x', 'y', 'z', 'ž'
        );
    $comperator = new UnicodeStringComperator();
    $comperator->setAlphabet($alphabet);
    $sorted_words = $comperator->sort_array($words);
    var_dump($sorted_words);
    

    The output is the same as your original array, so the words come out in the expected order:

    array(34) {
      [0] =>
      string(4) "alfa"
      [1] =>
      string(4) "beta"
      [2] =>
      string(3) "car"
      [3] =>
      string(7) "čvarci"
      [4] =>
      string(4) "ćup"
      [5] =>
      string(4) "drvo"
      [6] =>
      string(5) "džem"
      [7] =>
      string(4) "đak"
      [8] =>
      string(5) "endem"
      [9] =>
      string(5) "fićo"
      [10] =>
      string(4) "grah"
      [11] =>
      string(5) "hrana"
      [12] =>
      string(7) "idealan"
      [13] =>
      string(6) "jabuka"
      [14] =>
      string(4) "koza"
      [15] =>
      string(5) "lijep"
      [16] =>
      string(7) "ljestve"
      [17] =>
      string(5) "mango"
      [18] =>
      string(4) "nebo"
      [19] =>
      string(6) "njezin"
      [20] =>
      string(5) "obrva"
      [21] =>
      string(7) "pivnica"
      [22] =>
      string(6) "qwerty"
      [23] =>
      string(4) "riba"
      [24] =>
      string(3) "sir"
      [25] =>
      string(6) "šaran"
      [26] =>
      string(5) "tikva"
      [27] =>
      string(10) "umanjenica"
      [28] =>
      string(7) "večera"
      [29] =>
      string(4) "wind"
      [30] =>
      string(5) "x-ray"
      [31] =>
      string(6) "yellow"
      [32] =>
      string(5) "zakaj"
      [33] =>
      string(5) "žena"
    }
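
    The class above compares one code point at a time, so the two-character entries in the table ('dž', 'lj', 'nj') are never looked up as single letters. The following is a minimal sketch of one way to handle them, assuming a greedy, longest-match-first tokenizer. The function names tokenize() and compareByAlphabet() and the extra sample words are made up for illustration and are not part of the class above.

    ```php
    <?php
    // Sketch only: tokenize() and compareByAlphabet() are hypothetical helpers.

    /**
     * Split a string into letters, keeping multi-character letters together.
     */
    function tokenize(string $str, array $alphabet): array {
        // Try longer entries first so that "lj" wins over "l". Byte length is
        // enough here: a letter is always longer in bytes than its own prefix.
        usort($alphabet, function ($a, $b) {
            return strlen($b) <=> strlen($a);
        });
        $alternatives = array_map(function ($letter) {
            return preg_quote($letter, '/');
        }, $alphabet);
        // Fall back to any single code point for characters outside the table.
        $alternatives[] = '.';
        preg_match_all('/' . implode('|', $alternatives) . '/us', $str, $matches);
        return $matches[0];
    }

    /**
     * Compare two strings letter by letter using the positions in $alphabet.
     */
    function compareByAlphabet(string $a, string $b, array $alphabet): int {
        $rank = array_flip($alphabet); // letter => position in the table
        $ta = tokenize($a, $alphabet);
        $tb = tokenize($b, $alphabet);
        $n = min(count($ta), count($tb));
        for ($i = 0; $i < $n; $i++) {
            // Assumption: unknown characters sort after every known letter.
            $ra = $rank[$ta[$i]] ?? count($alphabet);
            $rb = $rank[$tb[$i]] ?? count($alphabet);
            if ($ra !== $rb) {
                return $ra <=> $rb;
            }
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return count($ta) <=> count($tb);
    }

    $alphabet = ['a', 'b', 'c', 'č', 'ć', 'd', 'dž', 'đ', 'e', 'f', 'g', 'h',
                 'i', 'j', 'k', 'l', 'lj', 'm', 'n', 'nj', 'o', 'p', 'q', 'r',
                 's', 'š', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'ž'];

    // Illustrative sample: pairs that differ only in digraph vs. single letter.
    $words = ['ljestve', 'lopta', 'njezin', 'nos', 'džem', 'dvor'];
    usort($words, function ($a, $b) use ($alphabet) {
        return compareByAlphabet($a, $b, $alphabet);
    });
    print_r($words); // dvor, džem, lopta, ljestve, nos, njezin
    ```

    Note that a greedy tokenizer treats every "nj" as the letter nj, even in words such as "injekcija" where n and j are separate letters; telling those apart would need a word list, which this sketch does not attempt. It is also case-sensitive, like the class above.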
    
  • 2021-01-21 15:19

    You can try the Collator class from PHP's intl extension.

    $words = array( 'alfa', 'beta', 'car', 'čvarci', 'ćup', 'drvo', 'džem', 'đak', 'endem', 'fićo', 'grah', 'hrana', 'idealan', 'jabuka', 'koza', 'lijep', 'ljestve', 'mango', 'nebo', 'njezin', 'obrva', 'pivnica', 'qwerty', 'riba', 'sir', 'šaran', 'tikva', 'umanjenica', 'večera', 'wind', 'x-ray', 'yellow', 'zakaj', 'žena' );
    $collator = new Collator('hr_HR');
    // or $collator = new Collator( 'hr' );
    $collator->sort($words);
    print_r($words);
    

    I am not sure what the locale code is for Croatian, so you should check the list of locale codes. The code is based on a reply to a similar question.
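
    On the locale question: 'hr' is the ISO 639-1 language code for Croatian, and 'hr_HR' adds the region Croatia. As a rough sketch, assuming the intl extension is installed, you can ask the Collator which locale it actually resolved to before relying on the sort order. The shortened word list is only for illustration.

    ```php
    <?php
    // Sketch only: requires the intl extension; the word list is illustrative.
    $words = ['žena', 'ljestve', 'lijep', 'čvarci', 'car', 'alfa'];

    if (extension_loaded('intl')) {
        $collator = new Collator('hr_HR');

        // Shows which locale's collation data is really in use. If the
        // requested locale is unknown, ICU falls back to a more general one.
        echo 'Valid locale: ', $collator->getLocale(Locale::VALID_LOCALE), "\n";

        // sort() reorders the array in place and returns true on success.
        if ($collator->sort($words)) {
            print_r($words);
        }
    } else {
        echo "The intl extension is not loaded, so Collator is unavailable.\n";
    }
    ```

    If the reported locale is not a Croatian one, the collation rules in use are probably generic ones rather than the Croatian alphabet order.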
