Sort uppercase keys just before lowercase keys in a hash

情书的邮戳 2020-12-07 03:36

I have a hash and I want to sort it by key, with each uppercase word appearing just before its lowercase counterpart.

Example:

JANE
jane
JIM

4 Answers
  • 2020-12-07 03:48

    Use a custom sort which first compares the items based on their lowercased representations (so that all variations of "jane" appear before variations of "jim"), then resolves ties by doing a default ASCII comparison (where uppercase comes before lowercase):

    perl -e 'print join "\n", sort { lc $a cmp lc $b || $a cmp $b } qw( jim JANE jane JIM )'
    

    Output:

    JANE
    jane
    JIM
    jim
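
    The same comparison can be pulled out into a named subroutine, which is handy when several sorts need it. This is only a sketch: the name by_caseless_then_case is made up for illustration, and the comparison is the one from the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper name; the logic is the comparison from the one-liner.
sub by_caseless_then_case {
    # $a and $b are the package variables that sort sets for each comparison.
    return lc($a) cmp lc($b)    # first pass: ignore case, so "jane" groups with "JANE"
        || $a cmp $b;           # tie: plain string order, where "JANE" precedes "jane"
}

my @sorted = sort by_caseless_then_case qw( jim JANE jane JIM );
print "$_\n" for @sorted;
```

    The second comparison only runs when the lowercased strings are equal, because cmp returns 0 (false) in that case.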
    
  • 2020-12-07 03:49

    Unicode Collation

    Although it may seem like overkill for this operation, the standard Unicode::Collate and Unicode::Collate::Locale modules are made for this sort of thing. They also sort non-ASCII data alphabetically, which the normal sort will not do.

    use utf8;
    @names = qw[ jim JANE jane JIM josé josie Mary María mark ];
    @sorts = sort @names;
    

    That gives you the sort order of

    JANE JIM Mary María jane jim josie josé mark
    

    which nobody wants. This is much better:

    use utf8;
    use Unicode::Collate;
    @names = qw[ jim JANE jane JIM josé josie Mary María mark ];
    $coll = Unicode::Collate->new;
    @sorts = $coll->sort(@names);
    

    That gives you

    jane JANE jim JIM josé josie María mark Mary
    

    If you want uppercase before lowercase, specify that this way:

    use utf8;
    use Unicode::Collate;
    @names = qw[ jim JANE jane JIM josé josie Mary María mark ];
    $coll = Unicode::Collate->new(upper_before_lower => 1);
    @sorts = $coll->sort(@names);
    print "@sorts\n";
    

    which yields:

    JANE jane JIM jim josé josie María mark Mary
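
    Applied to the original question, the collator can sort the keys of a hash directly. A minimal sketch, assuming a small hash whose values are arbitrary placeholders:

```perl
use strict;
use warnings;
use Unicode::Collate;

# Illustrative data; the values are arbitrary placeholders.
my %hash = ( jim => 4, JANE => 1, jane => 2, JIM => 3 );

# upper_before_lower makes "JANE" sort just ahead of "jane".
my $coll        = Unicode::Collate->new( upper_before_lower => 1 );
my @sorted_keys = $coll->sort( keys %hash );

print "$_ => $hash{$_}\n" for @sorted_keys;
```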
    

    Simple Compares

    You can use a collation object’s cmp method on a pair of strings in the customary fashion, like this:

    #!/usr/bin/env perl
    
    use 5.10.1;
    use strict;
    use autodie; 
    use warnings qw[ FATAL all ];
    use utf8;
    use open qw[ :std IO :utf8 ];
    use Unicode::Collate;
    
    my @names = qw[ fum fee fie foe ];
    my $coll = Unicode::Collate->new;
    my @sorts = $coll->sort(@names);
    say "@names => @sorts\n";
    
    for (
          my($a, $b) = splice @names, 0, 2;
          2 == grep {defined} $a, $b;
          ($a, $b) = ($b, shift @names)
        )
    {
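        # cmp returns -1, 0, or +1, like the built-in operator. An if/elsif chain
        # avoids given/when, which is experimental and trips FATAL warnings on perl 5.18+.
        my $order = $coll->cmp($a, $b);
        if    ($order < 0) { say "$a < $b" }
        elsif ($order > 0) { say "$a > $b" }
        else               { say "$a = $b" }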
    }
    

    which produces:

    fum fee fie foe => fee fie foe fum
    
    fum > fee
    fee < fie
    fie < foe
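
    Since cmp behaves like the built-in operator, it can also sit inside an ordinary sort block, for example to sort records by one field and break ties on another. A rough sketch; the @people records and their fields are invented for illustration:

```perl
use strict;
use warnings;
use Unicode::Collate;

my $coll = Unicode::Collate->new( upper_before_lower => 1 );

# Hypothetical records, made up for this example.
my @people = (
    { name => "jane", age => 30 },
    { name => "JIM",  age => 41 },
    { name => "JANE", age => 25 },
    { name => "jane", age => 22 },
);

# Order by name using the collator; equal names fall back to numeric age.
my @sorted = sort {
    $coll->cmp( $a->{name}, $b->{name} )
        || $a->{age} <=> $b->{age}
} @people;

print "$_->{name} ($_->{age})\n" for @sorted;
```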
    

    Fancier Alphabetic Sorts of Unicode

    Now consider a list of words like this:

    sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET
    

    If you run the default sort on that, you get the virtually useless:

    SET SSET saet sat seat set sot ssét sát sät sæt sét tot ßet ſAT ſet
    

    And a case-insensitive sort with a code-point tiebreak is really no better:

    use utf8;
    @names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];
    @sorts = sort {
        lc $a  cmp  lc $b
               ||
           $a  cmp  $b
    } @names;
    print "@sorts\n";
    

    producing the still stupid-and-wrong:

    saet sat seat SET set sot SSET ssét sát sät sæt sét tot ßet ſAT ſet
    

    But here it is with a standard Unicode sort:

    use utf8;
    use Unicode::Collate;
    @names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];
    $coll = Unicode::Collate->new(upper_before_lower => 1);
    @sorts = $coll->sort(@names);
    print "@sorts\n";
    

    producing the ‘correcter’ (read: infinitely preferable) version of:

    saet sæt sät sat sát ſAT seat SET set sét ſet sot SSET ssét ßet tot
    

    Locale Sorts

    The Unicode::Collate module is pretty fast, so you should not hesitate to use it for your routine character sorting needs. But sometimes that just isn’t enough. That’s because different languages have different ideas of alphabets.

    • Latin (archaic): a b c d e f z h i k l m n o p q r s t v x
    • Latin (classic): a b c d e f g h i k l m n o p q r s t v x y z
    • Spanish (traditional): a b c ch d e f g h i j k l ll m n ñ o p q r rr s t u v w x y z
    • Spanish (recent): a b c d e f g h i j k l m n ñ o p q r s t u v w x y z
    • Catalan: a b c ç d e f g h i j k l m n o p q r s t u v w x y z
    • Welsh: a b c ch d dd e f ff g ng h i l ll m n o p ph r rh s t th u w y
    • Danish: a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å
    • Icelandic: a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö
    • Old English: a b c d e f ȝ/g h i k l m n o p q r s t v x y z & ⁊ ƿ þ ð æ
    • Middle English: a b c d e f g h i k l m n o p q r ſ/s t v x y z ȝ ƿ þ ð æ
    • Futhorc (transliterated): f u þ o r c ȝ w h n i j eo p x s t b e m l ŋ d œ a æ y ea io cw k st g
    • Greek: α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ/ς τ υ φ χ ψ ω
    • Cyrillic: а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я
    • Cherokee: Ꭰ Ꭱ Ꭲ Ꭳ Ꭴ Ꭵ Ꭶ Ꭷ Ꭸ Ꭹ Ꭺ Ꭻ Ꭼ Ꭽ Ꭾ Ꭿ Ꮀ Ꮁ Ꮂ Ꮃ Ꮄ Ꮅ Ꮆ Ꮇ Ꮈ Ꮉ Ꮊ Ꮋ Ꮌ Ꮍ Ꮎ Ꮏ Ꮐ Ꮑ Ꮒ Ꮓ Ꮔ Ꮕ Ꮖ Ꮗ Ꮘ Ꮙ Ꮚ Ꮛ Ꮜ Ꮝ Ꮞ Ꮟ Ꮠ Ꮡ Ꮢ Ꮣ Ꮤ Ꮥ Ꮦ Ꮧ Ꮨ Ꮩ Ꮪ Ꮫ Ꮬ Ꮭ Ꮮ Ꮯ Ꮰ Ꮱ Ꮲ Ꮳ Ꮴ Ꮵ Ꮶ Ꮷ Ꮸ Ꮹ Ꮺ Ꮻ Ꮼ Ꮽ Ꮾ Ꮿ Ᏸ Ᏹ Ᏺ Ᏻ Ᏼ

    BTW, those are also good examples why “ever hardcoding [a-z] into your program is always wrong, sometimes.” It’s full of idiotic and even insulting assumptions. Note that all but the last three of these are actually considered Latin alphabets! That’s the same script as we use in English. In representing English text, I’ve variously had to deal with learnèd, Æneid, poſt, Laȝamon, résumé, 1ˢᵗ, MᶜKinley, Van Dijke, Cañon City Colorado, œnology, Dzur, rôle, ⅷ, première, Bjørn, naïve, coöperate, façade, café, Merððyn, archæology, and even tschüß. Repeat the mantra: “Hardcoding [a-z] into your program is always wrong, sometimes.” Just Say No!
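
    To make the [a-z] point concrete, here is a small sketch comparing a hardcoded range with a Unicode property. The word list is picked from the examples above, and \p{Lowercase} is just one of several properties that could be used:

```perl
use strict;
use warnings;
use utf8;                               # this source file contains UTF-8 literals
binmode STDOUT, ":encoding(UTF-8)";

my @words = qw( resume résumé naïve façade );

for my $word (@words) {
    # The hardcoded range only knows the 26 unaccented ASCII letters.
    my $ascii_only = $word =~ /\A[a-z]+\z/         ? "yes" : "no";
    # The Unicode property matches any lowercase letter, accented or not.
    my $any_lower  = $word =~ /\A\p{Lowercase}+\z/ ? "yes" : "no";
    print "$word: [a-z] $ascii_only, \\p{Lowercase} $any_lower\n";
}
```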

    The Unicode::Collate::Locale module handles local sorting conventions. Just as English phonebooks and bookshelves have special ways of sorting names so that it doesn’t matter whether you’ve spelt something McBride or MacBride, the German-speaking world sorts their names such that Händel and Haendel are the same. That’s why without diacritics, one must obligatorily write über‑ as ueber‑ and Übermensch as Uebermensch. A locale sort knows to do this:

    use utf8;
    use Unicode::Collate::Locale;
    @names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];
    
    $coll = Unicode::Collate::Locale->new(
                locale             => "de__phonebook",
                upper_before_lower => 1,
            );
    
    @sorts = $coll->sort(@names);
    print "@sorts\n";
    

    now produces

    saet sæt sät sat sát ſAT seat SET set sét ſet sot SSET ssét ßet tot
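
    The Händel/Haendel equivalence can be probed with the collator's eq method. This is a sketch under one assumption not taken from the answer: it sets level => 1 so that only base letters are compared and accent and case differences are ignored.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;
use Unicode::Collate::Locale;
binmode STDOUT, ":encoding(UTF-8)";

# level => 1 (an assumption of this sketch) compares base letters only.
my $root  = Unicode::Collate->new( level => 1 );
my $phone = Unicode::Collate::Locale->new( locale => "de__phonebook", level => 1 );

for my $pair ( [ "Händel", "Haendel" ], [ "über", "ueber" ] ) {
    my ( $x, $y ) = @$pair;
    printf "%s / %s: root %s, de__phonebook %s\n",
        $x, $y,
        ( $root->eq( $x, $y )  ? "equal" : "different" ),
        ( $phone->eq( $x, $y ) ? "equal" : "different" );
}
```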
    

    Se habla castellano

    It’s remarkable how different other countries’ locale conventions can be from one’s own. In the Spanish locale ("es"), ñ is a letter that comes after n and before o. That means that the correct sort of

    raña rastrillo radio rana rápido ráfaga ranúnculo
    

    is

    radio ráfaga rana ranúnculo raña rápido rastrillo
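
    A short sketch for trying this out, sorting the same words with the default collation and with the "es" locale so the position of the ñ word can be compared. The variable names are arbitrary:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;
use Unicode::Collate::Locale;
binmode STDOUT, ":encoding(UTF-8)";

my @words = qw( raña rastrillo radio rana rápido ráfaga ranúnculo );

my $root = Unicode::Collate->new;                            # no locale tailoring
my $es   = Unicode::Collate::Locale->new( locale => "es" );  # ñ as its own letter

my @root_sorted = $root->sort(@words);
my @es_sorted   = $es->sort(@words);

print "root: @root_sorted\n";
print "es:   @es_sorted\n";
```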
    

    Say those all really fast with a fully-rolled rr to loosen your tongue. :)

    The "es__traditional" locale is a little different; historically, chocolate came after color in the Spanish dictionary, unlike the way it works in English. That’s because ch came after c and before d, while ll came after l and before m. That means that this sequence:

    lástima laña llama ligante
    cidra caliente color chocolate con churros
    pero pera Perú perro periglo peste
    

    sorts to

    caliente cidra color con chocolate churros 
    laña lástima ligante llama 
    pera periglo pero perro Perú peste
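
    To see the effect of the ch digraph in isolation, the two Spanish locales can be run side by side on the c-words. A minimal sketch; only the locale names come from the text above, the rest is illustrative:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;
binmode STDOUT, ":encoding(UTF-8)";

my @words = qw( cidra caliente color chocolate con churros );

my $modern = Unicode::Collate::Locale->new( locale => "es" );
my $trad   = Unicode::Collate::Locale->new( locale => "es__traditional" );

my @modern_sorted = $modern->sort(@words);
my @trad_sorted   = $trad->sort(@words);

# In the traditional tailoring "ch" counts as one letter between c and d.
print "es:              @modern_sorted\n";
print "es__traditional: @trad_sorted\n";
```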
    
  • 2020-12-07 03:54

    To get the keys in order, apply sort with a custom sort function on the keys of the hash.

    my %hash = ( JANE => 1, jane => 2, JIM => 3, jim => 4 );
    my @sorted_keys = sort {
        lc $a cmp lc $b
            || $a cmp $b
    } keys %hash;
    

    This custom sort function compares strings first as if they were of the same case, and if equal, takes case into account.

  • 2020-12-07 04:13

    Try:

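    my @list = ("jane", "JIM", "JANE", "jim");
    # Compare case-insensitively first; break ties by code point, which puts uppercase first.
    print join("\n", sort { uc $a cmp uc $b or $a cmp $b } @list), "\n";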
    