Sort uppercase just before lowercase key values from a hash

情书的邮戳

I have a hash, and I want to sort it by its keys, with uppercase words appearing just before the corresponding lowercase words.

Example:

JANE
jane
JIM

4 Answers

眼角桃花

2020-12-07 03:48
Use a custom sort which first compares the items based on their lowercased representations (so that all variations of "jane" appear before variations of "jim"), then resolves ties by doing a default ASCII comparison (where uppercase comes before lowercase):
```
perl -e 'print join "\n", sort { lc $a cmp lc $b || $a cmp $b } qw( jim JANE jane JIM )'
```
Output:
```
JANE
jane
JIM
jim
```
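Since the question asks about hash keys, here is a minimal sketch that applies the same comparator to a hash and prints each pair. The hash contents and the sub name by_lc_then_ascii are hypothetical, chosen only for illustration; a named sort sub simply makes the comparator reusable.

```perl
use strict;
use warnings;

# Hypothetical data: keys that differ only in case.
my %age = ( jim => 40, JANE => 31, jane => 29, JIM => 52 );

# Compare case-insensitively first; break ties with a plain codepoint
# comparison, in which ASCII uppercase sorts before lowercase.
sub by_lc_then_ascii { lc $a cmp lc $b || $a cmp $b }

my @keys = sort by_lc_then_ascii keys %age;
printf "%-4s => %d\n", $_, $age{$_} for @keys;
```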
傲寒

2020-12-07 03:49
Unicode Collation

Although it may seem like overkill for this operation, the standard Unicode::Collate and Unicode::Collate::Locale modules are made for this sort of thing. They also sort non-ASCII data alphabetically, which the normal sort will not do.
```
use utf8;
@names = qw[ jim JANE jane JIM josé josie Mary María mark ];
@sorts = sort @names;
```
That gives you the sort order of
```
JANE JIM Mary María jane jim josie josé mark
```
which nobody wants. This is much better:
```
use utf8;
use Unicode::Collate;
@names = qw[ jim JANE jane JIM josé josie Mary María mark ];
$coll = new Unicode::Collate;
@sorts = $coll->sort(@names);
```
That gives you
```
jane JANE jim JIM josé josie María mark Mary
```
If you want uppercase before lowercase, specify that this way:
```
use utf8;
use Unicode::Collate;
@names = qw[ jim JANE jane JIM josé josie Mary María mark ];
$coll = new Unicode::Collate upper_before_lower => 1;
@sorts = $coll->sort(@names);
print "@sorts\n";
```
which yields:
```
JANE jane JIM jim josé josie María mark Mary
```
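Applied to the question's actual case, hash keys, the same collator can sort keys %h directly. A minimal sketch, assuming a made-up hash whose keys mix case and accents:

```perl
use strict;
use warnings;
use utf8;
use open qw[ :std :encoding(UTF-8) ];
use Unicode::Collate;

# Hypothetical hash; only the keys matter for the sort.
my %h = ( jim => 1, JANE => 2, jane => 3, JIM => 4, 'josé' => 5, 'María' => 6 );

# upper_before_lower flips the default tertiary (case) ordering.
my $coll   = Unicode::Collate->new( upper_before_lower => 1 );
my @sorted = $coll->sort( keys %h );
print "@sorted\n";
```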
Simple Compares

You can also call a collation object’s cmp method on a pair of strings in the customary fashion; like Perl’s own cmp, it returns -1, 0, or +1:
```
#!/usr/bin/env perl

use 5.10.1;
use strict;
use autodie;
use warnings qw[ FATAL all ];
use utf8;
use open qw[ :std :utf8 ];
use Unicode::Collate;

my @names = qw[ fum fee fie foe ];
my $coll = Unicode::Collate->new;
my @sorts = $coll->sort(@names);
say "@names => @sorts\n";

# Compare each adjacent pair of the original list. A hash lookup replaces
# given/when, whose "experimental" warnings (fatal here) appear on 5.18+.
my %relation = ( -1 => "<", 0 => "=", 1 => ">" );
for my $i (0 .. $#names - 1) {
    my ($x, $y) = @names[ $i, $i + 1 ];
    my $rel = $relation{ $coll->cmp($x, $y) };
    say "$x $rel $y";
}
```
which produces:
```
fum fee fie foe => fee fie foe fum

fum > fee
fee < fie
fie < foe
```
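The collator also exposes getSortKey, which returns a binary key whose ordinary cmp order matches $coll->cmp. When a list is large or sorted repeatedly, computing each key once (a Schwartzian transform) avoids re-collating strings inside every comparison. A sketch under that assumption, checked against $coll->sort:

```perl
use strict;
use warnings;
use Unicode::Collate;

my $coll  = Unicode::Collate->new;
my @names = qw[ fum fee fie foe ];

# Decorate each name with its collation key, sort the keys with plain
# cmp, then strip the decoration again.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ $coll->getSortKey($_), $_ ] } @names;

print "@sorted\n";
```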
Fancier Alphabetic Sorts of Unicode

Now consider a list of words like this:
```
sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET
```
If you run the default sort on that, you get the virtually useless:
```
SET SSET saet sat seat set sot ssét sát sät sæt sét tot ßet ſAT ſet
```
And a case-insensitive sort (with case as a tie-breaker) is really no better:
```
use utf8;
@names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];
@sorts = sort {
    lc $a  cmp  lc $b
           ||
       $a  cmp  $b
} @names;
print "@sorts\n";
```
producing the still stupid-and-wrong:
```
saet sat seat SET set sot SSET ssét sát sät sæt sét tot ßet ſAT ſet
```
But here it is with a standard Unicode sort:
```
use utf8;
use Unicode::Collate;
@names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];
$coll = new Unicode::Collate upper_before_lower => 1;
@sorts = $coll->sort(@names);
print "@sorts\n";
```
producing the ‘correcter’ (read: infinitely preferable) version of:
```
saet sæt sät sat sát ſAT seat SET set sét ſet sot SSET ssét ßet tot
```
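Unicode collation compares strings level by level: base letters first, then accents, then case. The constructor's level parameter stops the comparison at a chosen level, which is one way to treat some of the variants above as equal. A minimal sketch using the collator's eq method:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

# level => 1 compares base letters only (accents and case ignored).
my $primary   = Unicode::Collate->new( level => 1 );
# level => 2 also weighs accents, but still ignores case.
my $secondary = Unicode::Collate->new( level => 2 );

my $same_base    = $primary->eq( 'sát', 'sat' );
my $same_case    = $primary->eq( 'SET', 'set' );
my $accent_diff  = $secondary->eq( 'sát', 'sat' );
my $case_ignored = $secondary->eq( 'SET', 'set' );

printf "level 1 ignores accents: %s\n", $same_base    ? "yes" : "no";
printf "level 1 ignores case:    %s\n", $same_case    ? "yes" : "no";
printf "level 2 ignores accents: %s\n", $accent_diff  ? "yes" : "no";
printf "level 2 ignores case:    %s\n", $case_ignored ? "yes" : "no";
```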
Locale Sorts

The Unicode::Collate module is pretty fast, so you should not hesitate to use it for your routine character-sorting needs. But sometimes that just isn’t enough, because different languages have different ideas about what their alphabets contain and how those letters are ordered.
- Latin (archaic): a b c d e f z h i k l m n o p q r s t v x
- Latin (classic): a b c d e f g h i k l m n o p q r s t v x y z
- Spanish (traditional): a b c ch d e f g h i j k l ll m n ñ o p q r rr s t u v w x y z
- Spanish (recent): a b c d e f g h i j k l m n ñ o p q r s t u v w x y z
- Catalan: a b c ç d e f g h i j k l m n o p q r s t u v w x y z
- Welsh: a b c ch d dd e f ff g ng h i l ll m n o p ph r rh s t th u w y
- Danish: a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å
- Icelandic: a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö
- Old English: a b c d e f ȝ/g h i k l m n o p q r s t v x y z & ⁊ ƿ þ ð æ
- Middle English: a b c d e f g h i k l m n o p q r ſ/s t v x y z ȝ ƿ þ ð æ
- Futhorc (transliterated): f u þ o r c ȝ w h n i j eo p x s t b e m l ŋ d œ a æ y ea io cw k st g
- Greek: α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ/ς τ υ φ χ ψ ω
- Cyrillic: а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я
- Cherokee: Ꭰ Ꭱ Ꭲ Ꭳ Ꭴ Ꭵ Ꭶ Ꭷ Ꭸ Ꭹ Ꭺ Ꭻ Ꭼ Ꭽ Ꭾ Ꭿ Ꮀ Ꮁ Ꮂ Ꮃ Ꮄ Ꮅ Ꮆ Ꮇ Ꮈ Ꮉ Ꮊ Ꮋ Ꮌ Ꮍ Ꮎ Ꮏ Ꮐ Ꮑ Ꮒ Ꮓ Ꮔ Ꮕ Ꮖ Ꮗ Ꮘ Ꮙ Ꮚ Ꮛ Ꮜ Ꮝ Ꮞ Ꮟ Ꮠ Ꮡ Ꮢ Ꮣ Ꮤ Ꮥ Ꮦ Ꮧ Ꮨ Ꮩ Ꮪ Ꮫ Ꮬ Ꮭ Ꮮ Ꮯ Ꮰ Ꮱ Ꮲ Ꮳ Ꮴ Ꮵ Ꮶ Ꮷ Ꮸ Ꮹ Ꮺ Ꮻ Ꮼ Ꮽ Ꮾ Ꮿ Ᏸ Ᏹ Ᏺ Ᏻ Ᏼ
BTW, those are also good examples of why “ever hardcoding [a-z] into your program is always wrong, sometimes.” Doing so is full of idiotic and even insulting assumptions. Note that all but the last three of these are actually considered Latin alphabets! That’s the same script we use in English. In representing English text, I’ve variously had to deal with learnèd, Æneid, poﬅ, Laȝamon, résumé, 1ˢᵗ, MᶜKinley, Van Dĳke, Cañon City Colorado, œnology, ǲur, rôle, ⅷ, première, Bjørn, naïve, coöperate, façade, café, Merððyn, archæology, and even tschüß. Repeat the mantra: “Hardcoding [a-z] into your program is always wrong, sometimes.” Just Say No!
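To make the [a-z] point concrete, here is a minimal sketch comparing an ASCII character class with the Unicode lowercase-letter property \p{Ll} on a few of the words above (the selection of words is arbitrary):

```perl
use strict;
use warnings;
use utf8;

my @words = qw[ naïve café rôle tschüß ];

# An ASCII-only class rejects every one of these words...
my @ascii_ok = grep { /^[a-z]+$/ } @words;
# ...while the Unicode property for lowercase letters accepts them all.
my @ll_ok    = grep { /^\p{Ll}+$/ } @words;

printf "[a-z]+ matched %d of %d; \\p{Ll}+ matched %d of %d\n",
    scalar @ascii_ok, scalar @words, scalar @ll_ok, scalar @words;
```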

The Unicode::Collate::Locale module handles local sorting conventions. Just as English phonebooks and bookshelves have special ways of sorting names so that it doesn’t matter whether you’ve spelt something McBride or MacBride, the German phonebook convention treats ä as ae, so Händel and Haendel sort as the same name. That is the same equivalence that, without diacritics, obliges one to write über‑ as ueber‑ and Übermensch as Uebermensch. A locale sort knows to do this:
```
use utf8;
use Unicode::Collate::Locale;
@names = qw[ sát sot sät sét sæt ssét sat tot ßet SET set seat ſAT ſet saet SSET ];

$coll = Unicode::Collate::Locale->new(
            locale             => "de__phonebook",
            upper_before_lower => 1,
        );

@sorts = $coll->sort(@names);
print "@sorts\n";
```
now produces
```
saet sæt sät sat sát ſAT seat SET set sét ſet sot SSET ssét ßet tot
```
Se habla castellano

It’s remarkable how different other countries’ locale conventions can be from one’s own. In the Spanish locale ("es"), ñ is a letter that comes after n and before o. That means that the correct sort of
```
raña rastrillo radio rana rápido ráfaga ranúnculo
```
is
```
radio ráfaga rana ranúnculo raña rápido rastrillo
```
Say those all really fast with a fully-rolled rr to loosen your tongue. :)
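A minimal sketch of that Spanish sort with Unicode::Collate::Locale, alongside the untailored Unicode::Collate for contrast; the only assumption is that both modules are available (they ship with modern Perl):

```perl
use strict;
use warnings;
use utf8;
use open qw[ :std :encoding(UTF-8) ];
use Unicode::Collate;
use Unicode::Collate::Locale;

my @words = qw[ raña rastrillo radio rana rápido ráfaga ranúnculo ];

# Spanish tailoring: ñ is a separate letter between n and o.
my $es  = Unicode::Collate::Locale->new( locale => 'es' );
# Untailored root collation: ñ is just n with an accent.
my $uca = Unicode::Collate->new;

my @es_sorted  = $es->sort(@words);
my @uca_sorted = $uca->sort(@words);
print "es:   @es_sorted\n";
print "root: @uca_sorted\n";
```

The root collation keeps raña right after rana because it sees ñ as an accented n; the es tailoring moves it after every other ran- word.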

The "es__traditional" locale is a little different; historically, chocolate came after color in the Spanish dictionary, unlike the way it works in English. That’s because ch came after c and before d, while ll came after l and before m. That means that this sequence:
```
lástima laña llama ligante
cidra caliente color chocolate con churros
pero pera Perú perro periglo peste
```
sorts to
```
caliente cidra color con chocolate churros 
laña lástima ligante llama 
pera periglo pero perro Perú peste
```
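The traditional tailoring can be tried the same way. This sketch sorts all three lines together with es__traditional, and also with the modern es locale, where ch and ll are ordinary letter pairs:

```perl
use strict;
use warnings;
use utf8;
use open qw[ :std :encoding(UTF-8) ];
use Unicode::Collate::Locale;

my @words = qw[
    lástima laña llama ligante
    cidra caliente color chocolate con churros
    pero pera Perú perro periglo peste
];

# Traditional Spanish: ch sorts after c, ll after l.
my $trad   = Unicode::Collate::Locale->new( locale => 'es__traditional' );
# Modern Spanish: ch and ll are just two letters each.
my $modern = Unicode::Collate::Locale->new( locale => 'es' );

my @trad_sorted   = $trad->sort(@words);
my @modern_sorted = $modern->sort(@words);
print "traditional: @trad_sorted\n";
print "modern:      @modern_sorted\n";
```

With es, chocolate and churros move back among the other c words, right after caliente.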
时光取名叫无心

2020-12-07 03:54
To get the keys in order, apply sort with a custom sort function on the keys of the hash.
```
my %hash = ( JANE => 1, jane => 2, JIM => 3, jim => 4 );
my @sorted_keys = sort {
    lc $a cmp lc $b
        || $a cmp $b
} keys %hash;
```
This custom sort function compares strings first as if they were of the same case, and if equal, takes case into account.
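A variant of the same comparator, assuming Perl 5.16 or later: fc is Perl's Unicode case-folding function, intended for case-insensitive comparisons, and it folds some characters (ß to ss, for instance) that lc leaves unchanged.

```perl
use strict;
use warnings;
use feature 'fc';   # fc() needs Perl 5.16+

my %hash = ( JANE => 1, jane => 2, JIM => 3, jim => 4 );

# Compare case-folded keys first; plain cmp breaks ties, putting
# ASCII uppercase before lowercase.
my @sorted_keys = sort { fc($a) cmp fc($b) || $a cmp $b } keys %hash;
print "@sorted_keys\n";
```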

余生分开走

2020-12-07 04:13

Try:

```
@list = ("jane", "JIM", "JANE", "jim");
print join("\n", sort { uc $a cmp uc $b or $a cmp $b } @list), "\n";
```
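Using uc rather than lc as the first key gives the same result for purely alphabetic keys, but not for keys containing the ASCII characters that sit between Z and a, such as _ and ^. A small sketch with two made-up keys shows the difference:

```perl
use strict;
use warnings;

# Hypothetical keys: "_" (0x5F) lies between "Z" (0x5A) and "z" (0x7A).
my @list = qw[ a_b aZb ];

my @by_uc = sort { uc $a cmp uc $b or $a cmp $b } @list;
my @by_lc = sort { lc $a cmp lc $b or $a cmp $b } @list;

print "uc: @by_uc\n";
print "lc: @by_lc\n";
```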
