Sort ignores an apostrophe - sometimes (except when it is the only column used); WHY?

Submitted by 我的梦境 on 2019-12-13 09:01:26

Question


This happens to me on both Linux and Cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?

Consider the following tab-delimited file (the apostrophe is a regular ASCII apostrophe). I created it with cat to rule out non-printing characters as the source of the problem:

$cat > temp
cat     1389
cat'    1747
ca't    3175
cat     46848484
ca't    720

$sort temp
<gives the exact same output as cat temp>

$sort -k1,1 temp
cat     1389
cat     46848484
cat'    1747
ca't    3175
ca't    720

Why do I have to ignore the second column in order to sort correctly?


Answer 1:


I pulled up the manual for sort and noticed the following:

* WARNING * The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.

As it turns out, each locale defines its own collation rules, that is, how strings are ordered for that language and region. That makes a lot of sense, but for some reason the result is surprising on multi-field files...

(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?

There are a couple of things you can do:

You can sort naively by byte value using

LC_ALL="C" sort temp

This will give a more logical result, but it might not be the one you actually want.
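
As a rough illustration of byte-value ordering, the sketch below rebuilds the sample file with real tabs and sorts it under LC_ALL=C. The temporary directory and the variable names (tmpdir, c_sorted) are placeholders chosen for this example, not part of the original question.

```shell
# Rebuild the sample file in a throwaway directory; printf reuses the
# format string for each pair of arguments, so every line is word<TAB>number.
tmpdir=$(mktemp -d)
printf '%s\t%s\n' cat 1389 "cat'" 1747 "ca't" 3175 cat 46848484 "ca't" 720 > "$tmpdir/temp"

# Sort by raw byte values, ignoring the locale's collation rules.
c_sorted=$(LC_ALL=C sort "$tmpdir/temp")
printf '%s\n' "$c_sorted"

rm -rf "$tmpdir"
```

In the C locale the comparison is byte by byte: the apostrophe (0x27) sorts before t (0x74), and the tab (0x09) sorts before the apostrophe, so both ca't lines should come first and the cat' line last.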

You could try a more basic lexicographic ordering by setting the locale to C and requesting dictionary order with -d, which considers only blanks and alphanumeric characters:

LC_ALL="C" sort -d temp
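
Here is a small sketch of what -d does to this particular file, assuming GNU sort and the same sample data. Because -d skips the apostrophes, the effective keys become cat<TAB>1389, cat<TAB>1747, cat<TAB>3175 and so on, which are already in ascending order. The names tmpdir, original and dict_sorted are made up for this example.

```shell
# Rebuild the sample file (word<TAB>number per line) in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\t%s\n' cat 1389 "cat'" 1747 "ca't" 3175 cat 46848484 "ca't" 720 > "$tmpdir/temp"

original=$(cat "$tmpdir/temp")

# Byte-value comparison, but with everything except blanks and
# alphanumerics (here: the apostrophes) left out of the comparison.
dict_sorted=$(LC_ALL=C sort -d "$tmpdir/temp")
printf '%s\n' "$dict_sorted"

# Report whether -d changed the line order at all.
if [ "$dict_sorted" = "$original" ]; then
    echo "order unchanged by sort -d"
else
    echo "order changed by sort -d"
fi

rm -rf "$tmpdir"
```

If the two match, -d leaves the file in its original order, much like the plain sort in the question. That would be consistent with a locale that also skips punctuation on its first comparison pass, though the source does not confirm that this is what happens.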

To have sort report which locale's sorting rules it is using and highlight the sort key, you can use

sort --debug temp
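
This is a hedged sketch of capturing the --debug output, assuming GNU coreutils sort (other implementations may print something different or reject the flag). The diagnostic about the sorting rules goes to stderr, so the example merges it into the same capture. The names tmpdir and debug_out are placeholders.

```shell
# Rebuild the sample file (word<TAB>number per line) in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\t%s\n' cat 1389 "cat'" 1747 "ca't" 3175 cat 46848484 "ca't" 720 > "$tmpdir/temp"

# Capture both the annotated lines (stdout) and the diagnostics (stderr).
debug_out=$(sort --debug "$tmpdir/temp" 2>&1)
printf '%s\n' "$debug_out"

rm -rf "$tmpdir"
```

With GNU sort, each input line is followed by a row of underscores marking the text that was compared, which makes it easier to see what the key actually covers.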




Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.

Locales are supposed to specify the correct lexicographic order for a given language and dialect. Do the locale's collation functions simply not handle the multiple-field case at all, or are they applying a different interpretation of the "meaning" of the line?



Source: https://stackoverflow.com/questions/15824747/sort-ignores-an-apostrophe-sometimes-except-when-it-is-the-only-column-used
