sort: string comparison failed Invalid or incomplete multibyte or wide character

Posted by 荒凉一梦 on 2019-12-23 09:03:06

Question


I'm trying to use the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict 

However I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure whether these errors are caused by that earlier problem or by the fact that m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I had to change the encoding to 'Latin-1').

I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone explain the errors I'm receiving and advise how I might go about solving this problem?

UPDATE:

When I tried dos2unix, it reported errors about invalid characters at certain lines. It turned out these were not Welsh characters but other strange characters (arrows etc.). I went through my text file removing these characters until dos2unix ran without error. However, after running dos2unix all the text appeared concatenated (no spaces, newlines or anything, whereas each word in the file should have been on a separate line). I then ran unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?


Answer 1:


I know it's an old question, but running the command export LC_ALL='C' before the pipeline does the trick, as the error message itself suggests (sort: Set LC_ALL='C' to work around the problem.).
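One possible reason the asker's attempt did not help: in a POSIX shell, a plain assignment such as LC_ALL='C' that is not exported only sets a shell variable, so sort never sees it (unless LC_ALL was already in the environment). The sketch below exports the variable inside a subshell, so the C locale applies to every command in the pipeline without changing the rest of the session. The file names m_sample.txt and m_sample.dict and the sample words are illustrative assumptions, not the asker's data.

```shell
# Build a tiny sample with Windows (CRLF) line endings; the words are illustrative.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > m_sample.txt

# Run the pipeline in a subshell with LC_ALL exported, so sort and uniq
# compare raw bytes instead of using the current locale's collation rules.
(
  export LC_ALL=C
  sort < m_sample.txt | uniq -c | sort -nr > m_sample.dict
)

# Show the resulting counts (each word still carries its trailing \r here).
cat m_sample.dict
```

Note that this only sidesteps the locale-aware comparison; the carriage returns are still part of every word in the output.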




Answer 2:


Looks like a Windows line-ending related problem (\r\n versus \n). You can convert m.txt to Unix line-endings with

dos2unix m.txt

and then rerun your command.
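If dos2unix is unavailable or rejects the file, the carriage returns can also be removed with tr, which is a standard tool. This is a minimal sketch of that alternative, not the method from the answer above; the file names crlf_sample.txt, lf_sample.txt and lf_sample.dict are hypothetical. It also counts lines afterwards, because the "concatenated" text described in the question may simply have been an editor that does not display LF-only line breaks (older versions of Windows Notepad behave this way), rather than the newlines actually being lost.

```shell
# Sample input with CRLF line endings (words taken from the error message).
printf 'mwy\r\nenwedig\r\nmwy\r\n' > crlf_sample.txt

# Delete every carriage return; the line feeds separating the words are kept.
tr -d '\r' < crlf_sample.txt > lf_sample.txt

# Check that each word is still on its own line by counting the lines.
wc -l < lf_sample.txt

# Inspect the raw bytes: no \r should appear in the dump.
od -c lf_sample.txt

# The original pipeline, run on the cleaned file.
sort < lf_sample.txt | uniq -c | sort -nr > lf_sample.dict
cat lf_sample.dict
```

Checking with wc -l or od -c rather than a text editor gives a view of the file that does not depend on how a particular editor renders line endings.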



Source: https://stackoverflow.com/questions/36292307/sort-string-comparison-failed-invalid-or-incomplete-multibyte-or-wide-character
