How to search for non-ASCII characters with bash tools?

Submitted by 别等时光非礼了梦想 on 2020-05-09 19:08:33

Question


I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find the non-ASCII characters in a file with sed or similar tools in bash on Linux?


Answer 1:


Try:

nonascii() { LC_ALL=C grep --color=always '[^ -~]\+'; }

It can be used like this:

printf 'ŨTF8\n' | nonascii

Within [], ^ means "not", so [^ -~] matches any character that is not between space and ~. Apart from also matching control characters (such as tab), this matches non-ASCII characters, and it is a more portable, though slightly less accurate, version of [^\x00-\x7f] below. The \+ means "one or more"; it makes grep color each complete multibyte character (or run of them) as a unit, rather than inserting color codes between the individual bytes and thereby corrupting the multibyte sequence.
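
For tracking down the offending lines in a large file, it may help to print line numbers as well. The following is a minimal sketch built on the same bracket expression; the function name nonascii_lines is hypothetical, and LC_ALL=C is used here on the assumption that a locale variable set elsewhere in the environment should not override the byte-wise matching.

```shell
# Hypothetical helper: list the lines that contain bytes outside the
# printable ASCII range (space through ~), prefixed with their line numbers.
nonascii_lines() {
    # LC_ALL=C makes grep compare raw bytes; -n prefixes each match with its
    # line number. "$@" passes file names through; with none, stdin is read.
    LC_ALL=C grep -n '[^ -~]' "$@"
}

# Example input: the second line contains "é" encoded as UTF-8 (octal 303 251).
printf 'plain\ncaf\303\251\nalso plain\n' | nonascii_lines
```

Only the second line of the sample input should be reported. Note that, like the original expression, this also flags lines containing tabs or other control characters.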




Answer 2:


Try this command:

grep -P '[^\x00-\x7f]' file
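
When the goal is to learn which characters are causing trouble, rather than where they are, the same pattern can be combined with -o. This is only a sketch: the function name nonascii_chars is hypothetical, and it assumes GNU grep, since -P (Perl-compatible regular expressions) is a GNU extension that may be missing elsewhere.

```shell
# Hypothetical helper: print each distinct run of non-ASCII characters found
# in the given files (or stdin). Assumes GNU grep built with -P support.
nonascii_chars() {
    # -o prints only the matched text, -h suppresses file name prefixes.
    # The trailing + keeps the bytes of a multibyte character together,
    # for the reason the first answer gives. sort -u removes duplicates.
    grep -ohP '[^\x00-\x7f]+' "$@" | sort -u
}

# Example input containing "é" twice and "ï" once, encoded as UTF-8.
printf 'caf\303\251 na\303\257ve caf\303\251\n' | nonascii_chars
```

Each distinct non-ASCII run appears once in the output, which gives a short list of characters to replace or to declare in the LaTeX preamble.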


Source: https://stackoverflow.com/questions/13596531/how-to-search-for-non-ascii-characters-with-bash-tools
