Remove lines that contain non-English (non-ASCII) characters from a file

我在风中等你 2021-02-13 02:48

I have a text file containing characters from different languages (Chinese, Latin, etc.).

I want to remove all lines that contain these non-English characters. I want to inc

4 answers
  •  孤独总比滥情好
    2021-02-13 03:41

    You can use Awk, provided you force the use of the C locale:

    LC_CTYPE=C awk '! /[^[:alnum:][:space:][:punct:]]/' my_file
    

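    The environment variable LC_CTYPE=C (or LC_ALL=C) forces the use of the C locale for character classification. In that locale the character classes ([:alnum:], [:space:], etc.) match only ASCII characters, so every byte of a non-ASCII character (for example a Chinese character or an accented Latin letter encoded in UTF-8) falls outside them. If LC_ALL is already set in your environment, it takes precedence over LC_CTYPE, so LC_ALL=C is the safer choice.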

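    The /[^[:alnum:][:space:][:punct:]]/ regex matches any line containing a character that is not alphanumeric, whitespace or punctuation. In the C locale that means any non-ASCII byte (and also any ASCII control character other than whitespace). The ! before the regex inverts the condition, so only lines without such characters match. Since no action is given, awk applies its default action (print) to the matching lines.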
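    To see the awk command at work, here is a small self-contained sketch. The file name sample.txt and its contents are made up for illustration, and it assumes an awk that understands POSIX bracket classes such as [:alnum:] (gawk and current mawk releases do; some older mawk versions reportedly do not). It uses LC_ALL=C so that an LC_ALL already set in the environment cannot override the locale.

```shell
#!/bin/sh
# Build a hypothetical sample file. Octal escapes keep this script pure ASCII:
#   \303\251 is "é" in UTF-8, \344\275\240\345\245\275 is "你好" in UTF-8.
printf 'Hello, world!\n' > sample.txt
printf 'caf\303\251\n' >> sample.txt
printf 'plain ASCII line 42\n' >> sample.txt
printf '\344\275\240\345\245\275\n' >> sample.txt

# Print only the lines in which every byte is in [:alnum:], [:space:] or [:punct:]
# under the C locale. Expected output:
#   Hello, world!
#   plain ASCII line 42
LC_ALL=C awk '! /[^[:alnum:][:space:][:punct:]]/' sample.txt
```

    Lines 2 and 4 of sample.txt contain UTF-8 multibyte sequences, whose bytes are all above 127, so awk drops them.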

    EDIT: This can also be done with grep:

    LC_CTYPE=C grep -v '[^[:alnum:][:space:][:punct:]]' my_file
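    Both commands print the filtered lines to standard output and leave my_file itself unchanged. To actually remove the lines from the file, one option is to write the result to a temporary file and copy it back. Below is a minimal POSIX sh sketch of that approach; the function name strip_non_ascii_lines is hypothetical, and it assumes a mktemp utility is available on your system.

```shell
#!/bin/sh
# strip_non_ascii_lines FILE (hypothetical helper, not a standard tool):
# delete every line of FILE containing a byte outside the C locale's
# [:alnum:], [:space:] and [:punct:] classes, rewriting FILE in place.
# Variables are global because POSIX sh has no "local".
strip_non_ascii_lines() {
    file=$1
    tmp=$(mktemp) || return 2

    # grep -v exits with status 1 when it selects no lines (every line was
    # non-ASCII); that is still a valid, empty result. Status 2 is an error.
    status=0
    LC_ALL=C grep -v '[^[:alnum:][:space:][:punct:]]' "$file" > "$tmp" || status=$?
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return "$status"
    fi

    # Copy the contents back rather than using mv, so FILE keeps its
    # original permissions and ownership. On failure the temp file is kept.
    cat "$tmp" > "$file" || return 2
    rm -f "$tmp"
}
```

    Usage: strip_non_ascii_lines my_file. If every line of the file contains non-ASCII characters, the file ends up empty rather than the function reporting an error.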
    
