Remove lines that contain non-English (non-ASCII) characters from a file

我在风中等你 2021-02-13 02:48

I have a text file with characters from different languages (Chinese, Latin, etc.).

I want to remove all lines that contain these non-English characters. I want to inc

4 Answers
  •  长情又很酷
    2021-02-13 03:23

    You can use egrep -v to return only the lines that do not match the pattern, with something like [^ a-zA-Z0-9.,;:'"?!-] as the pattern (include more punctuation as needed, but keep the hyphen last so it is a literal rather than the start of a range).

    Hm, thinking about it, a double negation (-v plus an inverted character class) is probably not the clearest. Another way is to drop -v and match whole lines made up only of allowed characters: ^[ a-zA-Z0-9.,;:'"?!-]*$.
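    To see that the two patterns above select the same lines, here is a minimal Python sketch of both, assuming the same allowed set (space, ASCII letters and digits, and . , ; : ' " ? ! -). The names ALLOWED, keep_blacklist and keep_whitelist are hypothetical, chosen for illustration; lines are passed in without their trailing newline, as grep sees them.

```python
import re

# Hypothetical allowed set mirroring the grep patterns: space, letters,
# digits and some punctuation. The hyphen is last so it is a literal.
ALLOWED = " a-zA-Z0-9.,;:'\"?!-"

# Blacklist form (like `egrep -v '[^...]'`): a line is rejected if any
# character outside the allowed set occurs in it.
DISALLOWED_CHAR = re.compile("[^" + ALLOWED + "]")

# Whitelist form (like `egrep '^[...]*$'`): a line is kept only if the
# whole line consists of allowed characters.
ONLY_ALLOWED = re.compile("[" + ALLOWED + "]*")


def keep_blacklist(lines):
    """Return the lines in which no disallowed character occurs."""
    return [line for line in lines if not DISALLOWED_CHAR.search(line)]


def keep_whitelist(lines):
    """Return the lines made up entirely of allowed characters."""
    return [line for line in lines if ONLY_ALLOWED.fullmatch(line)]


if __name__ == "__main__":
    sample = ["Hello, world!", "你好，世界", "Café au lait", "Is it 5-6 o'clock?"]
    print(keep_blacklist(sample))  # ['Hello, world!', "Is it 5-6 o'clock?"]
    print(keep_whitelist(sample))  # ['Hello, world!', "Is it 5-6 o'clock?"]
```

    Both forms also keep empty lines, since an empty line contains no disallowed character and trivially matches the starred class.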

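    You can also just filter for printable ASCII (everything from space to ~). Running grep with LC_ALL=C makes it compare raw bytes, so any byte outside that range is caught regardless of the file's encoding. Note that lines containing tabs are dropped as well:

    LC_ALL=C egrep -v "[^ -~]" foo.txt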
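    If grep is not available, the same printable-ASCII filter can be sketched in Python by working on raw bytes, which roughly mirrors grep under the C locale. This is a minimal sketch, not a drop-in replacement; the function names is_printable_ascii and filter_ascii_lines are hypothetical.

```python
import io


def is_printable_ascii(line: bytes) -> bool:
    """True if every byte lies in the range space (0x20) to tilde (0x7E)."""
    return all(0x20 <= b <= 0x7E for b in line)


def filter_ascii_lines(data: bytes) -> bytes:
    """Keep only the lines whose bytes are all printable ASCII.

    Works on raw bytes, so the encoding does not matter: a non-ASCII
    letter in UTF-8 or Latin-1 contains at least one byte above 0x7E.
    """
    kept = []
    # readlines() splits on b"\n" only, as grep does, and keeps the newline.
    for line in io.BytesIO(data).readlines():
        # Test the line content without its terminating newline.
        if is_printable_ascii(line.rstrip(b"\n")):
            kept.append(line)
    return b"".join(kept)
```

    To use it as a filter, something like `sys.stdout.buffer.write(filter_ascii_lines(sys.stdin.buffer.read()))` reads foo.txt from standard input. Like the grep pattern, it rejects lines containing tabs, and a file with Windows (CRLF) line endings loses every line, because the carriage return (0x0D) is outside the space-to-tilde range.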
    
