I have a text file with characters from different languages (Chinese, Latin, etc.).
I want to remove all lines that contain these non-English characters. I want to inc
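You can use egrep -v (the same as grep -E -v) to print only the lines that do not match a pattern, and use a negated character class such as [^ a-zA-Z0-9.,;:'"?!-] as the pattern (add more punctuation as needed). Because the class contains both kinds of quote, the apostrophe has to be escaped outside the single quotes on the command line:
egrep -v '[^ a-zA-Z0-9.,;:'\''"?!-]' foo.txt
Keep the hyphen last in the class. Written as ;:-' it would be read as a range from : down to ', which is reversed, so grep rejects the whole pattern.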
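A minimal sketch of the -v approach, assuming a POSIX sh and grep. keep_english is a hypothetical helper name and the sample lines are made up for illustration. The LC_ALL=C prefix is an added assumption: it keeps a-z and A-Z limited to plain ASCII letters and makes grep look at raw bytes, so every byte of a character like Ç or 你 counts as disallowed.

```shell
#!/bin/sh
# keep_english (hypothetical name): print only lines made up entirely of
# space, ASCII letters, digits and the punctuation . , ; : ' " ? ! -
# The "-" sits last in the bracket expression so it is a literal hyphen,
# not a range operator. The '\'' sequence puts an apostrophe into the
# single-quoted pattern.
# LC_ALL=C (an assumption, not in the original answer) pins a-z/A-Z to ASCII.
keep_english() {
  LC_ALL=C grep -E -v '[^ a-zA-Z0-9.,;:'\''"?!-]' "$@"
}

# Demo: the last two lines contain non-English characters and are dropped.
printf '%s\n' 'Hello, world!' "It's 42." 'Ça va?' '你好' | keep_english
```

The function forwards "$@", so it works both on a pipe and on a file argument such as keep_english foo.txt.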
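Hm, thinking about it, the double negation (-v plus the negated character class) is harder to read than it needs to be. An equivalent positive form anchors the allowed class to the whole line, so a line matches only if every character in it is allowed:
egrep '^[ a-zA-Z0-9.,;:'\''"?!-]*$' foo.txt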
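To check that the anchored positive form selects the same lines as the -v form, here is a small sketch that runs both on the same input. The helper names negated, anchored and sample are hypothetical, and LC_ALL=C is an added assumption. The input deliberately contains an empty line (kept by both, because * allows zero characters), a German word with ü and ß, and a tab (rejected by both, because tab is not in the class). The non-ASCII letters are written as octal UTF-8 bytes so the script itself stays ASCII.

```shell
#!/bin/sh
# Hypothetical helpers for the two equivalent filters.
# Allowed: space, ASCII letters, digits and . , ; : ' " ? ! -
negated()  { LC_ALL=C grep -E -v '[^ a-zA-Z0-9.,;:'\''"?!-]'; }
anchored() { LC_ALL=C grep -E '^[ a-zA-Z0-9.,;:'\''"?!-]*$'; }

# Sample input: "Plain text.", an empty line, "Grüße" (UTF-8 as octal
# escapes), "A<tab>B" and "Done!".
sample() { printf 'Plain text.\n\nGr\303\274\303\237e\nA\tB\nDone!\n'; }

# Both filters should keep exactly the same lines.
if [ "$(sample | negated)" = "$(sample | anchored)" ]; then
  echo 'both filters selected the same lines'
fi

# Kept lines: "Plain text.", the empty line, and "Done!".
sample | anchored
```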
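You can also just filter for printable ASCII. The range from space to ~ covers all printable ASCII characters, so this drops every line containing anything else, including tabs. The LC_ALL=C prefix makes grep treat the input as single bytes, so the range means exactly bytes 0x20 to 0x7E regardless of locale:
LC_ALL=C egrep -v "[^ -~]" foo.txt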
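A sketch of the ASCII filter, assuming a POSIX sh and a grep that supports POSIX character classes. ascii_only, ascii_or_tab and sample are hypothetical names, and the sample lines (written as octal UTF-8 bytes for é and 你好) are made up. Under LC_ALL=C each byte counts as one character, so every byte of a multibyte UTF-8 character falls outside the space-to-tilde range. Since tab (0x09) is also outside that range, the second helper shows one way to keep tab-separated lines.

```shell
#!/bin/sh
# ascii_only (hypothetical name): keep lines whose every byte lies in the
# printable ASCII range, space (0x20) through tilde (0x7E).
ascii_only() { LC_ALL=C grep -v '[^ -~]' "$@"; }

# ascii_or_tab (hypothetical variant): in the C locale [:print:] is the same
# 0x20-0x7E range, and [:blank:] adds space and tab.
ascii_or_tab() { LC_ALL=C grep -v '[^[:print:][:blank:]]' "$@"; }

# Sample lines: "plain ASCII ~", "café", "你好", "col1<tab>col2".
sample() { printf 'plain ASCII ~\ncaf\303\251\n\344\275\240\345\245\275\ncol1\tcol2\n'; }

sample | ascii_only    # prints only: plain ASCII ~
sample | ascii_or_tab  # also prints the tab-separated line
```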