I just discovered that prefixing my grep commands with LC_ALL=C does wonders for speeding grep up.
But I am wondering about the implications.
Would a
You don't necessarily need UTF-8 to run into trouble here. The locale is responsible for setting the character classes, i.e. determining which character is a space, a letter or a digit. Consider these two examples:
$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep '[[:alnum:]]' || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep '[[:alnum:]]' || echo false
false
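To try the character-class point without needing any extra locale installed, here is a minimal sketch. `count_alnum` is a hypothetical helper, not part of grep, and the sketch only exercises the C locale; pass another locale name as the first argument if that locale exists on your system.

```shell
# Hypothetical helper: count the input lines that contain at least one
# [[:alnum:]] character under the locale named in $1.
count_alnum() {
    LC_ALL="$1" grep -c '[[:alnum:]]'
}

# printf with octal escapes is used instead of echo -e for portability.
# grep -c exits non-zero when the count is 0, hence the "|| true".
ascii_hits=$(printf 'a\n' | count_alnum C) || true      # plain ASCII letter
high_hits=$(printf '\344\n' | count_alnum C) || true    # byte 0xE4 (ä in ISO 8859-1)

echo "ASCII letter: $ascii_hits, byte 0xE4: $high_hits"
```

In the C locale the ASCII letter should count as alphanumeric and the byte 0xE4 should not, which is the same result as the `false` in the transcript above.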
When you match exact byte sequences against each other, however, the locale makes no difference:
$ echo -e '\xe4' | LC_ALL=en_US.iso88591 grep "$(echo -e '\xe4')" || echo false
ä
$ echo -e '\xe4' | LC_ALL=C grep "$(echo -e '\xe4')" || echo false
ä
I'm not sure how far grep's Unicode support extends, or how well different code points are matched against each other. Matching any subset of ASCII, and matching single characters that have no alternate binary representation, should work fine regardless of locale.
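To illustrate what "alternate binary representations" can mean in practice, here is a rough sketch assuming UTF-8 encoded input. The letter ä can be stored as the single code point U+00E4 or as `a` followed by the combining diaeresis U+0308. The variable names are made up for the example, and `-F` is used so the pattern is treated as a fixed string.

```shell
precomposed=$(printf '\303\244')   # U+00E4 encoded as UTF-8 (bytes C3 A4)
decomposed=$(printf 'a\314\210')   # 'a' + U+0308 encoded as UTF-8 (bytes 61 CC 88)

# grep -c exits non-zero when the count is 0, hence the "|| true".
same=$(printf '%s\n' "$precomposed" | LC_ALL=C grep -c -F "$precomposed") || true
other=$(printf '%s\n' "$decomposed" | LC_ALL=C grep -c -F "$precomposed") || true

echo "identical bytes: $same, different bytes: $other"
```

Under LC_ALL=C the comparison is purely byte-wise, so the identical byte sequence matches, while the decomposed form does not, even though both render as the same character.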