I want to split a string in Java on any non-alphanumeric characters.
Currently I am doing it like this:
words = Str.split("\\W+");
For basic English characters, use
words = Str.split("[^a-zA-Z0-9']+");
If you want to handle English words with accented characters (such as fiancé), or languages that use non-English letters, use \p{L}, which matches any Unicode letter:
words = Str.split("[^\\p{L}0-9']+");
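To make the difference between the two patterns concrete, here is a minimal sketch. The class name, method names, and sample sentence are made up for illustration; only the two regular expressions come from the answer above.

```java
import java.util.Arrays;

public class SplitWordsDemo {
    // Hypothetical helper: keeps ASCII letters, digits and apostrophes;
    // every other character acts as a delimiter.
    static String[] splitAscii(String text) {
        return text.split("[^a-zA-Z0-9']+");
    }

    // Hypothetical helper: \p{L} matches any Unicode letter,
    // so accented letters stay inside the word.
    static String[] splitUnicode(String text) {
        return text.split("[^\\p{L}0-9']+");
    }

    public static void main(String[] args) {
        // \u00e9 is the precomposed letter é.
        String text = "My fianc\u00e9's car, it's red!";

        // prints [My, fianc, 's, car, it's, red]
        System.out.println(Arrays.toString(splitAscii(text)));

        // prints [My, fiancé's, car, it's, red]
        System.out.println(Arrays.toString(splitUnicode(text)));
    }
}
```

One thing to watch for: if the input starts with a delimiter, split returns an empty string as the first element, so you may want to skip empty entries.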
words = Str.split("[^\\w']+");
Just add the apostrophe to the character class. \W is equivalent to [^\w], to which you can then add '.
Do note, however, that \w also includes underscores. If you want to split on underscores as well, use [^a-zA-Z0-9'] instead.
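A quick sketch of the underscore difference, in case it helps. The class name, method names, and sample input are invented for this example; the two patterns are the ones discussed above.

```java
import java.util.Arrays;

public class UnderscoreSplitDemo {
    // Hypothetical helper: \w covers [a-zA-Z0-9_] by default,
    // so underscores are treated as part of a word.
    static String[] splitKeepingUnderscores(String text) {
        return text.split("[^\\w']+");
    }

    // Hypothetical helper: the explicit class leaves out the underscore,
    // so it becomes a delimiter.
    static String[] splitOnUnderscores(String text) {
        return text.split("[^a-zA-Z0-9']+");
    }

    public static void main(String[] args) {
        String text = "snake_case isn't split";

        // prints [snake_case, isn't, split]
        System.out.println(Arrays.toString(splitKeepingUnderscores(text)));

        // prints [snake, case, isn't, split]
        System.out.println(Arrays.toString(splitOnUnderscores(text)));
    }
}
```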