I'm using Scanner with a delimiter and I've come across a strange behaviour I'd like to understand.
I'm using this program:
Scanner sc = new
I have a feeling that you are getting two separate delimiter matches in places where a blank space sits next to punctuation. Why not simply use [\\s\\p{Punct}]+ ?
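The regex \\s+|\\p{Punct}+ matches either a run of whitespace or a run of punctuation, never a mix of both. It will first match the blank space and swallow it, then match the punctuation as a second, separate delimiter. That gives two delimiters next to each other with nothing in between, and the Scanner returns that nothing as the empty token.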
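To make the difference concrete, here is a minimal sketch comparing the two delimiters. The class name DelimiterDemo, the helper tokens and the sample input "a, b" are made up for illustration and are not from the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token the Scanner yields
    // for the given input and delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a, b"; // assumed sample input: punctuation next to a space

        // Alternation: "," and " " are matched as two separate delimiters,
        // so an empty token appears between them.
        System.out.println(tokens(input, "\\s+|\\p{Punct}+")); // prints [a, , b]

        // Character class: ", " is matched as one delimiter.
        System.out.println(tokens(input, "[\\s\\p{Punct}]+")); // prints [a, b]
    }
}
```

With the character class, any mix of whitespace and punctuation counts as one delimiter, so no empty token can appear between them.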
I ran into the empty token problem with the Scanner class too. I think the delimiter pattern has to match a whole run of mixed delimiters in one go, which you get by surrounding the alternatives with parentheses and appending + to the group. The pattern I used looks like this:
"((\\s)+|(\\\\r\\\\n)+|\\p{Punct}+)+".
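As a rough check of that pattern, here is a small sketch that runs it against a sample sentence. The class name GroupedDelimiterDemo, the helper tokens and the sample inputs are assumptions for illustration; only the pattern string is taken from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class GroupedDelimiterDemo {
    // Pattern copied verbatim from the post above. Note: written this way in a
    // Java string literal, the middle alternative matches the literal text \r\n
    // (backslashes included), not a CR LF pair; real line breaks are already
    // covered by \s.
    static final String GROUPED = "((\\s)+|(\\\\r\\\\n)+|\\p{Punct}+)+";

    // Hypothetical helper: collects every token the Scanner yields.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(GROUPED);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample input with punctuation directly followed by spaces.
        System.out.println(tokens("Hello, world! How are you?"));
        // prints [Hello, world, How, are, you]
    }
}
```

The outer + is what lets ", " or "! " be consumed as a single delimiter, so no empty token is produced between the punctuation and the space.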