This pattern is meant simply to grab everything in a string up until the first potential sentence boundary in the data:
[^\\.?!\\r\\n]*
Out
The * quantifier allows the pattern to capture a substring of length zero. In your original code version (without the ^ anchor in front), the additional matches are:
hard
and the first !
!
!
!
and the end of the text
You can slice and dice this further if you like.
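To see where these extra matches come from, here is a minimal Python sketch that prints the span of every match of the unanchored pattern. The sample text is hypothetical (the original input is not shown here), and the character class is written as a Python raw string, so the doubled backslashes are not needed. Exactly which empty matches are reported can differ between regex engines and versions, so treat the printed list as illustrative.

```python
import re

# Hypothetical sample text; the asker's real input is not shown in this answer.
text = "Out. hard!!!!"

# Same character class as above, written as a raw string (no doubled backslashes).
pattern = re.compile(r"[^.?!\r\n]*")

# Collect (start, end, matched text) for every match, including zero-length ones.
unanchored = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]

for start, end, matched in unanchored:
    # A match with start == end is a zero-length match permitted by the * quantifier.
    print(start, end, repr(matched))
```

Each zero-length match sits at a position where the next character is one of the excluded delimiters, or at the very end of the text.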
Adding that ^ anchor to the front now ensures that only a single substring can match the pattern, since the beginning of the input text occurs exactly once.
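As a quick check of the anchored version, the following Python sketch runs the pattern with ^ in front against a hypothetical sample text (an assumption, not the asker's data). It also shows re.match, which in Python only ever matches at the start of the string and so behaves like the anchored pattern.

```python
import re

# Hypothetical sample text; substitute your own input.
text = "Out. hard!!!!"

# With the ^ anchor, a match can only begin at position 0,
# so findall returns a single element.
anchored = re.findall(r"^[^.?!\r\n]*", text)
print(anchored)  # ['Out']

# re.match is implicitly anchored at the start of the string,
# so the ^ can be omitted here.
first_chunk = re.match(r"[^.?!\r\n]*", text).group()
print(first_chunk)  # Out
```

Note that the single match can still be empty if the text begins with one of the delimiter characters.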