Python regex: greedy pattern returning multiple empty matches

一生所求 2020-12-21 05:18

This pattern is meant simply to grab everything in a string up until the first potential sentence boundary in the data:

[^\\.?!\\r\\n]*

Out

1 answer
  • 2020-12-21 06:04

    The * quantifier lets the pattern match a zero-length substring, so it also succeeds at positions where no non-empty match is possible. In your original version (without the ^ anchor in front), the additional matches are:

    • the zero-length string between the end of hard and the first !
    • the zero-length string between the first and second !
    • the zero-length string between the second and third !
    • the zero-length string between the third ! and the end of the text
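
The four empty matches listed above can be reproduced with a short sketch. The input string "This is hard!!!" is a hypothetical stand-in for the original data, which is not shown in the question, and the pattern is written as a raw string on the assumption that the doubled backslashes in the question are string-literal escapes.

```python
import re

# Hypothetical sample text: ends with "hard" followed by three "!".
text = "This is hard!!!"

# Assumed raw-string form of the pattern from the question (no ^ anchor).
pattern = re.compile(r"[^.?!\r\n]*")

# finditer reports each match together with its span, which makes the
# zero-length matches (start == end) easy to spot.
matches = [(m.group(), m.span()) for m in pattern.finditer(text)]
for value, span in matches:
    print(repr(value), span)
# 'This is hard' (0, 12)
# '' (12, 12)
# '' (13, 13)
# '' (14, 14)
# '' (15, 15)
```

The spans show where each empty match sits: one in front of each "!" and one at the very end of the text.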

    You can slice/dice this further if you like here.

    Adding that ^ anchor to the front now ensures that only a single substring can match the pattern, since the beginning of the input text occurs exactly once.
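
A minimal sketch of the anchored version, again using a hypothetical sample string rather than the original data. Without re.MULTILINE, ^ matches only at position 0, so findall returns a single item. re.match, which only tries the pattern at the start of the string, is one possible alternative that avoids the explicit anchor.

```python
import re

# Hypothetical sample input, not the original data.
text = "This is hard!!!"

# With ^ in front, the pattern can only match at the start of the text.
anchored = re.findall(r"^[^.?!\r\n]*", text)
print(anchored)  # ['This is hard']

# re.match is implicitly anchored at position 0 and returns one match object.
first = re.match(r"[^.?!\r\n]*", text)
print(first.group())  # This is hard
```

Because the pattern can match the empty string, re.match never returns None here; if the text starts with a boundary character, the result is simply an empty string.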
