Python regex: tokenizing English contractions

天涯浪人 2021-01-20 21:59

I am trying to parse strings in such a way as to separate out all word components, even those that have been contracted. For example the tokenization of "shouldn't" wou

5 answers
  •  感情败类
    2021-01-20 22:30

    (?

    EDIT: \2 is the match, \3 is the first group, \4 the second and \5 the third.
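For readers who want something runnable: the pattern in this answer is cut off, so the following is not that pattern but a minimal sketch of one way to do this with `re`. It assumes a Penn Treebank style split, where `n't` becomes its own token and other clitics (`'s`, `'re`, `'ve`, `'ll`, `'d`, `'m`) split at the apostrophe. The names `TOKEN_RE` and `tokenize` are hypothetical, and only ASCII letters and the straight apostrophe are handled.

```python
import re

# Assumed convention (Penn Treebank style), not taken from the thread:
# n't is split off as its own token; other clitics split at the apostrophe.
TOKEN_RE = re.compile(
    r"""
    [A-Za-z]+(?=n't\b)        # stem that is directly followed by n't
  | n't\b                     # the negation clitic itself
  | '(?:s|re|ve|ll|d|m)\b     # other common clitics
  | [A-Za-z]+                 # ordinary words
  | \d+                       # runs of digits
  | [^\w\s]                   # any single punctuation character
    """,
    re.VERBOSE | re.IGNORECASE,
)


def tokenize(text):
    """Return the list of tokens found in text, left to right."""
    return TOKEN_RE.findall(text)


print(tokenize("shouldn't"))  # ['should', "n't"]
```

The first alternative relies on backtracking: `[A-Za-z]+` first grabs `shouldn`, the lookahead fails, and the engine gives back one letter so that `should` is followed by `n't`. A side effect of this convention is that `can't` comes out as `ca` + `n't`; irregular forms like that would need a lookup table if a different split is wanted.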
