I am trying to parse strings in such a way as to separate out all word components, even those that have been contracted. For example, the tokenization of "shouldn't" wou
(?
EDIT: \2 is the match, \3 is the first group, \4 the second and \5 the third.
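For illustration only, here is a minimal Python sketch of the kind of contraction splitting described above. It is not the pattern from this post: the regex, the suffix list, and the names `TOKEN_RE` and `tokenize` are all assumptions, and it assumes the desired split keeps the contracted suffix as its own token (so "shouldn't" becomes "should" and "n't"). Note that Python's `re` numbers groups from 1, with group 0 being the whole match, which differs from the numbering in the EDIT above.

```python
import re

# Hypothetical pattern, not the original poster's regex.
# Alternative 1: a lazy word stem (group 1) followed by a contracted
#                suffix (group 2) such as n't, 's, 're, 've, 'll, 'd, 'm.
# Alternative 2: any plain run of word characters.
TOKEN_RE = re.compile(r"(\w+?)(n't|'(?:s|re|ve|ll|d|m))\b|\w+", re.IGNORECASE)


def tokenize(text):
    """Split text into word tokens, separating contracted suffixes."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group(2):
            # Contraction: emit the stem and the suffix as separate tokens.
            tokens.extend([m.group(1), m.group(2)])
        else:
            # Ordinary word: group 0 is the whole match.
            tokens.append(m.group(0))
    return tokens


print(tokenize("shouldn't"))     # ['should', "n't"]
print(tokenize("they've gone"))  # ['they', "'ve", 'gone']
```

Under these assumptions the stem is matched lazily so that the "n" of "n't" goes to the suffix rather than the stem; a different convention (for example expanding "n't" to "not") would need a lookup table on top of this.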