tokenize a string keeping delimiters in Python
Is there any equivalent to str.split in Python that also returns the delimiters? I need to preserve the whitespace layout for my output after processing some of the tokens. Example: >>> s="\tthis is an example" >>> print s.split() ['this', 'is', 'an', 'example'] >>> print what_I_want(s) ['\t', 'this', ' ', 'is', ' ', 'an', ' ', 'example'] Thanks! How about import re splitter = re.compile(r'(\s+|\S+)') splitter.findall(s) >>> re.compile(r'(\s+)').split("\tthis is an example") ['', '\t', 'this', ' ', 'is', ' ', 'an', ' ', 'example'] the re module provides this functionality: >>> import re >>> re