I think what I want to do is a fairly common task but I\'ve found no reference on the web. I have text with punctuation, and I want a list of the words.
\"H
join = lambda x: sum(x,[]) # a.k.a. flatten1([[1],[2,3],[4]]) -> [1,2,3,4]
# ...alternatively...
join = lambda lists: [x for l in lists for x in l]
Then this becomes a three-liner:
fragments = [text]
for token in tokens:
fragments = join(f.split(token) for f in fragments)
Explanation
This is what in Haskell is known as the List monad. The idea behind the monad is that once "in the monad" you "stay in the monad" until something takes you out. For example in Haskell, say you map the python range(n) -> [1,2,...,n]
function over a List. If the result is a List, it will be append to the List in-place, so you'd get something like map(range, [3,4,1]) -> [0,1,2,0,1,2,3,0]
. This is known as map-append (or mappend, or maybe something like that). The idea here is that you've got this operation you're applying (splitting on a token), and whenever you do that, you join the result into the list.
You can abstract this into a function and have tokens=string.punctuation
by default.
Advantages of this approach: