I used NLTK's ne_chunk to extract named entities from a text:
my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police of ..."
An nltk.Tree is a subclass of list. Chunks are subtrees, and unchunked words are plain (word, tag) tuples. So we can walk the top level of the tree, keep only the subtrees, extract the words from each chunk, and join them:
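>>> import nltk
>>> # ne_chunk expects POS-tagged tokens, not a raw string
>>> tagged = nltk.pos_tag(nltk.word_tokenize(my_sent))
>>> chunked = nltk.ne_chunk(tagged)
>>> [" ".join(w for w, t in elt) for elt in chunked if isinstance(elt, nltk.Tree)]
['WASHINGTON', 'New York', 'Loretta E. Lynch', 'Brooklyn']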
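If you also want the entity type, the same top-level walk can read each chunk's label. The following is a minimal sketch that assumes the output shape described above. The helper name `named_entities` and the hand-built sample tree are hypothetical. The sample is built directly with nltk.Tree, so it runs without downloading any tagger or chunker models. It calls `leaves()` rather than iterating the chunk directly, so any nested subtrees inside a chunk are flattened as well.

```python
from nltk import Tree


def named_entities(chunked):
    """Return (label, text) pairs for each chunk directly under the root."""
    entities = []
    for elt in chunked:
        # Chunks are Tree objects; unchunked tokens are (word, tag) tuples.
        if isinstance(elt, Tree):
            # leaves() returns the (word, tag) tuples under this chunk.
            text = " ".join(word for word, tag in elt.leaves())
            entities.append((elt.label(), text))
    return entities


# Hypothetical hand-built tree in the shape ne_chunk produces (not real output).
chunked = Tree("S", [
    Tree("GPE", [("WASHINGTON", "NNP")]),
    ("--", ":"),
    ("In", "IN"),
    Tree("PERSON", [("Loretta", "NNP"), ("E.", "NNP"), ("Lynch", "NNP")]),
    (",", ","),
    Tree("GPE", [("Brooklyn", "NNP")]),
])

print(named_entities(chunked))
# [('GPE', 'WASHINGTON'), ('PERSON', 'Loretta E. Lynch'), ('GPE', 'Brooklyn')]
```

With `nltk.ne_chunk(tagged, binary=True)`, every chunk gets the single label `NE` instead of a type such as `PERSON` or `GPE`, so the labels become uninformative but the extracted text is the same walk.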