I need to extract words from a Python string by matching it against a list of tokens. The list can contain unigrams as well as bigrams, and may extend to higher-order n-grams.
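To make the problem concrete, here is a minimal sketch of one possible approach: a greedy longest-match scan over the words of the string. It assumes the string can be split on whitespace, that matching is case-insensitive, and that matches should not overlap. The function name `extract_tokens` and the sample data are hypothetical, chosen only for illustration.

```python
def extract_tokens(text, tokens):
    """Return the known tokens found in text, preferring the longest n-gram.

    Assumptions (not from the original question): whitespace tokenization,
    case-insensitive comparison, non-overlapping greedy matches.
    """
    # Store every token as a tuple of lowercase words so that unigrams,
    # bigrams, and higher-order n-grams are all handled the same way.
    vocab = {tuple(t.lower().split()) for t in tokens}
    max_n = max((len(t) for t in vocab), default=0)

    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_n, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in vocab:
                found.append(" ".join(candidate))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no token starts at this word
    return found


tokens = ["new york", "machine learning", "new york city", "love"]
text = "I love New York and machine learning in new york city"
print(extract_tokens(text, tokens))
# ['love', 'new york', 'machine learning', 'new york city']
```

Preferring the longest match means "new york city" wins over "new york" when both are in the list. If overlapping matches are wanted instead, the `i += n` step would need to change to `i += 1`, and punctuation would need a real tokenizer rather than `str.split`.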