I have lots of strings containing text in many different spellings. I am tokenizing these strings by searching for keywords, and if a keyword is found I use its associated text.
If you have a fixed set of keywords, you can use a lexer generator such as (f)lex, re2c, or Ragel to build the scanner for you.
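To illustrate the idea of matching a fixed keyword set in one pass, here is a minimal Python sketch. It is not a generated lexer: it swaps in a single combined regular expression from the standard `re` module as a stand-in. The keyword table and the names `KEYWORDS`, `build_scanner`, and `tokenize` are made up for the example, and case-insensitive matching is an assumption about what "different spellings" means.

```python
import re

# Hypothetical keyword table: spelling variant -> associated text.
KEYWORDS = {
    "colour": "color",
    "color": "color",
    "grey": "gray",
    "gray": "gray",
}

def build_scanner(keywords):
    """Compile all keywords into one alternation pattern."""
    # Put longer keywords first so a keyword is not shadowed by its own prefix.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in alternatives)
    # Assumption: spelling variants may also differ in letter case.
    return re.compile(pattern, re.IGNORECASE)

def tokenize(text, keywords=KEYWORDS):
    """Return the associated text for each keyword found, in order of appearance."""
    scanner = build_scanner(keywords)
    lookup = {k.lower(): v for k, v in keywords.items()}
    return [lookup[m.group(0).lower()] for m in scanner.finditer(text)]
```

For example, `tokenize("Grey or COLOUR, gray")` yields the associated text of the three keywords it finds and skips everything else. If the keyword list is large or the input is big, a generated scanner from one of the tools above will likely be faster than this, since it compiles the whole set into one state machine ahead of time.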