I have lots of strings containing text in many different spellings. I am tokenizing these strings by searching for keywords, and if a keyword is found I use an associated text.
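A minimal sketch of the keyword lookup described above, using only Python's standard `re` module. It assumes each associated text can be listed with its spelling variants in a table. `KEYWORDS`, `build_pattern` and `tokenize` are hypothetical names, and the table entries are made-up examples, since the question does not give the real keywords. This is a plain regex approach, not ANTLR.

```python
import re

# Hypothetical keyword table: associated text -> list of spelling variants.
KEYWORDS = {
    "colour": ["color", "colour"],
    "grey": ["gray", "grey"],
}


def build_pattern(table):
    """Compile one case-insensitive regex matching any variant as a whole word."""
    lookup = {}
    for associated, variants in table.items():
        for variant in variants:
            # Map each lowercased variant back to its associated text.
            lookup[variant.lower()] = associated
    # Try longer variants first so a longer keyword wins over a shorter one.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tokenize(text, pattern, lookup):
    """Return the associated text for every keyword found, in order of appearance."""
    return [lookup[m.group(1).lower()] for m in pattern.finditer(text)]


if __name__ == "__main__":
    pattern, lookup = build_pattern(KEYWORDS)
    print(tokenize("The Gray car has a nice color", pattern, lookup))
```

Running it prints `['grey', 'colour']`: both spellings of a keyword map to the same associated text. For a fixed list of keywords this is usually enough. A parser generator pays off once the input has real grammatical structure.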
Maybe it's a little overpowered, but you should definitely take a look at ANTLR.