I have a space-separated file that lists tokens, their lemmas, and their corresponding parts of speech. The file is sorted alphabetically, so we can be sure that all non-unique to