I have sentences in a corpus that contain a mix of dictionary and non-dictionary words. The non-dictionary words are just as important, since they are domain-specific. I'm not performing any NLP.
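To make the setup concrete, here is a minimal sketch that splits each sentence's tokens into dictionary and non-dictionary words with a plain set lookup, so no NLP library is involved. It assumes a simple regex tokenizer and a small, hypothetical `DICTIONARY` set; in practice the set would be loaded from a real word list. The function name `split_words` and the example sentence are illustrative only.

```python
import re

# Hypothetical dictionary for illustration; a real one would be loaded from a word list.
DICTIONARY = {"the", "patient", "was", "given", "for", "pain"}

# Tokens are runs of letters/digits, optionally joined by internal hyphens or apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")


def split_words(sentence, dictionary):
    """Return (dictionary_words, non_dictionary_words) for one sentence.

    Lookup is case-insensitive; original casing is preserved in the output.
    """
    known, unknown = [], []
    for token in TOKEN_RE.findall(sentence):
        # Route the token to the matching list based on dictionary membership.
        (known if token.lower() in dictionary else unknown).append(token)
    return known, unknown


known, unknown = split_words("The patient was given acetaminophen for pain", DICTIONARY)
print(known)    # ['The', 'patient', 'was', 'given', 'for', 'pain']
print(unknown)  # ['acetaminophen']
```

Because the non-dictionary words are kept rather than discarded, the domain-specific terms remain available for whatever processing follows.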