I want to use machine learning to apply tokenization. Given a document, I want to generate its tokens. I made a lot of search but I didn\'t find something useful. I am open