I am currently working on a project to build a word prediction model. The full dataset contains 800,000 samples, but I set aside 0.5% of it (about 4,000 samples) as a separate prototype subset, and the training data size is as follows: