I have 200 MB of articles taken from a newspaper.
The structure of this text corpus is as follows: