This begs another question - How big?
Give us a clue like # of lines or # or Mb (Gb)?
Then we will know if it fits in memory, needs to be disk based etc.
At first pass I would use my WordList(S: String; AList: TStringlist);
then you can access each token as Alist[n]...
or sort them or whatever.