I have a text file that contains 32,000 lines. The data is in Arabo-Persian script; however, each line also contains the Roman transcription of the first word, for example:
دێان diêya
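
To make the line format concrete, here is a minimal sketch of how one such line could be split into its Arabo-Persian part and its Roman transcription. It assumes the two parts are separated by whitespace and that every Arabo-Persian token contains at least one character from the Unicode range U+0600 to U+06FF; the name `split_line` is hypothetical, and the real file may need a wider range or a different separator.

```python
import re

# Assumption: any token containing a character in the basic Arabic
# Unicode block (U+0600..U+06FF) is treated as Arabo-Persian script.
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


def split_line(line):
    """Split one line into (script_text, roman_text).

    Tokens are separated on whitespace and sorted into two groups by
    script, so the result does not depend on which part comes first.
    """
    tokens = line.split()
    script = [t for t in tokens if ARABIC_CHAR.search(t)]
    roman = [t for t in tokens if not ARABIC_CHAR.search(t)]
    return " ".join(script), " ".join(roman)


if __name__ == "__main__":
    # The sample line from the question.
    print(split_line("دێان diêya"))
```

For the full file, the same function could be applied line by line after opening the file with `encoding="utf-8"`, assuming that is how it was saved.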