Extracting info from large structured text files

鱼传尺愫 2021-01-15 18:22

I need to read some large files (50k to 100k lines each) that are structured in groups separated by empty lines. Each group starts with the same pattern: "No.999999999 dd/mm/yyyy ZZZ

5 answers
  •  梦毁少年i
    2021-01-15 19:22

    I wouldn't use regex here. If you know that your lines will start with fixed strings, why not check for those strings and write the logic around them?

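    records = []          # finished groups
    record = {}           # fields of the group currently being read
    curr_index = None     # key of the field that was started most recently
    with open(file) as f:
        for line in f:
            line = line.rstrip('\n')
            if line.startswith('No.'):
                curr_index = 'No'
                record['No'] = line[3:]   # text after the 3-character 'No.' prefix
            # ... further elif branches for the other fixed prefixes ...
            elif line.strip() == '':
                # a blank line ends the group: store the record and start a new one
                if record:
                    records.append(record)
                record, curr_index = {}, None
            elif curr_index is not None:
                # the field overflowed onto the next line: append to the last key
                record[curr_index] += '\n' + line
    if record:
        records.append(record)    # the last group may not end with a blank line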
    

    Treat the code above as a sketch rather than a finished implementation.
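The prefix-checking idea in this answer could be packaged as a generator so that records are produced one at a time while the file is read. This is only a minimal sketch: the `PREFIXES` table, the `iter_records` name, and the sample text are all made up for illustration, since only the `No.` prefix is visible in the question. Lines that match no prefix are treated as a continuation of the most recent field.

```python
import io
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical prefix table: only "No." is known from the question.
# Add one entry per fixed string that can start a line.
PREFIXES = {"No.": "No"}


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per group; groups are separated by blank lines."""
    record: Dict[str, str] = {}
    current: Optional[str] = None  # key of the field started most recently
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: the current group is complete.
            if record:
                yield record
            record, current = {}, None
            continue
        for prefix, key in PREFIXES.items():
            if line.startswith(prefix):
                current = key
                record[key] = line[len(prefix):].strip()
                break
        else:
            # No prefix matched: the previous field overflowed onto this line.
            if current is not None:
                record[current] += "\n" + line
    if record:
        # The last group may not be followed by a blank line.
        yield record


# Made-up sample data in the shape described in the question.
sample = io.StringIO(
    "No.123456789 15/01/2021 ABC\n"
    "continued text\n"
    "\n"
    "No.987654321 16/01/2021 XYZ\n"
)
records = list(iter_records(sample))
```

With a real file, passing the open file object (for example `with open(path) as f: for rec in iter_records(f): ...`) reads one line at a time, so only the current group is held in memory rather than all 50k to 100k lines.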
