Extract specific text lines?

庸人自扰 2021-02-03 15:09

I have a large text file of several hundred thousand lines. I have to extract 30,000 specific lines that are scattered throughout the file at random spots. This is the program I have to extr

10 answers
  •  说谎 (original poster)
    2021-02-03 15:23

    This method assumes the special values appear at the same position on every line of gbigfile.

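    def mydict(iterable):
        # Group values by key: {first character: [targets starting with it]}
        d = {}
        for k, v in iterable:
            if k in d:
                d[k].append(v)
            else:
                d[k] = [v]
        return d
    
    with open("C:\\to_find.txt", "r") as t:
        # Strip newlines so targets can equal slices of the big file; skip blank lines
        targets = [x.strip() for x in t if x.strip()]
        tofind = mydict([(x[0], x) for x in targets])
    
    with open("C:\\gbigfile.txt", "r") as bigfile:
        with open("C:\\outfile.txt", "w") as outfile:
            for line in bigfile:
                seq = line[4:9]  # fixed position where the value is assumed to sit
                # .get() avoids a KeyError when no target starts with seq[0]
                if seq and seq in tofind.get(seq[0], []):
                    outfile.write(line)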
    

    Depending on how the starting letters are distributed among the targets, bucketing by first character can cut your comparisons down by a significant amount. If you don't know where the values will appear on each line, you're talking about a LONG operation, because you'd have to check each of the hundreds of thousands of lines (let's say 300,000) against all 30,000 targets. That's 9 billion comparisons, which is going to take a long time.
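As a possible alternative to bucketing by first letter, here is a minimal sketch that stores the targets in a Python set, so each membership test costs roughly constant time on average instead of scanning a list. It is not the method above, only an illustration. The helper names (`load_targets`, `matches_fixed`, `matches_anywhere`, `extract`) and the sample data are hypothetical, and `matches_anywhere` assumes every target is exactly 5 characters long, like the `line[4:9]` slice.

```python
def load_targets(lines):
    """Return the non-empty, stripped target strings as a set."""
    return {s.strip() for s in lines if s.strip()}


def matches_fixed(line, targets, start=4, end=9):
    """True if the slice at a known, fixed position is a target."""
    return line[start:end] in targets


def matches_anywhere(line, targets, width=5):
    """True if any width-character window of the line is a target.

    Assumes every target has exactly `width` characters.
    """
    text = line.rstrip("\n")
    return any(text[i:i + width] in targets
               for i in range(len(text) - width + 1))


def extract(lines, targets, predicate):
    """Lazily yield the lines for which predicate(line, targets) is true."""
    return (line for line in lines if predicate(line, targets))


# Hypothetical sample data standing in for to_find.txt and gbigfile.txt.
targets = load_targets(["ABCDE\n", "QWERT\n", "\n"])
big = ["0001ABCDE rest\n", "0002ZZZZZ rest\n", "xxQWERTxx\n"]

fixed_hits = list(extract(big, targets, matches_fixed))
any_hits = list(extract(big, targets, matches_anywhere))
```

With real files, `lines` can be the open file object itself, so the big file is streamed one line at a time. In the fixed-position case this does one set lookup per line. In the unknown-position case it does about one lookup per character of each line, which is still far fewer than comparing every line against every target.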
