Extract specific text lines?


庸人自扰 2021-02-03 15:09

I have a large text file with several hundred thousand lines. I have to extract 30,000 specific lines that are scattered at random positions throughout the file. This is the program I have to extr

10 answers

说谎 (original poster)

2021-02-03 15:23
This method assumes the special values appear at the same position on every line of gbigfile.
```
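from collections import defaultdict

def mydict(iterable):
    """Group values by key: {key: {value, ...}}."""
    d = defaultdict(set)
    for k, v in iterable:
        d[k].add(v)
    return d

# Bucket the targets by their first character. Strip the trailing newline
# so that each target can be compared with a slice of a line.
with open("C:\\to_find.txt", "r") as t:
    tofind = mydict((x[0], x) for x in (line.strip() for line in t) if x)

with open("C:\\gbigfile.txt", "r") as bigfile:
    with open("C:\\outfile.txt", "w") as outfile:
        for line in bigfile:
            seq = line[4:9]  # fixed position where the value is assumed to sit
            # seq[:1] is "" for short lines; .get() avoids a KeyError for unknown letters
            if seq in tofind.get(seq[:1], ()):
                outfile.write(line)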
```
Depending on how the starting letters are distributed among those targets, you can cut your comparisons down by a significant amount. If you don't know where the values will appear, you're talking about a LONG operation, because you'll have to check each of the hundreds of thousands of lines (let's say 300,000) against all 30,000 targets. That's 300,000 × 30,000 = 9 billion comparisons, which is going to take a long time.
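For the case where the position is unknown, the brute-force scan might look roughly like the minimal sketch below. The function name `extract_anywhere` and the in-memory sample data are made up for illustration; real input would come from the files used earlier.

```python
def extract_anywhere(lines, targets):
    """Return the lines that contain at least one target as a substring.

    Worst case: len(lines) * len(targets) substring tests.
    """
    # An empty target would match every line, so drop it.
    targets = [t for t in targets if t]
    return [line for line in lines if any(t in line for t in targets)]


# Hypothetical sample data, not taken from the original files.
sample_lines = ["xxxxABCDEyyy\n", "no match here\n", "zzFGHIJ\n"]
sample_targets = ["ABCDE", "FGHIJ"]
matches = extract_anywhere(sample_lines, sample_targets)
print(matches)
```

`any()` stops at the first target that matches, so lines that do match are cheap. Lines that match nothing still pay for a test against every target, which is where the bulk of the time goes.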