Python iterating through two files by line at the same time

Posted by 让人想犯罪 __ on 2019-12-25 05:17:08

Question


I am trying to compare columns in two files to see if the values match, and if there is a match I want to merge/concatenate the data for that row together. My issue is that when reading line by line from the two files separately, I can't get Python to iterate through the files together and look for a match. Instead it iterates properly through one file and iterates over the same line in the second file multiple times...

I have had this issue in the past and still haven't really found a way around it. I know that indentation is one problem, since I mess with the loop by using "for line in a, for line in b", so I thought that what I tried below would work, but it hasn't. I have looked around for solutions but nobody seems to be using the same method, so I wonder if I am completely off track as to how to do this. Can anyone explain what a better way to do this is, and whether my method would work at all and, if not, why not? Thanks, it is much appreciated!

These are the formats of my two files. Basically, I want to compare the filename columns in both files, and if they match I want to merge the rows together.

file1:
cluster_id  hypothesis_id   filename    M1_name_offset  Orientation
1   71133076    unique_name_1.png   esc_sox2_Sox1_80_4  forward
1   50099120    unique_name_4.png   hb_cebpb_ETS1_139_7 forward
1   91895576    unique_name_11.png  he_tal1_at_AC_acptr_258_11  forward

file2:
Name                Cluster_No  Pattern     filename
esc_sox2_Sox1_80    Cluster1    AP1(1N)ETS      unique_name_4.png
hb_cebpb_ETS1_139   Cluster1    CREB(1N)ETS     unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2    ETS(-1N)ZIC     unique_name_3.png

What I have tried:

for aline in file1:
    motif1 = aline.split()[2]
    for bline in file2:
        motif2 = bline.split()[-1]
        if motif1 == motif2:
            print "match", aline, bline

I have also tried:

for aline in file1:
    motif1 = aline.split()[2]
for bline in file2:
    motif2 = bline.split()[-1]
    if motif1 == motif2:
        print "match", aline, bline

I have also tried using string formatting but that didn't make a difference. The first way iterates through file2 incorrectly and the second way doesn't give me any output. I have played around with it a lot and tried various indentations and extra bits but I am stumped as to how to even try and fix it! Please help me :(


Answer 1:


Use the zip builtin function.

with open(file1) as f1, open(file2) as f2:
    for line1, line2 in zip(f1, f2):
        motif1 = line1.split()[2]   # filename column in file1
        motif2 = line2.split()[-1]  # filename column in file2
        ...

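Note that zip behaves differently in Python 2 and Python 3. In Python 2, zip builds the complete list of pairs in memory before the loop starts, so itertools.izip, which yields pairs lazily, is more efficient there. In Python 3, zip is already lazy.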
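One caveat: zip pairs lines purely by position, so it only finds a match when the same filename sits on the same line number in both files, which is not the case in the sample data from the question. It is also worth noting why the nested loop in the question stalls: a file object is a one-pass iterator, so the inner loop exhausts file2 during the first outer iteration. A minimal sketch of a different technique, a dictionary lookup keyed on the filename column, might look like this. The function name merge_on_filename, the tab used to join rows, and the io.StringIO stand-ins for real files are assumptions made for illustration; the column positions are taken from the sample tables.

```python
import io


def merge_on_filename(lines1, lines2):
    """Join rows of two whitespace-delimited tables on their filename column.

    Assumes (from the sample data) that filename is column index 2 in the
    first table and the last column in the second, and that both tables
    start with a header row.
    """
    rows2 = iter(lines2)
    next(rows2, None)  # skip the header of the second table
    # Read the second table once and index it: filename -> full row text.
    # If a filename occurs more than once, the last row wins.
    by_filename = {}
    for line in rows2:
        fields = line.split()
        if fields:
            by_filename[fields[-1]] = line.rstrip("\n")

    rows1 = iter(lines1)
    next(rows1, None)  # skip the header of the first table
    merged = []
    for line in rows1:
        fields = line.split()
        if len(fields) > 2 and fields[2] in by_filename:
            merged.append(line.rstrip("\n") + "\t" + by_filename[fields[2]])
    return merged


# In-memory stand-ins for the two files, copied from the question's sample.
file1 = io.StringIO(
    "cluster_id hypothesis_id filename M1_name_offset Orientation\n"
    "1 71133076 unique_name_1.png esc_sox2_Sox1_80_4 forward\n"
    "1 50099120 unique_name_4.png hb_cebpb_ETS1_139_7 forward\n"
    "1 91895576 unique_name_11.png he_tal1_at_AC_acptr_258_11 forward\n"
)
file2 = io.StringIO(
    "Name Cluster_No Pattern filename\n"
    "esc_sox2_Sox1_80 Cluster1 AP1(1N)ETS unique_name_4.png\n"
    "hb_cebpb_ETS1_139 Cluster1 CREB(1N)ETS unique_name_11.png\n"
    "he_tal1_at_AC_acptr_258 Cluster2 ETS(-1N)ZIC unique_name_3.png\n"
)

merged = merge_on_filename(file1, file2)
for row in merged:
    print(row)
```

Each file is read exactly once, so the rows do not need to be in the same order in both files. With real files, the two io.StringIO objects would be replaced by file objects opened in a with statement.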




Answer 2:


I'm assuming you're using Python 3. Here's a nice abstraction, iterlines, that hides the complexity of opening, reading, pairing, and closing n files. Note the use of zip_longest, which prevents the ends of longer files from being silently discarded: shorter files are padded with fillvalue instead.

def iterlines(*paths, fillvalue=None, **open_kwargs):
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            with suppress(Exception):  # suppress() with no arguments suppresses nothing
                file_.close()

Usage

for line_a, line_b in iterlines('a.txt', 'b.txt'):
    print(line_a, line_b)

Complete code

from contextlib import suppress
from itertools import zip_longest


def iterlines(*paths, fillvalue=None, **open_kwargs):
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            with suppress(Exception):  # suppress() with no arguments suppresses nothing
                file_.close()


for lines in iterlines('a.txt', 'b.txt', 'd.txt'):
    print(lines)
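To make the difference between zip and zip_longest concrete, here is a small sketch that uses io.StringIO objects in place of real files. The variable names and the sample contents are made up for illustration.

```python
import io
from itertools import zip_longest

# Two in-memory "files" of different lengths (hypothetical contents).
short = io.StringIO("a1\na2\n")
long_ = io.StringIO("b1\nb2\nb3\n")

# zip stops as soon as the shortest input runs out, so "b3" never appears.
zipped = list(zip(short, long_))

# Rewind both streams before reading them a second time.
short.seek(0)
long_.seek(0)

# zip_longest keeps going and pads the exhausted input with fillvalue.
padded = list(zip_longest(short, long_, fillvalue=""))

print(zipped)
print(padded)
```

Choosing a fillvalue such as an empty string rather than the default None lets the loop body call string methods on every item without a separate None check.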


Source: https://stackoverflow.com/questions/24960996/python-iterating-through-two-files-by-line-at-the-same-time
