Question
I am trying to compare columns in two files to see if the values match, and if there is a match I want to merge/concatenate the data for that row. My issue is that when reading line by line from the two files separately, I can't get Python to iterate through the files together and look for a match. Instead it iterates properly through one file and iterates over the same line in the second file multiple times...
I have had this issue in the past and still haven't found a way around it. I know that indentation is one problem, since I mess with the loop by using "for line in a, for line in b", so I thought that what I tried below would work, but it hasn't. I have looked around for solutions, but nobody seems to be using the same method, so I wonder if I am completely off track. Can anyone explain a better way to do this, and whether my method would work at all, and if not, why not? Thanks, it is much appreciated!
These are the formats of my two files. Basically, I want to compare the filename column in both files, and if the values match I want to merge the rows together.
file1:
cluster_id hypothesis_id filename M1_name_offset Orientation
1 71133076 unique_name_1.png esc_sox2_Sox1_80_4 forward
1 50099120 unique_name_4.png hb_cebpb_ETS1_139_7 forward
1 91895576 unique_name_11.png he_tal1_at_AC_acptr_258_11 forward
file2:
Name Cluster_No Pattern filename
esc_sox2_Sox1_80 Cluster1 AP1(1N)ETS unique_name_4.png
hb_cebpb_ETS1_139 Cluster1 CREB(1N)ETS unique_name_11.png
he_tal1_at_AC_acptr_258 Cluster2 ETS(-1N)ZIC unique_name_3.png
What I have tried:
for aline in file1:
    motif1 = aline.split()[2]
    for bline in file2:
        motif2 = bline.split()[-1]
        if motif1 == motif2:
            print "match", aline, bline
I have also tried:
for aline in file1:
    motif1 = aline.split()[2]
for bline in file2:
    motif2 = bline.split()[-1]
    if motif1 == motif2:
        print "match", aline, bline
I have also tried using string formatting but that didn't make a difference. The first way iterates through file2 incorrectly and the second way doesn't give me any output. I have played around with it a lot and tried various indentations and extra bits but I am stumped as to how to even try and fix it! Please help me :(
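One likely cause, assuming file1 and file2 are open file objects and the second loop is nested inside the first: a file object is a one-shot iterator, so the inner loop consumes all of file2 during the first outer iteration and yields nothing afterwards. Below is a minimal sketch of a keyed lookup that avoids re-reading file2. It uses io.StringIO stand-ins for the real files, the helper name merge_on_filename is hypothetical, and it assumes whitespace-separated columns with one header line per file.

```python
import io

def merge_on_filename(file1, file2):
    """Yield (row1, row2) pairs whose filename columns match.

    Assumes the filename is column 3 of file1 and the last column of file2.
    """
    next(file2, None)  # skip file2's header line
    by_name = {}
    for line in file2:
        fields = line.split()
        if fields:
            # Later duplicates of a filename overwrite earlier ones.
            by_name[fields[-1]] = fields
    next(file1, None)  # skip file1's header line
    for line in file1:
        fields = line.split()
        if len(fields) > 2 and fields[2] in by_name:
            yield fields, by_name[fields[2]]

# Stand-ins for the two files shown in the question.
file1 = io.StringIO(
    "cluster_id hypothesis_id filename M1_name_offset Orientation\n"
    "1 71133076 unique_name_1.png esc_sox2_Sox1_80_4 forward\n"
    "1 50099120 unique_name_4.png hb_cebpb_ETS1_139_7 forward\n"
    "1 91895576 unique_name_11.png he_tal1_at_AC_acptr_258_11 forward\n"
)
file2 = io.StringIO(
    "Name Cluster_No Pattern filename\n"
    "esc_sox2_Sox1_80 Cluster1 AP1(1N)ETS unique_name_4.png\n"
    "hb_cebpb_ETS1_139 Cluster1 CREB(1N)ETS unique_name_11.png\n"
    "he_tal1_at_AC_acptr_258 Cluster2 ETS(-1N)ZIC unique_name_3.png\n"
)

matches = list(merge_on_filename(file1, file2))
for row1, row2 in matches:
    # Drop file2's trailing filename so it is not repeated in the merged row.
    print("match", " ".join(row1 + row2[:-1]))
```

The dictionary is built from file2 in a single pass, so each line of file1 needs only one lookup rather than a rescan of file2.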
Answer 1:
Use the zip builtin function.
with open(file1) as f1, open(file2) as f2:
    for line1, line2 in zip(f1, f2):
        motif1 = line1.split()[0]
        motif2 = line2.split()[0]
        ...
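Note that zip behaves differently in Python 2 and Python 3: in Python 2 it builds the whole list of pairs in memory, whereas in Python 3 it returns a lazy iterator. In Python 2, it would be more efficient to use itertools.izip instead.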
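To make the pairing concrete, here is a small sketch assuming Python 3, with io.StringIO objects standing in for the two open files (the sample contents are made up for illustration). zip pairs line N of one input with line N of the other and stops at the shorter input, so it only finds matches when the matching rows sit at the same position in both files.

```python
import io

# Stand-ins for two open files; f1 has one more line than f2.
f1 = io.StringIO("a1 x\na2 y\na3 z\n")
f2 = io.StringIO("b1 x\nb2 y\n")

pairs = []
for line1, line2 in zip(f1, f2):
    # Take the first whitespace-separated field of each line.
    pairs.append((line1.split()[0], line2.split()[0]))

print(pairs)
```

The third line of f1 never appears in pairs, because zip ends as soon as f2 runs out.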
Answer 2:
I'm assuming you're using Python 3. Here's a nice abstraction, iterlines. It hides the complexity of opening, reading, pairing, and closing n files. Note the use of zip_longest, which prevents the ends of longer files from being silently discarded.
def iterlines(*paths, fillvalue=None, **open_kwargs):
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            # suppress() with no arguments suppresses nothing, so name the
            # exception type to keep closing the remaining files.
            with suppress(Exception):
                file_.close()
Usage
for line_a, line_b in iterlines('a.txt', 'b.txt'):
    print(line_a, line_b)
Complete code
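from contextlib import suppress
from itertools import zip_longest


def iterlines(*paths, fillvalue=None, **open_kwargs):
    """Yield tuples holding the next line from each file in paths."""
    files = []
    try:
        for path in paths:
            files.append(open(path, **open_kwargs))
        # Shorter files are padded with fillvalue instead of truncating.
        for lines in zip_longest(*files, fillvalue=fillvalue):
            yield lines
    finally:
        for file_ in files:
            # suppress() with no arguments suppresses nothing, so name the
            # exception type to keep closing the remaining files.
            with suppress(Exception):
                file_.close()


for lines in iterlines('a.txt', 'b.txt', 'd.txt'):
    print(lines)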
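When the inputs differ in length, zip_longest pads the shorter one with fillvalue (None by default), so calling a string method such as .strip() on a padded slot would raise AttributeError. Here is a small sketch of guarding against that, using io.StringIO stand-ins rather than real paths (the sample contents are made up for illustration):

```python
import io
from itertools import zip_longest

# Stand-ins for two open files of different lengths.
a = io.StringIO("a1\na2\na3\n")
b = io.StringIO("b1\n")

rows = []
for line_a, line_b in zip_longest(a, b, fillvalue=None):
    # A padded slot is None, not a string, so check before stripping.
    left = line_a.strip() if line_a is not None else None
    right = line_b.strip() if line_b is not None else None
    rows.append((left, right))

print(rows)
```

Passing fillvalue='' instead would let string methods run on padded slots without the explicit check, at the cost of no longer distinguishing a missing line from an empty one.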
Source: https://stackoverflow.com/questions/24960996/python-iterating-through-two-files-by-line-at-the-same-time