Question
I am trying to use Python to convert a text file from Format A:
Key1
Key1value1
Key1value2
Key1value3
Key2
Key2value1
Key2value2
Key2value3
Key3...
Into Format B:
Key1 Key1value1
Key1 Key1value2
Key1 Key1value3
Key2 Key2value1
Key2 Key2value2
Key2 Key2value3
Key3 Key3value1...
Specifically, here is a brief look at the file itself (only one key shown, thousands more in the full file):
chr22:16287243: PASS
patientID1 G/G
patientID2 G/G
patientID3 G/G
And the desired output here:
chr22:16287243: PASS patientID1 G/G
chr22:16287243: PASS patientID2 G/G
chr22:16287243: PASS patientID3 G/G
I've written the following code which can detect/display the keys, but I am having trouble writing the code to store the values associated with each key, and subsequently printing these key-value pairs. Can anyone please assist me with this task?
import sys
import re

records = []
with open('filepath', 'r') as infile:
    for line in infile:
        variant = re.search(r"\Achr\d", line, re.I)  # all variants start with "chr"
        if variant:
            records.append(line.replace("\n", ""))
            # parse lines until a new variant is encountered
for r in records:
    print(r)
Answer 1:
Do it in one pass, without storing the lines:
with open("input") as infile, open("output", "w") as outfile:
    for line in infile:
        if line.startswith("chr"):
            key = line.strip()
        else:
            print(key, line.rstrip("\n"), file=outfile)
This code assumes the first line contains a key. Otherwise it fails with a NameError, because key is not yet defined when the first value line is reached.
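If the input might not start with a key line, one option is to wrap the same one-pass idea in a generator that reports the problem explicitly. This is only a sketch: the name flatten_records, the key_prefix parameter, and the choice to skip blank lines are my assumptions, not part of the answer above. It also strips surrounding whitespace from value lines rather than just the trailing newline.

```python
def flatten_records(lines, key_prefix="chr"):
    """Yield 'key value' strings from an iterable of lines.

    flatten_records and key_prefix are illustrative names (assumptions).
    """
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(key_prefix):
            key = line  # remember the most recent key line
        elif key is None:
            # A value line appeared before any key line.
            raise ValueError(f"value line before any key: {line!r}")
        else:
            yield f"{key} {line}"


sample = [
    "chr22:16287243: PASS\n",
    "patientID1 G/G\n",
    "patientID2 G/G\n",
]
for row in flatten_records(sample):
    print(row)
```

Because it accepts any iterable of lines, you can pass an open file object directly and write each yielded row to the output file.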
Answer 2:
First, if you only need to check whether a string starts with a fixed character sequence, don't use a regular expression. str.startswith is much simpler and easier to read:
if line.startswith("chr"):
The next step would be to use a very simple state machine. Like so:
current_key = ""
for line in infile:  # infile is an already opened file object
    if line.startswith("chr"):
        current_key = line.strip()
    else:
        print(" ".join([current_key, line.strip()]))
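Since the question also asks about storing the values for each key, the same state machine can fill a dictionary instead of printing right away. A minimal sketch, assuming the whole file fits in memory; group_by_key and key_prefix are hypothetical names, and value lines that appear before any key are silently ignored here.

```python
def group_by_key(lines, key_prefix="chr"):
    """Map each key line to the list of value lines that follow it.

    group_by_key and key_prefix are illustrative names (assumptions).
    """
    groups = {}     # dicts keep insertion order, so keys stay in file order
    current = None  # list of values for the key seen most recently
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(key_prefix):
            # Reuse the existing list if the same key line appears again.
            current = groups.setdefault(line, [])
        elif current is not None:
            current.append(line)
    return groups


groups = group_by_key([
    "chr1: PASS\n", "patientID1 G/G\n", "patientID2 A/G\n",
    "chr2: PASS\n", "patientID1 T/T\n",
])
for key, values in groups.items():
    for value in values:
        print(key, value)
```

If you only need the flattened output, printing as you go avoids holding everything in memory. The dictionary is worth it mainly when you want to look values up by key afterwards.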
Answer 3:
If there are always the same number of values per key, islice is useful:
from itertools import islice

with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for k in fin:
        for v in islice(fin, 3):
            fout.write(' '.join((k.strip(), v)))
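To try the islice approach without touching real files, here is a small sketch that runs it on an in-memory buffer. The helper name pair_fixed and its values_per_key parameter are my own assumptions. The key point is that the outer loop and islice share one iterator, so each islice call consumes the value lines that belong to the current key.

```python
import io
from itertools import islice


def pair_fixed(lines, values_per_key):
    """Pair each key line with the fixed number of value lines after it.

    pair_fixed and values_per_key are illustrative names (assumptions).
    """
    it = iter(lines)  # one shared iterator for keys and values
    for key in it:
        # islice pulls the next values_per_key lines from the same iterator.
        for value in islice(it, values_per_key):
            yield f"{key.strip()} {value.strip()}"


text = "Key1\nKey1value1\nKey1value2\nKey2\nKey2value1\nKey2value2\n"
rows = list(pair_fixed(io.StringIO(text), 2))
for row in rows:
    print(row)
```

Note that this relies entirely on the count being constant. If one key has fewer values than expected, the next key line is consumed as a value and everything after it is misaligned, with no error raised.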
Source: https://stackoverflow.com/questions/8247499/use-python-to-manipulate-txt-file-presentation-of-key-value-grouping