I have a TSV file that looks like this:
A B C D D=1;E=2
S D F G H=2;B=4
I'd like to write the contents to another tsv file in
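import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('path/to/input', newline='') as infile, open('path/to/output', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for line in csv.reader(infile, delimiter='\t'):
        vals = line[-1]      # last column, e.g. 'D=1;E=2'
        headers = line[:-1]  # all preceding columns
        for val in vals.split(';'):
            # one output row per semicolon-separated item
            writer.writerow(headers + [val])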
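To try this approach without touching any files, here is a minimal sketch that runs the same loop over in-memory buffers. It assumes the sample rows from the question are tab-separated; `explode_last_column` is a hypothetical helper name used only for illustration.

```python
import csv
import io


def explode_last_column(infile, outfile):
    """Write one output row per ';'-separated item in the last column."""
    writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
    for row in csv.reader(infile, delimiter='\t'):
        if not row:  # skip blank lines
            continue
        *head, last = row
        for val in last.split(';'):
            writer.writerow(head + [val])


# Sample rows from the question, assuming tab-separated columns.
sample = "A\tB\tC\tD\tD=1;E=2\nS\tD\tF\tG\tH=2;B=4\n"
out = io.StringIO()
explode_last_column(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

Each input row yields as many output rows as it has items in the last column, with the `key=value` item kept intact in the final field.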
If you are positively sure you only have tabs and semicolons, then you can use split.
with open('/tmp/test.tsv') as infile, open('/tmp/test2.tsv', 'w') as outfile:
    for line in infile:
        tsplit = line.split("\t")
        firstcolumns = tsplit[:-1]
        # drop the trailing newline, then split the last column on ';'
        lastitems = tsplit[-1].strip().split(";")
        for item in lastitems:
            # 'D=1' becomes two columns: 'D' and '1'
            allcolumns = firstcolumns + item.split("=")
            outfile.write("\t".join(allcolumns) + "\n")
(Updated to make it easier to compare with the other answer.)
This will work regardless of the number of semicolon-separated items you have in the last column. However, this is sensitive to small changes in the format (e.g. added spaces).
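If stray spaces are the main worry, one possible way to soften that is to strip every piece after splitting. This is only a sketch under the assumption that whitespace around tabs, semicolons and `=` carries no meaning in your data; `split_row` is a hypothetical helper, not something from the code above.

```python
def split_row(line):
    """Return one list of columns per 'key=value' item in the last field.

    Assumption: whitespace around tabs, ';' and '=' is not significant.
    """
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    first_columns, last = fields[:-1], fields[-1]
    rows = []
    for item in last.split(";"):
        item = item.strip()
        if not item:  # tolerate a trailing ';'
            continue
        key, _, value = item.partition("=")
        rows.append(first_columns + [key.strip(), value.strip()])
    return rows


# A row with extra spaces and a trailing semicolon.
for row in split_row("A\tB\tC\tD\tD = 1; E=2 ;\n"):
    print("\t".join(row))
```

Using `partition` instead of `split("=")` also keeps the row at a fixed width when a value itself contains an `=`.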