Parsing a tab delimited file into separate lists or strings

I am trying to take a tab delimited file with two columns, Name and Age, which reads in as this:

'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'

And simply create two lists, one with the names (called names, without the heading) and one with the ages (called ages, also without the heading).

Using the csv module, you might do something like this:

import csv

names = []
ages = []
# newline='' lets the csv module handle line endings itself
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader)  # skip the heading row
    for name, age in reader:
        names.append(name)
        ages.append(age)

print(names)
# ['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
print(ages)
# ['32', '29', '67', '45', '12', '11', '34', '65', '78']

Tab-delimited data is within the domain of the csv module:

>>> corpus = 'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'
>>> import io
>>> infile = io.StringIO(corpus)

Pretend infile is just a regular file...

>>> import csv
>>> r = csv.DictReader(infile,
...                    dialect=csv.Sniffer().sniff(infile.read(1000)))
>>> infile.seek(0)
0

You don't even have to tell the csv module about the headings or the delimiter: DictReader takes the field names from the first row, and the Sniffer guesses the delimiter from the sample it is given.

>>> names, ages = [],[]
>>> for row in r:
...     names.append(row['Name'])
...     ages.append(row['Age'])
... 
>>> names
['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
>>> ages
['32', '29', '67', '45', '12', '11', '34', '65', '78']
>>>

I would use the split and splitlines methods of strings:

names = []
ages = []
# text holds the file contents as one string; [1:] skips the heading line
for name_age in text.splitlines()[1:]:
    name, age = name_age.strip().split("\t")
    names.append(name)
    ages.append(age)

If you were parsing a more complex format, I would suggest using the csv module, which can also handle TSV, but it seems like overkill here.
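
To illustrate what "more complex" might mean, here is a small sketch with a made-up row whose name field is quoted and contains a tab. The sample data and the variable names are assumptions for illustration, not part of the file in the question.

```python
import csv
import io

# Hypothetical data: the second row's name is quoted and contains a tab.
text = 'Name\tAge\n"Smith\tJr"\t40\n'

# A plain split breaks the quoted field into two pieces.
naive = text.splitlines()[1].split("\t")

# csv.reader honours the quotes and keeps the field whole.
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

print(naive)    # ['"Smith', 'Jr"', '40']
print(rows[1])  # ['Smith\tJr', '40']
```

For the simple two-column file in the question, neither case arises, so split works just as well.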

Unutbu's answer compressed using a list comprehension:

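import csv
# filename is the path to the tab-delimited file; [1:] drops the heading row
names = [x[0] for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]
ages = [x[1] for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]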

marvin's answer, but without reading the entire file twice:

# [1:] drops the heading row
data = [(x[0], x[1]) for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]

This works if you are OK with a list of tuples instead of two lists.

You could still read the data into two lists in a single pass, and that would be unutbu's answer.
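
Another single-pass option, as a rough sketch: transpose the rows with zip. The example below uses a shortened in-memory copy of the data so it runs as-is; with a real file you would pass the open file object to csv.reader instead. It assumes every row has exactly two fields and that there is at least one data row.

```python
import csv
import io

# Stand-in for the file contents from the question (shortened).
text = 'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\n'

reader = csv.reader(io.StringIO(text), delimiter="\t")
next(reader)  # skip the heading row

# zip(*reader) transposes rows into columns in one pass over the data.
names, ages = (list(col) for col in zip(*reader))

print(names)  # ['Mark', 'Matt', 'John']
print(ages)   # ['32', '29', '67']
```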

Source: https://stackoverflow.com/questions/7605374/parsing-a-tab-delimited-file-into-separate-lists-or-strings

Tags: python, parsing, tabs, delimited