I am trying to take a tab delimited file with two columns, Name and Age, which reads in as this:
'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'
I simply want to create two lists: one with the names (called names, without the heading) and one with the ages (called ages, again without the Age heading in the list).
Using the csv module, you might do something like this:
import csv
names = []
ages = []
with open('data.csv', 'r') as f:
    next(f)  # skip headings
    reader = csv.reader(f, delimiter='\t')
    for name, age in reader:
        names.append(name)
        ages.append(age)
print(names)
# ['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
print(ages)
# ['32', '29', '67', '45', '12', '11', '34', '65', '78']
Tab-delimited data is within the domain of the csv module:
>>> corpus = 'Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\nFrank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n'
>>> import io
>>> infile = io.StringIO(corpus)
>>> # pretend infile was just a regular file
...
>>> import csv
>>> r = csv.DictReader(infile,
...                    dialect=csv.Sniffer().sniff(infile.read(1000)))
>>> infile.seek(0)
0
>>> # you don't even have to tell the csv module about the headings and the delimiter format, it'll figure it out on its own
>>> names, ages = [],[]
>>> for row in r:
...     names.append(row['Name'])
...     ages.append(row['Age'])
...
>>> names
['Mark', 'Matt', 'John', 'Jason', 'Matt', 'Frank', 'Frank', 'Frank', 'Frank']
>>> ages
['32', '29', '67', '45', '12', '11', '34', '65', '78']
>>>
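If you want to check what the sniffer actually worked out, you can inspect the dialect it returns. This is only a minimal sketch using the same sample string; the variable name `sample` is mine, not part of the answer.

```python
import csv
import io

# Same sample data as in the question.
sample = ('Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\nJason\t45\nMatt\t12\n'
          'Frank\t11\nFrank\t34\nFrank\t65\nFrank\t78\n')

# Let the csv module guess the format from the text itself.
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))
# '\t'

# DictReader takes the first row as the field names by default.
reader = csv.DictReader(io.StringIO(sample), dialect=dialect)
print(reader.fieldnames)
# ['Name', 'Age']
```

The sniffer is a heuristic, so on real files it is worth checking the detected delimiter before trusting it.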
I would use the split and splitlines methods of strings:
names = []
ages = []
# input holds the file contents as a single string
for name_age in input.splitlines()[1:]:  # [1:] skips the heading line
    name, age = name_age.strip().split("\t")
    names.append(name)
    ages.append(age)
If you were parsing a more complex format, I would suggest using the csv module, which can also handle TSV, but it seems like a bit of overkill here.
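As an illustration of what "more complex" might mean, here is a small sketch with a made-up row (not from the question's data) whose name field is quoted and contains a tab. Plain split cuts the field in two, while the csv module honours the quotes.

```python
import csv

# Hypothetical row: the quoted name field itself contains a tab.
line = '"Smith\tJohn"\t32'

# Plain string splitting knows nothing about quoting.
split_fields = line.split('\t')
print(split_fields)
# ['"Smith', 'John"', '32']

# csv.reader accepts any iterable of lines and respects the quotes.
csv_fields = next(csv.reader([line], delimiter='\t'))
print(csv_fields)
# ['Smith\tJohn', '32']
```

For the simple two-column data in the question neither problem arises, which is why split is enough there.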
Unutbu's answer compressed using a list comprehension:
import csv
names = [x[0] for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]  # [1:] drops the heading row
ages = [x[1] for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]
marvin's answer, but without reading the entire file twice:
data = [(x[0], x[1]) for x in csv.reader(open(filename, 'r'), delimiter='\t')][1:]  # [1:] drops the heading row
This works if you are OK with a list of tuples instead of two lists.
You could still read the data into two lists in a single pass, and that would be unutbu's answer.
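Another way to get two lists in one pass is to transpose the rows with zip. This is just a sketch: the in-memory `f` below stands in for the real file and its contents are a shortened copy of the question's data.

```python
import csv
import io

# Stand-in for the real file; with a file on disk you would use
# open(filename, newline='') instead.
f = io.StringIO('Name\tAge\nMark\t32\nMatt\t29\nJohn\t67\n')

reader = csv.reader(f, delimiter='\t')
next(reader)                  # skip the heading row
names, ages = zip(*reader)    # transpose rows into two columns (tuples)
names, ages = list(names), list(ages)

print(names)
# ['Mark', 'Matt', 'John']
print(ages)
# ['32', '29', '67']
```

Note that the unpacking raises a ValueError if the file has no data rows, so the explicit loop is safer when the input may be empty.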
Source: https://stackoverflow.com/questions/7605374/parsing-a-tab-delimited-file-into-separate-lists-or-strings