I have a file in .ttl form. It has 4 attributes/columns containing quadruples of the following form:
(id, student_name, student_address, student
You can do as Snakes and Coffee suggests; just wrap that function (or its code) in a loop with yield statements. That turns it into a generator, which you can call iteratively to produce each line's dict on the fly. Suppose, for instance, that you were going to write the results to a CSV using Snakes' parse_to_dict:
import re
import csv
writer = csv.DictWriter(open(outfile, "w", newline=""),
                        fieldnames=["id", "name", "address", "phone"])  # or whatever your fields are
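parse_to_dict itself isn't reproduced here; as a rough sketch, a minimal hypothetical version (assuming one whitespace-separated record per line, which may not match your actual .ttl layout) could look like:

```python
import re

def parse_to_dict(line):
    # Hypothetical parser: split the line on runs of whitespace and
    # map the pieces onto the four expected field names.
    fields = ["id", "name", "address", "phone"]
    values = re.split(r"\s+", line.strip())
    return dict(zip(fields, values))
```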
You can create a generator as a function or with an inline comprehension:
def dict_generator(lines):
    for line in lines:
        yield parse_to_dict(line)
--or--
dict_generator = (parse_to_dict(line) for line in lines)
These are pretty much equivalent. At this point you can get a dict-parsed line by calling next(dict_generator), and you'll magically get one at a time, with no additional RAM thrashing involved.
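For the CSV case you never even need to call next() by hand: DictWriter.writerows will drain the generator for you, one row at a time. A self-contained sketch (the parse_to_dict here is a stand-in, assuming tab-separated input, and io.StringIO stands in for the real output file):

```python
import csv
import io

def parse_to_dict(line):
    # Stand-in for Snakes' parser: one tab-separated record per line.
    return dict(zip(["id", "name", "address", "phone"], line.split("\t")))

lines = ["1\tAlice\tMain St\t555-1234", "2\tBob\tElm St\t555-5678"]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "name", "address", "phone"])
writer.writeheader()
# The generator expression is consumed lazily; no list of dicts is built.
writer.writerows(parse_to_dict(line) for line in lines)
```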
If you have 16 gigs of raw data, you might consider making a generator to pull the lines in, too. They're really useful.
More info on generators from SO and the docs:
What can you use Python generator functions for?
http://wiki.python.org/moin/Generators