How to parse .ttl files with RDFLib?

一个人的身影 2021-02-07 22:14

I have a file in .ttl form. It has 4 attributes/columns containing quadruples of the following form:

  1. (id, student_name, student_address, student
3 Answers
  •  滥情空心
    2021-02-07 22:22

    You can do as Snakes and Coffee suggests, but wrap that function (or its code) in a loop that uses yield. This creates a generator, which builds the dict for each line on demand as you iterate over it. Suppose, for instance, that you want to write the results to a CSV file using Snakes' parse_to_dict:

    import re
    import csv
    
    # Python 3: open CSV output in text mode with newline="" (Python 2 used "wb")
    writer = csv.DictWriter(open(outfile, "w", newline=""), fieldnames=["id", "name", "address", "phone"])
    # or whatever
    

    You can create a generator as a function or with an inline comprehension:

    def dict_generator(lines): 
        for line in lines: 
            yield parse_to_dict(line)
    

    --or--

    dict_generator = (parse_to_dict(line) for line in lines)
    

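    These are pretty much equivalent. At this point you can get the next dict-parsed line by calling next(dict_generator) (dict_generator.next() in Python 2), or pass the generator straight to writer.writerows(). Either way you get one dict at a time, and the full set of parsed lines is never held in memory.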
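    As a minimal end-to-end sketch of the pieces above: the parse_to_dict below is a hypothetical stand-in for the function from the other answer, and it assumes each line holds one comma-separated record, optionally wrapped in parentheses. The field names are taken from the snippet above; the sample data is made up for illustration.

```python
import csv
import io

FIELDNAMES = ["id", "name", "address", "phone"]


def parse_to_dict(line):
    # Hypothetical stand-in for the parser from the other answer.
    # Assumes one comma-separated record per line, optionally in parentheses.
    values = [v.strip() for v in line.strip().strip("()").split(",")]
    return dict(zip(FIELDNAMES, values))


def dict_generator(lines):
    # Yield one parsed dict per non-empty line, on demand.
    for line in lines:
        if line.strip():
            yield parse_to_dict(line)


def write_csv(lines, out):
    # writerows() consumes the generator lazily, one row at a time.
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(dict_generator(lines))


# Made-up sample input; in practice `lines` would be an open file object.
sample = ["(1, Ann, 12 Oak St, 555-0100)\n", "(2, Bob, 9 Elm Rd, 555-0101)\n"]
buffer = io.StringIO()
write_csv(sample, buffer)
```

    With a real file you would pass the open input file as lines and an output file opened with newline="" as out.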

    If you have 16 gigs of raw data, consider using a generator to pull the lines in as well, so the input is never loaded into memory all at once. They're really useful.
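    A line-reading generator could look roughly like this. The name read_lines and the temporary demo file are illustrative assumptions, not part of the original answers.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    # Iterating over a file object reads one line at a time,
    # so only the current line is held in memory.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


# Demo on a small temporary file (stand-in for the large input file).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

lines = read_lines(demo_path)
first = next(lines)   # pulls only the first line
rest = list(lines)    # drains the remaining lines and closes the file
os.remove(demo_path)
```

    The output of such a generator can be fed directly into a dict generator, so reading, parsing and writing all proceed one line at a time.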

    More info on generators from SO and some docs: What can you use Python generator functions for? http://wiki.python.org/moin/Generators
