I\'d like to parse a very large (about 200MB) RDF file in python. Should I be using sax or some other library? I\'d appreciate some very basic code that I can build on, say to r
I second the suggestion that you try out rdflib. It's nice and quick prototyping, and the BerkeleyDB backend store scales pretty well into the millions of triples if you don't want to load the whole graph into memory.
import rdflib
graph = rdflib.Graph("Sleepycat")
graph.open("store", create=True)
graph.parse("big.rdf")
# print out all the triples in the graph
for subject, predicate, object in graph:
print subject, predicate, object