I'd like to parse a very large (about 200MB) RDF file in Python. Should I be using SAX or some other library? I'd appreciate some very basic code that I can build on, say to r
A very fast library for parsing RDF files is LightRDF. It can be installed via pip, and code examples can be found on the project page.
To iterate over the triples in a gzipped RDF/XML file, open it with gzip and pass the file object to RDFDocument:
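import gzip

import lightrdf

RDF_FILENAME = 'data.rdf.gz'

# Open the gzipped file in binary mode and let LightRDF read from the file object.
with gzip.open(RDF_FILENAME, 'rb') as f:
    doc = lightrdf.RDFDocument(f, parser=lightrdf.xml.PatternParser)
    # None matches any term, so this pattern yields every triple.
    for (s, p, o) in doc.search_triples(None, None, None):
        print(s, p, o)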
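If installing a third-party package is not an option, a similar streaming approach can be sketched with the standard library alone. This is not LightRDF and not a full RDF/XML parser: it uses xml.etree.ElementTree.iterparse and assumes every statement is a direct child of an rdf:Description element that carries rdf:about. The function name iter_simple_triples and the SAMPLE document are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def iter_simple_triples(fileobj):
    """Yield (subject, predicate, object) from flat rdf:Description blocks.

    Assumption: nested nodes, blank nodes, typed nodes, and other RDF/XML
    shorthand do not occur; they are silently skipped or mishandled here.
    """
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != f"{{{RDF_NS}}}Description":
            continue
        subject = elem.get(f"{{{RDF_NS}}}about")
        if subject is not None:
            for prop in elem:
                # ElementTree spells tags as "{namespace}local"; join them into an IRI.
                predicate = prop.tag.lstrip("{").replace("}", "", 1)
                # Prefer an rdf:resource reference, else fall back to the literal text.
                obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
                yield subject, predicate, obj
        # Release the children and text of the finished element; the emptied
        # element itself stays attached to the root, so memory grows only slowly.
        elem.clear()


# Hypothetical sample data, gzipped in memory to mimic a .rdf.gz file.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book/1">
    <dc:title>Example title</dc:title>
    <dc:creator rdf:resource="http://example.org/person/1"/>
  </rdf:Description>
</rdf:RDF>
"""

with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(SAMPLE))) as f:
    triples = list(iter_simple_triples(f))

for s, p, o in triples:
    print(s, p, o)
```

For a real file, replace the in-memory sample with gzip.open('data.rdf.gz', 'rb') and loop over the generator directly instead of building a list, so that only one rdf:Description block is held in full at a time.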