listOfDict to RDF conversion in python targeting Apache Jena Fuseki

Submitted by …衆ロ難τιáo~ on 2021-01-29 17:53:19

Question


To store data in Apache Jena from Python, I'd like a generic conversion from a list of dicts to RDF, and ideally back again when querying.

For the list-of-dicts-to-RDF part I implemented "insertListOfDicts" and tested it with "testListOfDictInsert" (both shown below). The generated update, also shown below, leads to a 400: Bad Request when sent to an Apache Jena Fuseki server.

What needs to be fixed to get this working for simple string types, and maybe for other primitive Python types?

Please also find the source code at:

  • https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/dg/jena.py
  • https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testJena.py
@prefix foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA {
foaf:Person/Elizabeth+Alexandra+Mary+Windsor foaf:Person#name "Elizabeth Alexandra Mary Windsor".
foaf:Person/Elizabeth+Alexandra+Mary+Windsor foaf:Person#born "1926-04-21".
foaf:Person/Elizabeth+Alexandra+Mary+Windsor foaf:Person#wikidataurl "https://www.wikidata.org/wiki/Q9682".
foaf:Person/George+of+Cambridge foaf:Person#name "George of Cambridge".
foaf:Person/George+of+Cambridge foaf:Person#born "2013-07-22".
foaf:Person/George+of+Cambridge foaf:Person#wikidataurl "https://www.wikidata.org/wiki/Q1359041".
foaf:Person/Harry+Duke+of+Sussex foaf:Person#name "Harry Duke of Sussex".
foaf:Person/Harry+Duke+of+Sussex foaf:Person#born "1984-09-15".
foaf:Person/Harry+Duke+of+Sussex foaf:Person#wikidataurl "https://www.wikidata.org/wiki/Q152316".

}

testListOfDictInsert

def testListOfDictInsert(self):
        '''
        test inserting a list of Dicts using FOAF example
        https://en.wikipedia.org/wiki/FOAF_(ontology)
        '''
        listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': '1926-04-21', 'age': 94, 'ofAge': True , 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'George of Cambridge',              'born': '2013-07-22', 'age':  7, 'ofAge': False, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': '1984-09-15', 'age': 36, 'ofAge': True , 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        jena=self.getJena(mode='update',debug=True)
        jena.insertListOfDicts(listofDicts,'foaf:Person','name','@prefix foaf: <http://xmlns.com/foaf/0.1/>')

insertListOfDicts

def insertListOfDicts(self,listOfDicts,entityType,primaryKey,prefixes):
        '''
        insert the given list of dicts mapping datatypes according to
        https://www.w3.org/TR/xmlschema-2/#built-in-datatypes
        
        mapped from 
        https://docs.python.org/3/library/stdtypes.html
        
        compare to
        https://www.w3.org/2001/sw/rdb2rdf/directGraph/
        http://www.bobdc.com/blog/json2rdf/
        https://www.w3.org/TR/json-ld11-api/#data-round-tripping
        https://stackoverflow.com/questions/29030231/json-to-rdf-xml-file-in-python
        '''
        errors=[]
        insertCommand='%s\nINSERT DATA {\n' % prefixes
        for index,record in enumerate(listOfDicts):
            if primaryKey not in record:
                errors.append("missing primary key %s in record %d" % (primaryKey,index))
            else:    
                primaryValue=record[primaryKey]
                encodedPrimaryValue=urllib.parse.quote_plus(primaryValue)
                tSubject="%s/%s" %(entityType,encodedPrimaryValue)
                for keyValue in record.items():
                    key,value=keyValue
                    valueType=type(value)
                    if self.debug:
                        print("%s(%s)=%s" % (key,valueType,value))
                    tPredicate="%s#%s" % (entityType,key)
                    tObject=value    
                    if valueType == str:   
                        insertCommand+='  %s %s "%s".\n' % (tSubject,tPredicate,tObject)
        insertCommand+="\n}"
        if self.debug:
            print (insertCommand)
        self.insert(insertCommand)
        return errors

Answer 1:


+ is the special character for a space in HTTP form encoding, but it should only be used in application/x-www-form-urlencoded content.

For URIs, use %20, or choose a replacement character such as _ for the space, because it looks a bit like a space.

In all these cases there is no space character in the URI: there is a +, a %20 (three characters) or a _. It is an encoding, not an escape mechanism.
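The three options can be compared directly in Python. This is a minimal sketch using only the standard library; the variable names are chosen for illustration, and the sample name is taken from the question.

```python
from urllib.parse import quote, quote_plus

name = "George of Cambridge"

# Form encoding: each space becomes '+'. This is what the question's code
# used, and it is only meant for application/x-www-form-urlencoded data.
form_encoded = quote_plus(name)

# Percent-encoding: each space becomes the three characters '%20'.
percent_encoded = quote(name)

# Replacement character: each space becomes '_'.
underscored = name.replace(" ", "_")

print(form_encoded)
print(percent_encoded)
print(underscored)
```

Whichever variant is used, the result is a different string from the original name, so the original value still has to be stored as a literal if it is needed later.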




Answer 2:


The following code at least works and has correct "round-trip" behavior: the data inserted from a list of dicts can be retrieved with a corresponding query. Please comment with improvements or add a better answer.

If you'd always like to get typed literals, you can now specify this in the constructor of the Jena wrapper class.

In typed literal mode, the types

  • integer
  • decimal

are used for numeric literals to get proper "round-trip" behavior. The unit test insert then is:

PREFIX foafo: <http://foafo.bitplan.com/foafo/0.1/>
INSERT DATA {
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_name "Elizabeth Alexandra Mary Windsor".
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_born "1926-04-21"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_numberInLine "0"^^<http://www.w3.org/2001/XMLSchema#integer>.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q9682".
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_age "94.32637220476806"^^<http://www.w3.org/2001/XMLSchema#decimal>.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_ofAge True.
  foafo:Person_CharlesPrinceofWales foafo:Person_name "Charles, Prince of Wales".
  foafo:Person_CharlesPrinceofWales foafo:Person_born "1948-11-14"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_CharlesPrinceofWales foafo:Person_numberInLine "1"^^<http://www.w3.org/2001/XMLSchema#integer>.
  foafo:Person_CharlesPrinceofWales foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q43274".
  foafo:Person_CharlesPrinceofWales foafo:Person_age "71.7578047461618"^^<http://www.w3.org/2001/XMLSchema#decimal>.
  foafo:Person_CharlesPrinceofWales foafo:Person_ofAge True.
  foafo:Person_GeorgeofCambridge foafo:Person_name "George of Cambridge".
  foafo:Person_GeorgeofCambridge foafo:Person_born "2013-07-22"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_GeorgeofCambridge foafo:Person_numberInLine "3"^^<http://www.w3.org/2001/XMLSchema#integer>.
  foafo:Person_GeorgeofCambridge foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q1359041".
  foafo:Person_GeorgeofCambridge foafo:Person_age "7.072013799051315"^^<http://www.w3.org/2001/XMLSchema#decimal>.
  foafo:Person_GeorgeofCambridge foafo:Person_ofAge False.
  foafo:Person_HarryDukeofSussex foafo:Person_name "Harry Duke of Sussex".
  foafo:Person_HarryDukeofSussex foafo:Person_born "1984-09-15"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_HarryDukeofSussex foafo:Person_numberInLine "5"^^<http://www.w3.org/2001/XMLSchema#integer>.
  foafo:Person_HarryDukeofSussex foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q152316".
  foafo:Person_HarryDukeofSussex foafo:Person_age "35.92133993168922"^^<http://www.w3.org/2001/XMLSchema#decimal>.
  foafo:Person_HarryDukeofSussex foafo:Person_ofAge True.
}

When typed literal mode is off, typed literals are only used for dates:

PREFIX foafo: <http://foafo.bitplan.com/foafo/0.1/>
INSERT DATA {
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_name "Elizabeth Alexandra Mary Windsor".
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_born "1926-04-21"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_numberInLine 0.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q9682".
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_age 94.32637220476806.
  foafo:Person_ElizabethAlexandraMaryWindsor foafo:Person_ofAge True.
  foafo:Person_CharlesPrinceofWales foafo:Person_name "Charles, Prince of Wales".
  foafo:Person_CharlesPrinceofWales foafo:Person_born "1948-11-14"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_CharlesPrinceofWales foafo:Person_numberInLine 1.
  foafo:Person_CharlesPrinceofWales foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q43274".
  foafo:Person_CharlesPrinceofWales foafo:Person_age 71.7578047461618.
  foafo:Person_CharlesPrinceofWales foafo:Person_ofAge True.
  foafo:Person_GeorgeofCambridge foafo:Person_name "George of Cambridge".
  foafo:Person_GeorgeofCambridge foafo:Person_born "2013-07-22"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_GeorgeofCambridge foafo:Person_numberInLine 3.
  foafo:Person_GeorgeofCambridge foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q1359041".
  foafo:Person_GeorgeofCambridge foafo:Person_age 7.072013799051315.
  foafo:Person_GeorgeofCambridge foafo:Person_ofAge False.
  foafo:Person_HarryDukeofSussex foafo:Person_name "Harry Duke of Sussex".
  foafo:Person_HarryDukeofSussex foafo:Person_born "1984-09-15"^^<http://www.w3.org/2001/XMLSchema#date>.
  foafo:Person_HarryDukeofSussex foafo:Person_numberInLine 5.
  foafo:Person_HarryDukeofSussex foafo:Person_wikidataurl "https://www.wikidata.org/wiki/Q152316".
  foafo:Person_HarryDukeofSussex foafo:Person_age 35.92133993168922.
  foafo:Person_HarryDukeofSussex foafo:Person_ofAge True.

}
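The mapping behind these two outputs can be condensed into one small function. The following is a minimal sketch, not the wrapper's actual code: the name to_sparql_term and its typed_literals parameter are assumptions for illustration. It deviates from the output above in two places, both marked in the comments: booleans are written in lowercase, which is the form the SPARQL grammar defines, and quotes inside strings are escaped.

```python
import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

def to_sparql_term(value, typed_literals=False):
    """Return a SPARQL object term for a Python value, or None if unsupported.

    Hypothetical helper; typed_literals plays the role of the typed literal
    mode described in the text.
    """
    value_type = type(value)  # exact type, so bool is never treated as int
    if value_type is str:
        # Assumption beyond the original code: escape backslashes and quotes.
        # Newlines and other control characters are not handled here.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    if value_type is bool:
        # Lowercase true/false as defined by the SPARQL grammar.
        return "true" if value else "false"
    if value_type is int:
        if typed_literals:
            return '"%d"^^<%sinteger>' % (value, XSD)
        return str(value)
    if value_type is float:
        # Caveat: very large or small floats print in exponent notation,
        # which is not a valid xsd:decimal lexical form.
        if typed_literals:
            return '"%s"^^<%sdecimal>' % (value, XSD)
        return str(value)
    if value_type is datetime.date:
        # Dates are always typed, in both modes.
        return '"%s"^^<%sdate>' % (value.isoformat(), XSD)
    return None

print(to_sparql_term(3, typed_literals=True))
print(to_sparql_term(datetime.date(2013, 7, 22)))
```

Returning None for unknown types lets the caller record an error and skip the triple instead of sending a malformed update.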

testListOfDictInsert

def testListOfDictInsert(self):
        '''
        test inserting a list of Dicts and retrieving the values again
        using a person based example
        instead of
        https://en.wikipedia.org/wiki/FOAF_(ontology)
        
        we use an object oriented derivate of FOAF with a focus on datatypes
        '''
        listofDicts=[
            {'name': 'Elizabeth Alexandra Mary Windsor', 'born': self.dob('1926-04-21'), 'numberInLine': 0, 'wikidataurl': 'https://www.wikidata.org/wiki/Q9682' },
            {'name': 'Charles, Prince of Wales',         'born': self.dob('1948-11-14'), 'numberInLine': 1, 'wikidataurl': 'https://www.wikidata.org/wiki/Q43274' },
            {'name': 'George of Cambridge',              'born': self.dob('2013-07-22'), 'numberInLine': 3, 'wikidataurl': 'https://www.wikidata.org/wiki/Q1359041'},
            {'name': 'Harry Duke of Sussex',             'born': self.dob('1984-09-15'), 'numberInLine': 5, 'wikidataurl': 'https://www.wikidata.org/wiki/Q152316'}
        ]
        today=date.today()
        for person in listofDicts:
            born=person['born']
            age=(today - born).days / 365.2425
            person['age']=age
            person['ofAge']=age>=18
        typedLiteralModes=[True,False]
        entityType='foafo:Person'
        primaryKey='name'
        prefixes='PREFIX foafo: <http://foafo.bitplan.com/foafo/0.1/>'
        for typedLiteralMode in typedLiteralModes:
            jena=self.getJena(mode='update',typedLiterals=typedLiteralMode,debug=True)
            errors=jena.insertListOfDicts(listofDicts,entityType,primaryKey,prefixes)
            self.checkErrors(errors)
            
        jena=self.getJena(mode="query")    
        queryString = """
        PREFIX foafo: <http://foafo.bitplan.com/foafo/0.1/>
        SELECT ?name ?born ?numberInLine ?wikidataurl ?ofAge ?age WHERE { 
            ?person foafo:Person_name ?name.
            ?person foafo:Person_born ?born.
            ?person foafo:Person_numberInLine ?numberInLine.
            ?person foafo:Person_wikidataurl ?wikidataurl.
            ?person foafo:Person_ofAge ?ofAge.
            ?person foafo:Person_age ?age. 
        }"""
        personResults=jena.query(queryString)
        self.assertEqual(len(listofDicts),len(personResults))
        personList=jena.asListOfDicts(personResults)   
        for index,person in enumerate(personList):
            print("%d: %s" %(index,person))
        # check the correct round-trip behavior
        self.assertEqual(listofDicts,personList)
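The asListOfDicts helper used in this test is not shown in this answer. As a rough idea of what the way back could look like, here is a minimal sketch that converts a result in the standard SPARQL 1.1 JSON results format into a list of dicts. The function name bindings_to_dicts and the CONVERTERS table are assumptions for illustration, not the wrapper's actual code.

```python
import datetime

XSD = "http://www.w3.org/2001/XMLSchema#"

# Converters keyed by XSD datatype IRI; terms without a known datatype
# are returned as plain strings.
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": float,
    XSD + "double": float,
    XSD + "boolean": lambda text: text in ("true", "1"),
    # Dates with a timezone suffix such as "2013-07-22Z" are not handled.
    XSD + "date": datetime.date.fromisoformat,
}

def bindings_to_dicts(results):
    """Convert a parsed SPARQL JSON result into a list of plain dicts."""
    records = []
    for binding in results["results"]["bindings"]:
        record = {}
        for var, term in binding.items():
            # "datatype" is only present for typed literals.
            convert = CONVERTERS.get(term.get("datatype"), str)
            record[var] = convert(term["value"])
        records.append(record)
    return records
```

Mapping xsd:decimal back to float mirrors the float-to-decimal choice on the insert side; if exact decimal arithmetic matters, decimal.Decimal would be the safer target type.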

insertListOfDicts

def insertListOfDicts(self,listOfDicts,entityType,primaryKey,prefixes):
        '''
        insert the given list of dicts mapping datatypes according to
        https://www.w3.org/TR/xmlschema-2/#built-in-datatypes
        
        mapped from 
        https://docs.python.org/3/library/stdtypes.html
        
        compare to
        https://www.w3.org/2001/sw/rdb2rdf/directGraph/
        http://www.bobdc.com/blog/json2rdf/
        https://www.w3.org/TR/json-ld11-api/#data-round-tripping
        https://stackoverflow.com/questions/29030231/json-to-rdf-xml-file-in-python
        '''
        errors=[]
        insertCommand='%s\nINSERT DATA {\n' % prefixes
        for index,record in enumerate(listOfDicts):
            if primaryKey not in record:
                errors.append("missing primary key %s in record %d" % (primaryKey,index))
            else:    
                primaryValue=record[primaryKey]
                encodedPrimaryValue=self.getLocalName(primaryValue)
                tSubject="%s_%s" %(entityType,encodedPrimaryValue)
                for keyValue in record.items():
                    key,value=keyValue
                    valueType=type(value)
                    if self.debug:
                        print("%s(%s)=%s" % (key,valueType,value))
                    tPredicate="%s_%s" % (entityType,key)
                    tObject=value    
                    if valueType == str:
                        # plain string literal; quotes inside the value are not escaped
                        tObject='"%s"' % value
                    elif valueType==int:
                        # without typedLiterals the bare number is used as is
                        if self.typedLiterals:
                            tObject='"%d"^^<http://www.w3.org/2001/XMLSchema#integer>' %value
                    elif valueType==float:
                        if self.typedLiterals:
                            tObject='"%s"^^<http://www.w3.org/2001/XMLSchema#decimal>' %value
                    elif valueType==bool:
                        # emitted as the bare Python value (True/False)
                        pass
                    elif valueType==datetime.date:
                        # dates are always typed, independent of self.typedLiterals
                        tObject='"%s"^^<http://www.w3.org/2001/XMLSchema#date>' %value
                    else:
                        errors.append("can't handle type %s in record %d" % (valueType,index))
                        tObject=None
                    if tObject is not None:    
                        insertCommand+='  %s %s %s.\n' % (tSubject,tPredicate,tObject)
        insertCommand+="\n}"
        if self.debug:
            print (insertCommand)
        self.insert(insertCommand)
        return errors
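The getLocalName helper called above is not shown in this answer. Judging by the generated subjects (for example "Charles, Prince of Wales" becomes CharlesPrinceofWales), it appears to drop everything except letters and digits. A minimal sketch under that assumption, with the hypothetical name get_local_name:

```python
import re

def get_local_name(name):
    """Reduce a name to ASCII letters and digits for use as a local name.

    Hypothetical reconstruction based on the generated output only;
    the actual helper may behave differently.
    """
    return re.sub(r"[^A-Za-z0-9]", "", name)

print(get_local_name("Charles, Prince of Wales"))
```

Note that this kind of mapping is lossy: two different names such as "A B" and "AB" would end up with the same subject, and non-ASCII letters are dropped entirely.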


Source: https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki
