Question
As outlined in
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py
I tried to allow for a "round trip" operation between a Python list of dicts and Jena/SPARQL-based storage.
The approach performs very well for my use case, and after trying it out for a while I am getting into more details that need to be addressed.
The Stack Overflow question "listOfDict to RDF conversion in python targeting Apache Jena Fuseki" addresses the initial issues, and issues 2-5 at https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed show some detail problems that have already been fixed.
Now I am working with some 180000 records that I'd like to import from 6 different data sources, and each data source seems to have new exotic records that make the approach fail.
E.g. one batch of records gives me the following log:
read 45601 events in 0.6 s
storing 45601 events to sparql
batch for 1 - 2000 of 45601 cr:Event in 0.6 s -> 0.6 s
batch for 2001 - 4000 of 45601 cr:Event in 0.5 s -> 1.1 s
batch for 4001 - 6000 of 45601 cr:Event in 0.5 s -> 1.6 s
batch for 6001 - 8000 of 45601 cr:Event in 0.5 s -> 2.1 s
batch for 8001 - 10000 of 45601 cr:Event in 0.5 s -> 2.6 s
batch for 10001 - 12000 of 45601 cr:Event in 0.7 s -> 3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
response = urlopener(request)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Error 400: Bad Request\n'
Since I don't get any details on what the problem is, I am working with a binary search. With the error above I only know that the problem is a record with a batchIndex between 12000 and 14000, so I am setting the limit to 14000 and the batchSize to 100 to get closer.
batch for 13301 - 13400 of 14000 cr:Event in 0.0 s -> 4.3 s
is now the last successful batch. So I am using a binary search on the limit: 13450 fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok, 13421 ok. So record 13422 is the culprit, and I switch on debug mode to see the INSERT DATA created for the record:
cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
cr:Event__102140gtm20003 cr:Event_source "crossref".
cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
cr:Event__102140gtm20003 cr:Event_year 1999.
cr:Event__102140gtm20003 cr:Event_month 9.
cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
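The manual narrowing described above can be automated. The following is only a minimal sketch, not the code from the linked repository: `find_first_bad` and the `insert_ok` callback are hypothetical names, and `insert_ok(batch)` is assumed to send one batch to the endpoint and return False when the endpoint rejects it (for example by catching SPARQLWrapper's QueryBadFormed). It also assumes that a batch fails exactly when it contains at least one bad record.

```python
from typing import Callable, Optional, Sequence


def find_first_bad(records: Sequence[dict],
                   insert_ok: Callable[[Sequence[dict]], bool]) -> Optional[int]:
    """Return the 0-based index of the first record that makes an insert fail.

    insert_ok is a hypothetical callback: it tries to store the given batch
    and returns True on success, False if the endpoint rejects it.
    Returns None if the whole list can be stored.
    """
    if not records or insert_ok(records):
        return None
    # Invariant: records[lo:hi] contains the first bad record.
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if insert_ok(records[lo:mid]):
            lo = mid  # left half is fine, so the culprit is in the right half
        else:
            hi = mid  # left half already fails, keep searching there
    return lo


if __name__ == "__main__":
    # Stand-in for the real endpoint: reject any batch that contains an
    # unescaped double quote in the location field.
    def fake_insert_ok(batch):
        return all('"' not in r["location"] for r in batch)

    events = [{"location": "Somewhere"} for _ in range(14000)]
    events[13421]["location"] = 'M\\"unster, Germany'
    print(find_first_bad(events, fake_insert_ok))
```

With about 14000 records this needs roughly 15 requests instead of a manual series of runs. Batches that succeed during the search are actually stored, which should be harmless for INSERT DATA because an RDF graph is a set of triples.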
So the umlaut encoding \"u in the location "Münster" (which ends up as M\\"unster in the generated triple, leaving an unescaped double quote inside the literal) is the culprit here. I will work around this issue. The real question is:
How can I get the Fuseki API via SPARQLWrapper to properly report a detailed error message?
e.g. with something like
error in line # cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is not a valid triple?
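For the workaround mentioned above, one option is to escape every string value before it is placed into the INSERT DATA text. This is a sketch under assumptions: `escape_literal` and `to_literal` are hypothetical helper names (not part of SPARQLWrapper or the linked repository), and the raw location value is assumed to be the LaTeX-style M\"unster. SPARQL string literals accept the escapes \\, \", \n, \r and \t, which is what the sketch relies on.

```python
# Characters that must be escaped inside a double-quoted SPARQL string literal.
_ESCAPES = {
    "\\": "\\\\",  # backslash
    '"': '\\"',    # double quote
    "\n": "\\n",   # line feed
    "\r": "\\r",   # carriage return
    "\t": "\\t",   # tab
}


def escape_literal(value: str) -> str:
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    # Character-by-character replacement avoids double-escaping backslashes.
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def to_literal(value: str) -> str:
    """Wrap the escaped value in double quotes."""
    return '"' + escape_literal(value) + '"'


if __name__ == "__main__":
    raw = 'M\\"unster, Germany'  # the characters M \ " u n s t e r ...
    print(to_literal(raw))
```

Escaping both the backslash and the double quote keeps the quote from terminating the literal early, which appears to be what happens in the record shown above.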
Source: https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err