Query DBpedia SPARQL endpoint using dotNetRDF - RdfParseException

Submitted by ≯℡__Kan透↙ on 2019-12-10 10:03:29

Question


When I execute the following query against http://dbpedia.org/sparql using dotNetRDF's VDS.RDF.Query.SparqlRemoteEndpoint.QueryWithResultSet(), everything works fine:

SELECT ?film ?p ?o
WHERE {
    ?film <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Japanese_films> .
    ?film ?p ?o
}
limit 500

But when I run this CONSTRUCT query using SparqlRemoteEndpoint.QueryWithResultGraph():

CONSTRUCT { ?film ?p ?o}
WHERE {
    ?film <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Japanese_films> .
    ?film ?p ?o
}
limit 500

I get an RdfParseException with the message:

"[Line 456 Column 29] Unexpected Character (Code 8211) – was encountered"

I've tried setting the ResultsAcceptHeader and RdfAcceptHeader properties, without success.

If I change the LIMIT in the second query from 500 to, e.g., 100, it works fine.

Could you help me?


The exception is now thrown when LIMIT is 456: [Line 495 Column 25] Unexpected Character (Code 8211) – was encountered. Line 495 reads "ns19:???_???5555 ." and the character at column 25 is "_".

Here is this resource's DBpedia page: http://dbpedia.org/page/Interstella_5555:_The_5tory_of_the_5ecret_5tar_5ystem. I suspect the problem is the value of the dbpprop:kanji property (インターステラ5555).


Answer 1:


DBPedia has known issues with encoding, so it may simply be that DBPedia is producing dud data.

To debug this further in dotNetRDF, wrap the code that invokes the query as follows:

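// Options here is VDS.RDF.Options, so this needs "using VDS.RDF;"
try
{
    // HttpDebugging writes HTTP request/response information to the console;
    // HttpFullDebugging also writes the raw response stream
    Options.HttpDebugging = true;
    Options.HttpFullDebugging = true;

    // Try your query here, e.g. endpoint.QueryWithResultGraph(query)
}
finally
{
    // Always switch debugging off again, even if the query throws
    Options.HttpDebugging = false;
    Options.HttpFullDebugging = false;
}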

This will cause parsing to fail (with a different error), but it will dump the raw HTTP response to the console for debugging. If you edit your question to include the content around line 456 of the dump, people may be able to give you more help.
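If the dump is long, a small helper can pull out just the lines around the position reported in the exception. This is a C# sketch, not part of dotNetRDF: DumpContext and LinesAround are hypothetical names, and it assumes you have copied only the response body (without the HTTP headers) into a string, so that its line numbers line up with the parser's.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of dotNetRDF): given the response body copied
// out of the HTTP debugging dump, return the lines surrounding the line number
// reported in an RdfParseException, each prefixed with its 1-based number.
public static class DumpContext
{
    public static string[] LinesAround(string responseText, int lineNumber, int radius = 2)
    {
        // Split on '\n' and strip a trailing '\r' so CRLF and LF dumps behave the same.
        string[] lines = responseText.Split('\n').Select(l => l.TrimEnd('\r')).ToArray();

        // Clamp the window to the lines that actually exist.
        int first = Math.Max(1, lineNumber - radius);
        int last = Math.Min(lines.Length, lineNumber + radius);

        return Enumerable.Range(first, Math.Max(0, last - first + 1))
            .Select(n => $"{n,5}: {lines[n - 1]}")
            .ToArray();
    }
}
```

For the error in the question you would call something like DumpContext.LinesAround(body, 456) and paste the result into the question.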

Edit

So, as suspected, the problem is DBPedia producing dud data, not dotNetRDF itself.

When I downloaded the file you mentioned in Turtle format and tried to parse it, I got the same error message, which pertains to the following line:

ns6:Avalon_–_Spiel_um_dein_Leben ,

At first glance that may look valid, since a plain hyphen (-) is allowed in prefixed names. The problem is that this character is not a hyphen: it is character code 8211 (hex 2013, as AndyS mentions), which is not in the acceptable range of prefixed name characters.
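To see why one dash passes and the other fails, here is a C# sketch of the name-character classes from the W3C Turtle 1.1 grammar (PN_CHARS_BASE and PN_CHARS). It is an illustration, not dotNetRDF's tokenizer, whose grammar at the time may differ in detail; TurtleChars is a hypothetical name, and the sketch ignores the extra local-name rules for '.', ':' and escapes.

```csharp
// Sketch of the W3C Turtle 1.1 character classes used inside prefixed names.
// Not dotNetRDF's implementation; '.', ':', %-escapes and \-escapes are ignored.
public static class TurtleChars
{
    // PN_CHARS_BASE: letters and the listed Unicode ranges.
    public static bool IsPnCharsBase(int cp) =>
        (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z') ||
        (cp >= 0x00C0 && cp <= 0x00D6) || (cp >= 0x00D8 && cp <= 0x00F6) ||
        (cp >= 0x00F8 && cp <= 0x02FF) || (cp >= 0x0370 && cp <= 0x037D) ||
        (cp >= 0x037F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D) ||
        (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF) ||
        (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF) ||
        (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);

    // PN_CHARS = PN_CHARS_BASE plus '_' (PN_CHARS_U), the ASCII hyphen '-',
    // digits, U+00B7, combining marks U+0300..U+036F and U+203F..U+2040.
    public static bool IsPnChars(int cp) =>
        IsPnCharsBase(cp) || cp == '_' || cp == '-' ||
        (cp >= '0' && cp <= '9') || cp == 0x00B7 ||
        (cp >= 0x0300 && cp <= 0x036F) || (cp >= 0x203F && cp <= 0x2040);
}
```

U+2013 falls in the gap between U+200D and U+203F, so IsPnChars(8211) is false, while the ASCII hyphen U+002D is listed explicitly and passes. Under the same ranges, katakana (U+30A0 to U+30FF) is allowed.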

I also confirmed this with Jena's Turtle parser, just to make sure it really wasn't a dotNetRDF issue.

So basically the DBPedia data is broken. You can try forcing it to send back RDF/XML or NTriples by setting the accept headers appropriately, but there is no guarantee that the data won't come back bad in those formats as well. I suggest contacting the DBPedia team to report this as a bug: dbpedia-discussion@lists.sf.net




Answer 2:


Seeing line 456 would be useful. Try making the request with wget: it encodes URLs and curl doesn't, which makes it easier to use from the command line.

Unicode codepoint 8211 is EN DASH (hex 2013).
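Since the parser reports a line and column, you can also scan the raw Turtle for that code point yourself. A C# sketch: EnDashFinder is a hypothetical helper, and its column counting (one column per UTF-16 character, with tabs and '\r' each counted as one) is an assumption that may not match dotNetRDF's tokenizer exactly.

```csharp
using System.Collections.Generic;

// Hypothetical debugging helper: report the 1-based (line, column) of every
// EN DASH (U+2013, decimal 8211) in a block of text.
public static class EnDashFinder
{
    public const char EnDash = '\u2013'; // (int)EnDash == 8211

    public static List<(int Line, int Column)> Find(string text)
    {
        var hits = new List<(int Line, int Column)>();
        int line = 1, column = 1;
        foreach (char c in text)
        {
            // A newline starts a new line; it does not occupy a column itself.
            if (c == '\n') { line++; column = 1; continue; }
            if (c == EnDash) hits.Add((line, column));
            column++;
        }
        return hits;
    }
}
```

Run against the line quoted in Answer 1, "ns6:Avalon_–_Spiel_um_dein_Leben ,", it reports a single hit at line 1, column 12.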

In a CONSTRUCT query, LIMIT restricts the number of rows (solutions) from the graph pattern, not the number of triples produced by the CONSTRUCT template. You may therefore get more triples than are covered by the SELECT ... LIMIT. Try a larger LIMIT in the SELECT and see if it breaks.
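A toy C# model of that rule, not a SPARQL engine: ConstructLimitModel is a hypothetical name, every template variable is assumed to be bound, and terms starting with '?' are treated as variables. LIMIT truncates the solution sequence, then the template is instantiated once per remaining solution and the triples are collected into a set.

```csharp
using System.Collections.Generic;
using System.Linq;

// Toy model of CONSTRUCT ... LIMIT (hypothetical, not a SPARQL engine).
// Assumes every variable in the template is bound in every solution.
public static class ConstructLimitModel
{
    public static HashSet<(string S, string P, string O)> Construct(
        IEnumerable<Dictionary<string, string>> solutions,
        IReadOnlyList<(string S, string P, string O)> template,
        int limit)
    {
        // The result is an RDF graph, i.e. a set, so duplicate triples collapse.
        var graph = new HashSet<(string S, string P, string O)>();
        foreach (var row in solutions.Take(limit))   // LIMIT counts solutions
        {
            foreach (var (s, p, o) in template)      // one triple per template pattern
            {
                graph.Add((Bind(s, row), Bind(p, row), Bind(o, row)));
            }
        }
        return graph;
    }

    // '?name' is looked up in the solution; anything else is a constant term.
    private static string Bind(string term, Dictionary<string, string> row) =>
        term.Length > 0 && term[0] == '?' ? row[term.Substring(1)] : term;
}
```

With three solutions, LIMIT 2 and a two-pattern template, the model yields three triples (one duplicate collapses), already more than the LIMIT value. With the single-pattern template { ?film ?p ?o } from the question, each solution yields at most one triple.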



Source: https://stackoverflow.com/questions/13631193/query-dbpedia-sparql-endpoint-using-dotnetrdf-rdfparseexception
