I'm struggling with the execution of a SPARQL query in Jena, with a resulting behaviour that I don't understand...
I'm trying to query the ESCO ontology (https://ec.europa.eu/esco/download), and I'm using TDB to load the ontology and create the model (sorry if the terms I use are not accurate, I'm not very experienced).
My goal is to find the URI of a job position in the ontology that matches text I have previously extracted, e.g. extracted term: "acuponcteur" -> label in ontology: "Acuponcteur"@fr -> URI: <http://ec.europa.eu/esco/occupation/14918>
What I call the "weird behaviour" relates to the results I'm getting (or not getting) when executing queries.
When I execute the following query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX esco: <http://ec.europa.eu/esco/model#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?position
WHERE {
  ?s rdf:type esco:Occupation.
  { ?position skos:prefLabel ?label. }
  UNION
  { ?position skos:altLabel ?label. }
  FILTER (lcase(?label) = "acuponcteur"@fr)
}
LIMIT 10
I get these results after 1 minute:
-----------------------------------------------
| position                                    |
===============================================
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
| <http://ec.europa.eu/esco/occupation/14918> |
-----------------------------------------------
However, when I add the DISTINCT keyword:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX esco: <http://ec.europa.eu/esco/model#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?position
WHERE {
  ?s rdf:type esco:Occupation.
  { ?position skos:prefLabel ?label. }
  UNION
  { ?position skos:altLabel ?label. }
  FILTER (lcase(?label) = "acuponcteur"@fr)
}
LIMIT 10
the query seems to run forever (I stopped the execution after waiting 20 minutes).
I get the same behaviour when I run the first query (without DISTINCT) with another label, one that I'm sure is not in the ontology. I expect an empty result, but the query seems to keep running and I have to kill it after a while (once again, I waited 20 minutes at most):
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX esco: <http://ec.europa.eu/esco/model#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?position
WHERE {
  ?s rdf:type esco:Occupation.
  { ?position skos:prefLabel ?label. }
  UNION
  { ?position skos:altLabel ?label. }
  FILTER (lcase(?label) = "assistante scolaire"@fr)
}
LIMIT 10
Could it be a problem in the code I'm running? Here it is:
public static void main(String[] args) {
    // Make a TDB-backed dataset
    String directory = "data/testtdb" ;
    Dataset dataset = TDBFactory.createDataset(directory) ;
    // transaction (protects a TDB dataset against data corruption, unexpected process termination and system crashes)
    dataset.begin( ReadWrite.WRITE );
    // assume we want the default model, or we could get a named model here
    Model model = dataset.getDefaultModel();
    try {
        // read the input file - only needs to be done once
        String source = "data/esco.rdf";
        FileManager.get().readModel(model, source, "RDF/XML-ABBREV");
        // run a query
        String queryString =
            "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
            "PREFIX esco: <http://ec.europa.eu/esco/model#> " +
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
            "SELECT ?position " +
            "WHERE { " +
            " ?s rdf:type esco:Occupation. " +
            " { ?position skos:prefLabel ?label. } " +
            " UNION " +
            " { ?position skos:altLabel ?label. }" +
            " FILTER (lcase(?label)= \"acuponcteur\"@fr ) " +
            "}" +
            "LIMIT 1 " ;
        Query query = QueryFactory.create(queryString) ;
        // execute the query
        QueryExecution qexec = QueryExecutionFactory.create(query, model) ;
        try {
            ResultSet results = qexec.execSelect() ;
            // taken from apache Jena tutorial
            ResultSetFormatter.out(System.out, results, query) ;
        } finally {
            qexec.close() ;
        }
    } finally {
        model.close() ;
        dataset.end();
    }
}
What am I doing wrong here? Any idea?
Thanks!
As a first point that may or may not make much difference, you can use a property path to simplify
{ ?position skos:prefLabel ?label. }
UNION
{ ?position skos:altLabel ?label. }
as
?position skos:prefLabel|skos:altLabel ?label
This makes the query:
SELECT ?position
WHERE {
  ?s rdf:type esco:Occupation.                      # (1)
  ?position skos:prefLabel|skos:altLabel ?label     # (2)
  FILTER (lcase(?label) = "acuponcteur"@fr)
}
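What's the point of ?s in this query? Some number n of ?position/?label pairs match (2), and some number m of values of ?s match (1). Because the two patterns share no variable, the query has to consider all m×n combinations, yet you never use the value of ?s. That is why every match comes back repeated, and it would also explain the hangs: with a label that exists and no DISTINCT, LIMIT 10 lets the engine stop as soon as it has ten (duplicate) rows, but with DISTINCT, or with a label that isn't there, it can never collect ten rows and has to work through the whole cross product first. It looks like you used DISTINCT to get rid of the repeated values without looking at why you were getting them in the first place. You should simply remove the useless line (1), and have the query: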
SELECT DISTINCT ?position
WHERE {
  ?position skos:prefLabel|skos:altLabel ?label
  FILTER (lcase(?label) = "acuponcteur"@fr)
}
I wouldn't be surprised if, at that point, you don't even need the DISTINCT anymore.
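If it helps to see the effect of the unused ?s pattern in isolation, here is a toy sketch in plain Java. It is not Jena code and does not claim to reflect how ARQ actually plans or evaluates the query; it only models patterns (1) and (2) as two nested loops with a LIMIT check. The class CrossProductDemo, its helper types, the count of 50 occupations and the "hypothetical" entries are all made up for illustration; only occupation/14918 and its label come from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Toy model of the query, using plain collections. This is NOT Jena code and
// not how ARQ really evaluates a query; all names and data are illustrative.
class CrossProductDemo {

    /** One (?position, ?label) solution of the label pattern (2). */
    static final class Labelled {
        final String position;
        final String label;

        Labelled(String position, String label) {
            this.position = position;
            this.label = label;
        }
    }

    /** Rows returned, plus how many candidate rows were inspected to get them. */
    static final class Outcome {
        final List<String> rows;
        final long inspected;

        Outcome(Collection<String> rows, long inspected) {
            this.rows = new ArrayList<>(rows);
            this.inspected = inspected;
        }
    }

    /** Made-up data: only the first entry comes from the question. */
    static List<Labelled> sampleLabels() {
        return Arrays.asList(
                new Labelled("occupation/14918", "Acuponcteur"),
                new Labelled("occupation/hypothetical-1", "Boulanger"),
                new Labelled("occupation/hypothetical-2", "Plombier"),
                new Labelled("occupation/hypothetical-3", "Dentiste"));
    }

    static Outcome run(int occupationCount, List<Labelled> labels,
                       String wanted, boolean distinct, int limit) {
        // A set drops duplicates (DISTINCT); a list keeps every row.
        Collection<String> rows;
        if (distinct) {
            rows = new LinkedHashSet<>();
        } else {
            rows = new ArrayList<>();
        }
        long inspected = 0;
        for (int s = 0; s < occupationCount; s++) {   // pattern (1): each ?s
            for (Labelled l : labels) {               // pattern (2): each label
                inspected++;
                if (l.label.toLowerCase(Locale.ROOT).equals(wanted)) {
                    rows.add(l.position);             // ?s itself is never used
                    if (rows.size() >= limit) {       // LIMIT reached: stop early
                        return new Outcome(rows, inspected);
                    }
                }
            }
        }
        return new Outcome(rows, inspected);          // cross product exhausted
    }

    public static void main(String[] args) {
        List<Labelled> labels = sampleLabels();
        Outcome plain = run(50, labels, "acuponcteur", false, 10);
        Outcome distinct = run(50, labels, "acuponcteur", true, 10);
        Outcome noMatch = run(50, labels, "assistante scolaire", false, 10);
        Outcome fixed = run(1, labels, "acuponcteur", true, 10); // pattern (1) removed

        System.out.println("plain:    " + plain.rows.size() + " rows, " + plain.inspected + " inspected");
        System.out.println("distinct: " + distinct.rows.size() + " rows, " + distinct.inspected + " inspected");
        System.out.println("no match: " + noMatch.rows.size() + " rows, " + noMatch.inspected + " inspected");
        System.out.println("fixed:    " + fixed.rows.size() + " rows, " + fixed.inspected + " inspected");
    }
}
```

In this toy run the plain variant stops early with ten copies of the same position, while the DISTINCT and no-match variants both walk all 50 × 4 combinations before returning. Scale the two factors up to the size of a real ontology and the difference between "one minute" and "forever" becomes plausible. Passing 1 as the occupation count stands in for dropping line (1), which leaves only a single pass over the labels.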
Source: https://stackoverflow.com/questions/25304721/sparql-query-running-forever