How to compare simple and typed literals in dotnetrdf?

Posted by 岁酱吖の on 2019-12-12 11:25:01

Question


I'm comparing two graphs, one from a Turtle file with simple literal objects, the other from a file with explicit datatype IRIs. The graphs are otherwise equal.

Graph A:

<s> <p> "o"

Graph B:

<s> <p> "o"^^xsd:string

According to RDF 1.1 (3.3 Literals), "[s]imple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string". This is reflected in the concrete syntax specifications as well (N-Triples, Turtle, RDF/XML).

So I'd expect both my graphs to consist of a single triple with a URI node s as subject, a URI node p as predicate, and a literal node o of type xsd:string as object. Based on this, I'd expect there to be no difference between the two.

However this is not the case in practice:

var graphStringA = "<http://example.com/subject> <http://example.com/predicate> \"object\".";
var graphStringB = "<http://example.com/subject> <http://example.com/predicate> \"object\"^^<http://www.w3.org/2001/XMLSchema#string>.";

var graphA = new Graph();
var graphB = new Graph();

StringParser.Parse(graphA, graphStringA);
StringParser.Parse(graphB, graphStringB);

var diff = graphA.Difference(graphB);

The difference report contains one added and one removed triple. The graphs differ because the datatypes of the object nodes differ: with the object cast to ILiteralNode, the DataType of graphA.Triples.First().Object is null, while the DataType of graphB.Triples.First().Object is http://www.w3.org/2001/XMLSchema#string.


It appears to me that to modify this behaviour I'd have to either

  • go all the way down to LiteralNode (and change its assumptions about literal nodes), or
  • create a new GraphDiff (that takes the default datatype of string literals into account).
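
The second option could rest on a comparison along these lines. This is only a sketch of the RDF 1.1 rule, not dotNetRDF code: the class LiteralDatatypes and its members EffectiveDatatype and SameLiteral are hypothetical names, and the helper works on plain values rather than on library node types.

```csharp
using System;

// Hypothetical helper (not part of dotNetRDF): compares literals by the
// datatype RDF 1.1 assigns to them, so a missing datatype counts as xsd:string.
public static class LiteralDatatypes
{
    public static readonly Uri XsdString =
        new Uri("http://www.w3.org/2001/XMLSchema#string");
    public static readonly Uri RdfLangString =
        new Uri("http://www.w3.org/1999/02/22-rdf-syntax-ns#langString");

    // Language-tagged literals are rdf:langString; literals without an
    // explicit datatype are xsd:string; anything else keeps its datatype.
    public static Uri EffectiveDatatype(Uri datatype, string language)
    {
        if (!string.IsNullOrEmpty(language)) return RdfLangString;
        return datatype ?? XsdString;
    }

    public static bool SameLiteral(
        string valueA, Uri datatypeA, string languageA,
        string valueB, Uri datatypeB, string languageB)
    {
        // Compare AbsoluteUri strings: Uri equality ignores the fragment,
        // which is exactly the part that distinguishes XSD datatypes.
        return string.Equals(valueA, valueB, StringComparison.Ordinal)
            && string.Equals(languageA ?? "", languageB ?? "",
                             StringComparison.OrdinalIgnoreCase)
            && EffectiveDatatype(datatypeA, languageA).AbsoluteUri
               == EffectiveDatatype(datatypeB, languageB).AbsoluteUri;
    }
}
```

With dotNetRDF, the arguments would presumably come from the Value, DataType and Language properties of each ILiteralNode; a custom diff would still have to apply this check to every pair of object nodes itself.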

A workaround is to remove the "default" datatypes:

private static void RemoveDefaultDatatype(IGraph g)
{
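    // Materialize the matches up front so the graph is not being enumerated
    // while triples are asserted and retracted below.
    // Note: rdf:langString lives in the RDF namespace, not the XSD one.
    var triplesWithDefaultDatatype = (
        from triple in g.Triples
        let literal = triple.Object as ILiteralNode
        where literal != null && literal.DataType != null
        where literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#string" || literal.DataType.AbsoluteUri == "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
        select triple).ToList();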

    var triplesWithNoDatatype =
        from triple in triplesWithDefaultDatatype
        let literal = triple.Object as ILiteralNode
        select new Triple(
            triple.Subject,
            triple.Predicate,
            g.CreateLiteralNode(
                literal.Value,
                literal.Language));

    g.Assert(triplesWithNoDatatype.ToArray());
    g.Retract(triplesWithDefaultDatatype);
}

Is there a way in dotNetRDF to compare simple literals to typed literals in a way that's consistent with RDF 1.1, without resorting to a major rewrite or a workaround like the one above?


Answer 1:


dotNetRDF is not RDF 1.1 compliant nor do we claim to be. There is a branch which is rewritten to be compliant but it is not remotely production ready.

Assuming that you control the parsing process, you can customise the handling of incoming data using the RDF Handlers API. You can then strip the explicit xsd:string type off literals as they come into the system by overriding the handler's triple-handling method (HandleTripleInternal(Triple t) when deriving from BaseRdfHandler) as desired.




Answer 2:


Using the Handlers API as per RobV's answer:

class StripStringHandler : BaseRdfHandler, IWrappingRdfHandler
{
    protected override bool HandleTripleInternal(Triple t)
    {
        if (t.Object is ILiteralNode)
        {
            var literal = t.Object as ILiteralNode;

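            // Strip the datatypes that RDF 1.1 treats as implicit (rdf:langString is in the RDF namespace, not XSD).
            if (literal.DataType != null && (literal.DataType.AbsoluteUri == "http://www.w3.org/2001/XMLSchema#string" || literal.DataType.AbsoluteUri == "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"))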
            {
                var simpleLiteral = this.CreateLiteralNode(literal.Value, literal.Language);

                t = new Triple(t.Subject, t.Predicate, simpleLiteral);
            }
        }

        return this.handler.HandleTriple(t);
    }

    private IRdfHandler handler;

    public StripStringHandler(IRdfHandler handler) : base(handler)
    {
        this.handler = handler;
    }

    public IEnumerable<IRdfHandler> InnerHandlers
    {
        get
        {
            return this.handler.AsEnumerable();
        }
    }

    protected override void StartRdfInternal()
    {
        this.handler.StartRdf();
    }

    protected override void EndRdfInternal(bool ok)
    {
        this.handler.EndRdf(ok);
    }

    protected override bool HandleBaseUriInternal(Uri baseUri)
    {
        return this.handler.HandleBaseUri(baseUri);
    }

    protected override bool HandleNamespaceInternal(string prefix, Uri namespaceUri)
    {
        return this.handler.HandleNamespace(prefix, namespaceUri);
    }

    public override bool AcceptsAll
    {
        get
        {
            return this.handler.AcceptsAll;
        }
    }
}

Usage:

class Program
{
    static void Main()
    {
        var graphA = Load("<http://example.com/subject> <http://example.com/predicate> \"object\".");
        var graphB = Load("<http://example.com/subject> <http://example.com/predicate> \"object\"^^<http://www.w3.org/2001/XMLSchema#string>.");

        var diff = graphA.Difference(graphB);

        Debug.Assert(diff.AreEqual);
    }

    private static IGraph Load(string source)
    {
        var result = new Graph();
        var graphHandler = new GraphHandler(result);
        var strippingHandler = new StripStringHandler(graphHandler);
        var parser = new TurtleParser();

        using (var reader = new StringReader(source))
        {
            parser.Load(strippingHandler, reader);
        }

        return result;
    }
}


Source: https://stackoverflow.com/questions/40027774/how-to-compare-simple-and-typed-literals-in-dotnetrdf
