I would like to know if there is a standard or generally accepted way of representing an equivalent of NULL used in databases for RDF data.
More specifically, I'm inter
I don't know of a standard way of doing this, but one of the advantages of working in RDF is that you have a lot of flexibility in how you decide to do it. RDF, per se, cannot express negation (i.e., there is no direct way to say that a triple s p o does not hold), but OWL can. As to the four cases you described, here are some approaches that you might take:
1. The value is not applicable, i.e. property p does not exist or does not make sense in the context.
If it does not make much sense for a property p to have a value for a subject s, then it's probably acceptable to just not write any triples of the form s p o. Since RDF makes an open world assumption, it is often the case that, in data retrieval, one only queries for the data that one is interested in, and does not make much of an effort to check whether there are unexpected things. If you do want to do some sanity checking, then you can declare RDFS domains and ranges for properties. For instance, you might have:
hasBirthDate rdfs:domain AnimateObject .
hasConstructionDate rdfs:domain InanimateObject .
According to the semantics, if you then have
object82 hasBirthDate "2013-04-01" ;
hasConstructionDate "2013-04-02" .
then you'll also infer that
object82 a AnimateObject, InanimateObject .
and you might run a sanity check that looks for things that are both AnimateObjects and InanimateObjects. If anything is both, you probably have a problem that you should look into. If you use OWL, then you can actually declare that AnimateObject and InanimateObject are disjoint and check for logical consistency. Alternatively, in OWL, you can add assertions such as
object82 hasConstructionDate max 0
which says that object82 should have no values for the property hasConstructionDate.
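To make the domain reasoning and the two sanity checks concrete, here is a minimal Python sketch. It does not use an RDF library or an OWL reasoner; it assumes a graph is just a set of (subject, predicate, object) string tuples, applies only the RDFS domain rule, and then looks for individuals typed as both classes and for values that contradict a "max 0" assertion. The helper names (infer_domain_types, members_of_both, violates_max_zero) are hypothetical.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) string tuples.
# No RDF library or OWL reasoner is used; all names are illustrative only.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Apply the RDFS domain rule once: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set(triples)
    for s, p, _ in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred


def members_of_both(triples, class_a, class_b):
    """Sanity check: individuals typed as both classes (a clash if the classes are disjoint)."""
    a = {s for s, p, o in triples if p == RDF_TYPE and o == class_a}
    b = {s for s, p, o in triples if p == RDF_TYPE and o == class_b}
    return a & b


def violates_max_zero(triples, subject, prop):
    """True if subject has any value for prop, contradicting an assertion of 'prop max 0'."""
    return any(s == subject and p == prop for s, p, _ in triples)


data = {
    ("hasBirthDate", RDFS_DOMAIN, "AnimateObject"),
    ("hasConstructionDate", RDFS_DOMAIN, "InanimateObject"),
    ("object82", "hasBirthDate", "2013-04-01"),
    ("object82", "hasConstructionDate", "2013-04-02"),
}
closure = infer_domain_types(data)
print(members_of_both(closure, "AnimateObject", "InanimateObject"))  # {'object82'}
print(violates_max_zero(closure, "object82", "hasConstructionDate"))  # True
```

In practice you would let a reasoner or a SPARQL query do this work; the sketch only shows what the inference and the checks amount to.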
In any case, add rdfs:comments to your properties explaining what the property should be used for and what it should not be used for. When appropriate, add rdfs:comments to individuals to explain why they should not have a value for a given property, if they should not have such a value.
2. The value is unknown, i.e., it should be there but we don't know it.
In this case, it is important to pin down what exactly “should” means. In OWL, for instance, you can say that
Person SubClassOf (hasName min 1 String)
to assert that every person is related to at least one String by the property hasName; that is, every person has at least one name. That is one way of saying that there is some value, but we might not know what it is in a particular case. If you cannot work with OWL, but only with RDF, then you should probably add an rdfs:comment to the property hasName along the lines of “each NamedEntity should have at least one value for this property.”
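Because of the open world assumption, a reasoner will not complain about a Person with no recorded name; the restriction only entails that some name exists. To actually list the individuals whose value is unknown, you need a closed-world check, similar in spirit to a SPARQL query with FILTER NOT EXISTS. The Python sketch below assumes the same tuple-based graph model as a stand-in for a real triple store; the helper missing_required and the individuals alice and bob are hypothetical.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) string tuples.
RDF_TYPE = "rdf:type"


def missing_required(triples, cls, prop):
    """Instances of cls with no asserted value for prop (a closed-world check).

    Under OWL's open-world semantics, 'hasName min 1 String' does not make
    these individuals inconsistent: a name is entailed to exist, it just
    is not recorded. That is exactly the "value is unknown" case.
    """
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    with_value = {s for s, p, _ in triples if p == prop}
    return instances - with_value


data = {
    ("alice", RDF_TYPE, "Person"),
    ("alice", "hasName", "Alice"),
    ("bob", RDF_TYPE, "Person"),  # hypothetical Person whose name is not recorded
}
print(missing_required(data, "Person", "hasName"))  # {'bob'}
```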
3. The value doesn't exist, i.e., the property doesn't have a value (e.g. year of death for a person alive).
This is an interesting case, because RDF has no built-in notion of time (in the sense that some triple holds until a given time, after which some other triple holds). If you are simply using an RDF graph as a database-like store that you can update (by both removing and inserting triples), you could probably use some special reserved value for “I'm not dead yet!”. Having an open-ended data model, as we do in RDF, makes this particularly easy, because you really can just mint a new value for it:
mp:JohnCleese hasDeathDate mp:notDeadYet .
mp:GrahamChapman hasDeathDate "1989-10-04" .
Of course, you can also be a bit more refined and use a boolean-valued property to indicate whether or not a value for the first property makes sense:
mp:JohnCleese isDeceased false .
mp:GrahamChapman isDeceased true ;
hasDeathDate "1989-10-04" .
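On the consumer side, either encoding lets a query tell “no death date because the person is alive” apart from “no information at all”. The Python sketch below is a hypothetical helper that accepts both the sentinel mp:notDeadYet and the isDeceased flag, assuming a list of (subject, predicate, object) tuples in which Python booleans stand in for xsd:boolean literals; mp:Unlisted is an invented subject used only to show the unknown case.

```python
# Minimal sketch: a graph is a list of (subject, predicate, object) tuples;
# Python booleans stand in for xsd:boolean literals.
NOT_DEAD_YET = "mp:notDeadYet"  # the reserved sentinel value from the example


def death_status(triples, person):
    """Return (status, date) where status is "deceased", "alive", or "unknown".

    Accepts either encoding: the sentinel hasDeathDate value or the isDeceased flag.
    A real death date or isDeceased true takes precedence over the "alive" markers.
    """
    dates = [o for s, p, o in triples if s == person and p == "hasDeathDate"]
    flags = [o for s, p, o in triples if s == person and p == "isDeceased"]
    real_dates = [d for d in dates if d != NOT_DEAD_YET]
    if real_dates or True in flags:
        return "deceased", (real_dates[0] if real_dates else None)
    if NOT_DEAD_YET in dates or False in flags:
        return "alive", None
    return "unknown", None  # nothing recorded: the "value is unknown" case


sentinel_graph = [
    ("mp:JohnCleese", "hasDeathDate", NOT_DEAD_YET),
    ("mp:GrahamChapman", "hasDeathDate", "1989-10-04"),
]
flag_graph = [
    ("mp:JohnCleese", "isDeceased", False),
    ("mp:GrahamChapman", "isDeceased", True),
    ("mp:GrahamChapman", "hasDeathDate", "1989-10-04"),
]
print(death_status(sentinel_graph, "mp:JohnCleese"))  # ('alive', None)
print(death_status(flag_graph, "mp:GrahamChapman"))   # ('deceased', '1989-10-04')
print(death_status(flag_graph, "mp:Unlisted"))        # ('unknown', None)
```

The boolean flag costs an extra triple per person but keeps hasDeathDate's range limited to dates, whereas the sentinel mixes a resource into what is otherwise a date-valued property.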
4. The value is withheld, e.g., when the data consumer is not allowed to access it.
This, in my opinion, is the most interesting case, because it potentially involves the most interesting data transformation. If you have a nice dataset that people can query, and you want to indicate something about the results they would obtain if they had permission, you have lots of options for representing this. For instance, you could replace the withheld values with blank nodes that act as redactions, and give those blank nodes types that, somewhat like HTTP status codes, say something about the hidden value. Suppose you have the data:
ex:JohnDoe hasSSN "000-00-0000" .
ex:JaneDoe hasSSN "000-00-0001" .
When someone asks for the data, you might respond (supposing that the first value is valid, and the second one invalid):
ex:JohnDoe hasSSN [ a ex:ValidSSN ] .
ex:JaneDoe hasSSN [ a ex:InvalidSSN ] .
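The transformation itself is mechanical: each sensitive value is swapped for a fresh blank node, and the blank node gets a type describing the hidden value. Here is a minimal Python sketch under the same tuple-based graph model; redacted_view, classify_ssn, and the KNOWN_VALID lookup are hypothetical stand-ins for whatever validity check the data owner actually runs, and blank nodes are represented by Turtle-style "_:bN" labels.

```python
import itertools


def redacted_view(triples, sensitive_prop, classify):
    """Copy the graph, replacing each value of sensitive_prop with a fresh blank
    node typed by classify(value); the original value never appears in the view."""
    counter = itertools.count(1)
    view = []
    for s, p, o in triples:
        if p == sensitive_prop:
            bnode = f"_:b{next(counter)}"  # Turtle-style blank node label
            view.append((s, p, bnode))
            view.append((bnode, "rdf:type", classify(o)))
        else:
            view.append((s, p, o))
    return view


# Hypothetical validity lookup mirroring the example's assumption
# that the first value is valid and the second one is not.
KNOWN_VALID = {"000-00-0000"}


def classify_ssn(value):
    return "ex:ValidSSN" if value in KNOWN_VALID else "ex:InvalidSSN"


data = [
    ("ex:JohnDoe", "hasSSN", "000-00-0000"),
    ("ex:JaneDoe", "hasSSN", "000-00-0001"),
]
for triple in redacted_view(data, "hasSSN", classify_ssn):
    print(triple)
```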
In general, you could present a different view of the data to consumers than the one you actually possess. I do not know of any standards for doing this sort of thing. You might be interested in a somewhat related, recent W3C Recommendation, PROV-O: The PROV Ontology, a vocabulary for describing the provenance of information (e.g., what it was generated from, to what it is attributed); it could be useful for describing resources that might not be available to requesters in their full form.