Question
A question about representing a set of RDF triples using a tensor.
Scenario:
An RDF triple expresses a simple statement about resources in the form (subject, predicate, object).
Suppose I have two predicates, play_for and race_for, each of which has n triples, as follows:
1st predicate: play_for; n triples: (Ray Allen, play_for, Boston Celtics), (Kobe Bryant, play_for, Lakers), ... In short, (A_i, play_for, T_i) for i = 1 to n.
2nd predicate: race_for; n triples: (Boston Celtics, race_for, NBA championship), (Lakers, race_for, NBA championship), ... In short, (T_i, race_for, NBA) for i = 1 to n.
A tensor representation is one way to model these 2n triples. I'm studying Maximilian Nickel's paper on using tensor factorization to find the latent semantic structure of a dataset, and the first step is to represent the dataset as a tensor.
A tensor entry X_ijk = 1 denotes that the relation (i-th entity, k-th predicate, j-th entity) exists. Otherwise, for non-existing and unknown relations, the entry is set to zero. For instance, these 2n triples can be modeled by a tensor with two slices:
One slice: (A_i, play_for, T_i)
      A1  A2  ...  An  T1  T2  ...  Tn  NBA
A1     0   0  ...   0   1   0  ...   0    0
A2     0   0  ...   0   0   1  ...   0    0
:
An     0   0  ...   0   0   0  ...   1    0
T1     0   0  ...   0   0   0  ...   0    0
T2     0   0  ...   0   0   0  ...   0    0
:
Tn     0   0  ...   0   0   0  ...   0    0
NBA    0   0  ...   0   0   0  ...   0    0
The other slice: (T_i, race_for, NBA)
      A1  A2  ...  An  T1  T2  ...  Tn  NBA
A1     0   0  ...   0   0   0  ...   0    0
A2     0   0  ...   0   0   0  ...   0    0
:
An     0   0  ...   0   0   0  ...   0    0
T1     0   0  ...   0   0   0  ...   0    1
T2     0   0  ...   0   0   0  ...   0    1
:
Tn     0   0  ...   0   0   0  ...   0    1
NBA    0   0  ...   0   0   0  ...   0    0
Assume the RDF triples are stored in 'test.txt'. My question is how to program this modeling process in Python.
Here is what I think:
The most difficult part is working out, for each RDF triple, the coordinate of the corresponding non-zero entry in the tensor. To start with, here is a list containing all entities:
T = ['A1', ..., 'An', 'T1', ..., 'Tn', 'NBA']
For every RDF triple (Subject_i, Predicate_k, Object_j) in the dataset, there is a coordinate (i, j, k) describing the position of X_ijk = 1 in the tensor. For instance, if an existing triple (A_i, play_for, T_i) has the coordinate (5, 13, 1), then X(5, 13) = 1 in the first slice matrix. However, I don't know how to get this coordinate. Should I use a dictionary to store the triples?
I'm not very familiar with Python. I've tried to find a solution, but I have no idea how to solve this. Any help would be greatly appreciated.
EDIT: For brevity and readability, I have deleted the description of RDF.
Answer 1:
There is a multitude of possible ways to solve your problem, but there is an even bigger multitude of ambiguity in your question. Formulate it more precisely: show what you want to get and why, and show what you have tried so far.
It would have been better to explain why you need an n-th order tensor and why another representation wouldn't fit your needs, instead of explaining what RDF is.
Using a tensor only makes sense if you need tensor operations. If so, you should look into numpy; if not, you should think about another solution. Dictionaries may not be what you are looking for if you want to preserve the order in which you created the objects (plain dicts only guarantee insertion order from Python 3.7 onward). Maybe OrderedDict from collections (Python >= 2.7) is what you are looking for. But maybe namedtuple from collections would do as well.
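As a minimal sketch of the dictionary idea raised in the question, the coordinates can be collected with two plain dicts that map each entity and predicate to an index. The file format is not given in the question, so one comma-separated triple per line is an assumption here, and the in-memory `sample` stands in for 'test.txt'. The helper names (`read_triples`, `index_of`) are hypothetical. Indices are 0-based and assigned in order of first appearance, so they will not match the A1..An, T1..Tn ordering used in the question unless the entities are pre-registered in that order.

```python
import io

# Hypothetical stand-in for 'test.txt'; one comma-separated triple per line
# is an assumed format, not something stated in the question.
sample = io.StringIO(
    "Ray Allen,play_for,Boston Celtics\n"
    "Kobe Bryant,play_for,Lakers\n"
    "Boston Celtics,race_for,NBA championship\n"
    "Lakers,race_for,NBA championship\n"
)

def read_triples(handle):
    """Yield (subject, predicate, object) tuples, skipping blank lines."""
    for line in handle:
        line = line.strip()
        if line:
            subject, predicate, obj = (part.strip() for part in line.split(","))
            yield subject, predicate, obj

def index_of(label, table):
    """Return the 0-based index of label, assigning the next free one if new."""
    return table.setdefault(label, len(table))

entity_index = {}      # entity name -> row/column index
predicate_index = {}   # predicate name -> slice index
coordinates = []       # one (i, j, k) per triple, i.e. X[i, j, k] = 1

for subject, predicate, obj in read_triples(sample):
    i = index_of(subject, entity_index)
    j = index_of(obj, entity_index)
    k = index_of(predicate, predicate_index)
    coordinates.append((i, j, k))

print(coordinates)
```

Because each index is stored as the dict value, the lookup does not rely on dict ordering at all, so a plain dict works here on any Python version.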
Answer 2:
Python's best library for RDF is rdflib. An rdflib graph has the following method:
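# yields (subject, object) pairs for the race_for predicate
lst = myGraph.subject_objects(MyNS.race_for)
# which is a convenience wrapper around the general pattern query,
# which yields full (subject, predicate, object) triples:
lst = myGraph.triples((None, MyNS.race_for, None))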
You will also find the second syntax in libraries for other languages, such as Jena for Java.
Within SciPy, you should use the scipy.sparse module for your sparse binary array.
Look at the numpy package for the best way to "factorize" the subjects and objects returned from the triples query; it should be pretty simple. There are tools for this in pandas, but my guess is that you will have large sparse matrices, so you are better off with the scipy.sparse module.
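One way this could look, as a rough sketch rather than a definitive implementation: `np.unique` with `return_inverse=True` does the "factorize" step, and each predicate gets its own `scipy.sparse.coo_matrix` slice. The hard-coded `triples` list is hypothetical sample data standing in for the result of a triples query. Note that `np.unique` sorts the labels, so the indices follow alphabetical order, not the order used in the question.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical sample data standing in for the result of a triples query.
triples = [
    ("Ray Allen", "play_for", "Boston Celtics"),
    ("Kobe Bryant", "play_for", "Lakers"),
    ("Boston Celtics", "race_for", "NBA championship"),
    ("Lakers", "race_for", "NBA championship"),
]
subjects, predicates, objects = zip(*triples)

# Factorize subjects and objects together so both share one entity index.
# np.unique returns the sorted distinct labels and, for every input label,
# its position in that sorted array.
entities, codes = np.unique(np.array(subjects + objects), return_inverse=True)
rows = codes[: len(triples)]   # subject index of each triple
cols = codes[len(triples):]    # object index of each triple

n = len(entities)
predicate_array = np.array(predicates)

# One sparse n x n slice per predicate; X_ijk = 1 becomes slices[k][i, j] = 1.
slices = {}
for pred in sorted(set(predicates)):
    mask = predicate_array == pred
    data = np.ones(mask.sum(), dtype=np.int8)
    slices[pred] = coo_matrix((data, (rows[mask], cols[mask])), shape=(n, n))

for pred, matrix in slices.items():
    print(pred, matrix.shape, matrix.nnz)
```

Keeping the tensor as a collection of sparse slices avoids allocating the full n x n x m array, which matters once n grows. Duplicate triples would be summed when a COO matrix is converted to another format, so deduplicate the input first if that can happen.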
Source: https://stackoverflow.com/questions/11453352/representing-a-couple-of-rdf-triples-using-tensor-how-to-programming-this-model