I am trying to use DBpedia with Neo4j on top of Ruby on Rails.
Assuming I have installed Neo4j and downloaded one of the DBpedia da
The way I see it, you have two options.
Use an existing solution: either adopt an approach like this one as-is, or fork the repository behind it (or a similar one) and extend or fix it to fit your purposes.
Do it yourself, from scratch. Here's the general approach:
Parse your DBpedia dataset into a format suitable for Neo4j's insertion methods. Libraries such as openRDF exist for processing this kind of data. Unless you plan to spend time researching which one best suits your needs, note that the existing solution linked above already uses this library.
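DBpedia dumps such as the `.nt` file used later in this thread are N-Triples: one statement per line, written as subject, predicate and object followed by a period. The parsing step can therefore start as simply as the Ruby sketch below. It is an assumption-laden stand-in, not what openRDF does: `Triple` and `parse_ntriples_line` are hypothetical names, and a real RDF parser also decodes escape sequences and handles edge cases this regex ignores.

```ruby
# Sketch of an N-Triples line parser for DBpedia dumps (simplified:
# escape sequences inside literals are kept as-is, not decoded).
Triple = Struct.new(:subject, :predicate, :object)

# One RDF term: an IRI, a blank node, or a literal with an optional
# language tag or datatype.
NT_TERM = /
  <[^>]*>                                      # IRI
  | _:\S+                                      # blank node
  | "(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?  # literal
/x

# Returns a Triple, or nil for blank and comment lines.
def parse_ntriples_line(line)
  line = line.strip
  return nil if line.empty? || line.start_with?("#")

  terms = line.scan(NT_TERM)
  unless terms.size == 3 && line.end_with?(".")
    raise ArgumentError, "malformed N-Triples line: #{line}"
  end
  Triple.new(*terms)
end
```

To process a whole dump without reading it into memory, iterate with `File.foreach("/path/to/dbpedia-dump.nt")` and call `parse_ntriples_line` on each line.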
Then insert the formatted data into your Neo4j database. One way to do this is with Neo4j's Batch Insertion component. Note that this facility, as the Neo4j documentation states, is intended for initial imports: it is not thread-safe and is non-transactional (in other words, not ACID-compliant). So whether it fits really depends on your use case.
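Neo4j's batch inserter works with node ids: nodes are created first, and relationships then connect existing node ids. Parsed triples therefore need to be grouped into nodes and relationships before insertion. The Ruby sketch below shows one possible mapping, assuming the triples are already parsed into `[subject, predicate, object]` strings: literal objects become properties on the subject node, and IRI or blank-node objects become relationships named after the predicate. `build_graph` and `GraphData` are hypothetical names, and this mapping is an illustration, not the one any particular library uses.

```ruby
# Sketch: group parsed triples into nodes and relationships for a bulk load.
# Each IRI or blank node gets a sequential integer id; literal objects become
# properties on the subject node (a later value for the same predicate
# overwrites an earlier one); other objects become relationships.
GraphData = Struct.new(:nodes, :relationships)

def literal?(term)
  term.start_with?('"')
end

# Lexical value of a literal term, without quotes, language tag or datatype.
def literal_value(term)
  term[/\A"((?:[^"\\]|\\.)*)"/, 1]
end

def build_graph(triples)
  ids = {}    # term => node id
  nodes = []  # node id => property hash (always includes "uri")
  rels = []   # [start_id, end_id, predicate]

  node_id = lambda do |term|
    ids.fetch(term) do
      ids[term] = nodes.size
      nodes << { "uri" => term }
      ids[term]
    end
  end

  triples.each do |s, p, o|
    sid = node_id.call(s)
    if literal?(o)
      nodes[sid][p] = literal_value(o)
    else
      rels << [sid, node_id.call(o), p]
    end
  end

  GraphData.new(nodes, rels)
end
```

In a batch import, each entry of `nodes` would correspond to one node-creation call and each entry of `relationships` to one relationship-creation call.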
My two cents: use something that already exists unless this functionality is the core of what you're developing. It will be a pain to build, and even more of a pain to build something that runs efficiently.
The simplest way to load DBpedia into Neo4j is to use the dbpedia4neo library. This is a Java library, but you don't need to know any Java because all you need to do is run the executable.
You could rewrite this in JRuby if you want, but regular Ruby won't work because it relies on Blueprints, a Java library with no Ruby equivalent.
Here are the two key files, which provide the loading procedure.
Here is a description of what's involved.
Blueprints translates the RDF data into a graph representation. To understand what's going on under the hood, see the Blueprints Sail Ouplementation.
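As a rough illustration of the kind of translation involved, the Ruby sketch below gives every RDF term (IRI, blank node or literal) its own vertex and turns each statement into a directed edge labelled with its predicate. This is a simplified, hypothetical model: `RdfGraph` is not part of Blueprints, and the Sail Ouplementation documentation is the authority on what Blueprints actually stores.

```ruby
# Sketch: store RDF statements as a property graph where every term is a
# vertex and every statement is an edge labelled with its predicate.
# Simplified illustration only, not Blueprints' actual code.
class RdfGraph
  Vertex = Struct.new(:id, :kind, :value)
  Edge = Struct.new(:out_v, :label, :in_v)

  attr_reader :vertices, :edges

  def initialize
    @vertices = {}  # term string => Vertex
    @edges = []
  end

  def add_statement(subject, predicate, object)
    @edges << Edge.new(vertex_for(subject), predicate, vertex_for(object))
  end

  # Labels of the outgoing edges of the vertex for +term+ (empty if unknown).
  def out_labels(term)
    v = @vertices[term]
    return [] unless v

    @edges.select { |e| e.out_v.equal?(v) }.map(&:label)
  end

  private

  # Reuse the vertex for a term, or create one with the next sequential id.
  def vertex_for(term)
    @vertices[term] ||= Vertex.new(@vertices.size, kind_of_term(term), term)
  end

  def kind_of_term(term)
    if term.start_with?("<") then :uri
    elsif term.start_with?("_:") then :bnode
    else :literal
    end
  end
end
```

Unlike a mapping that folds literals into node properties, this model keeps literals as vertices, so every statement in the dump becomes exactly one edge.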
After you download the dbpedia dump files, you should be able to build the dbpedia4neo Java library and run it without modifying the Java code.
First, clone oleiade's fork of the GitHub repository and change to the dbpedia4neo directory:
$ git clone https://github.com/oleiade/dbpedia4neo.git
$ cd dbpedia4neo
(Oleiade's fork includes a minor Blueprints update that calls sail.initialize(); see https://groups.google.com/d/msg/gremlin-users/lfpNcOwZ49Y/WI91ae-UzKQJ.)
Before you build it, you will need to update pom.xml to use more recent Blueprints versions and the current Blueprints repository (Sonatype). To do this, open pom.xml and, at the top of the dependencies section, change all of the TinkerPop Blueprints versions from 0.6 to 0.9.
While you are in the file, add the Sonatype repository to the repositories section at the end of the file:
<repository>
  <id>sonatype-nexus-releases</id>
  <name>Sonatype Nexus Releases</name>
  <url>https://oss.sonatype.org/content/repositories/releases</url>
</repository>
Save the file and then build it using Maven:
$ mvn clean install
This will download and install all the dependencies for you and create a jar file in the target directory.
To load DBpedia, use Maven to run the executable:
$ mvn exec:java \
-Dexec.mainClass=org.acaro.dbpedia4neo.inserter.DBpediaLoader \
-Dexec.args="/path/to/dbpedia-dump.nt"
The DBpedia dump is large, so this will take a while to load.
Now that the data is loaded, you can access the graph in one of two ways:
For an example of how to create a Rexster client, see Bulbs, a Python framework I wrote that supports both Neo4j Server and Rexster.
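From regular Ruby, a Rexster client boils down to HTTP requests against Rexster's REST API. The class below is a minimal sketch, not a port of Bulbs, and several details are assumptions to verify against the Rexster version you run: the default port 8182, a graph configured under the hypothetical name `dbpedia`, the paths `/graphs/{graph}/vertices/{id}` and `/graphs/{graph}/vertices/{id}/out`, and JSON responses that carry their payload under a `"results"` key.

```ruby
require "json"
require "net/http"
require "uri"

# Minimal Rexster REST client sketch (paths and graph name are assumptions).
class RexsterClient
  def initialize(base_url = "http://localhost:8182", graph = "dbpedia")
    @base = base_url.chomp("/")
    @graph = graph
  end

  def vertex_uri(id)
    URI("#{@base}/graphs/#{@graph}/vertices/#{id}")
  end

  def out_vertices_uri(id)
    URI("#{@base}/graphs/#{@graph}/vertices/#{id}/out")
  end

  # Fetch one vertex by id.
  def vertex(id)
    fetch_results(vertex_uri(id))
  end

  # Fetch the vertices reachable over outgoing edges of vertex +id+.
  def out_vertices(id)
    fetch_results(out_vertices_uri(id))
  end

  # Extract the payload, assumed to sit under "results" in the response.
  def self.results_from(body)
    JSON.parse(body).fetch("results")
  end

  private

  def fetch_results(uri)
    response = Net::HTTP.get_response(uri)
    unless response.is_a?(Net::HTTPSuccess)
      raise "Rexster returned HTTP #{response.code} for #{uri}"
    end
    self.class.results_from(response.body)
  end
end
```

With a Rexster server running, `RexsterClient.new.vertex(1)` would return the parsed vertex as a Ruby hash; Bulbs layers a richer object model on the same kind of requests.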
Another approach would be to process the DBpedia RDF dump file in Ruby, write the nodes and relationships out to CSV files, and load them with the Neo4j batch importer. However, this requires you to manually translate the RDF data into Neo4j nodes and relationships.
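A minimal Ruby sketch of that route follows. The file layout is an assumption about the importer, so check what your version expects: tab-separated files, a nodes file with a `uri` header whose rows are numbered from 1, and a relationships file with `start`, `end` and `type` columns that refer to those row numbers. For brevity it keeps only statements whose subject, predicate and object are all IRIs, and names each relationship type after the last segment of the predicate IRI. `ntriples_to_tsv` and `rel_type` are hypothetical helpers.

```ruby
require "stringio"

# Matches an N-Triples statement whose subject, predicate and object are all IRIs.
IRI_TRIPLE = /\A<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*\z/

# Relationship type from a predicate IRI: the part after the last "/" or "#".
def rel_type(predicate_iri)
  predicate_iri[/[^\/#]+\z/] || predicate_iri
end

# Writes nodes and relationships as TSV; returns the number of nodes written.
def ntriples_to_tsv(lines, nodes_io, rels_io)
  ids = {}  # IRI => 1-based row number in the nodes file
  nodes_io.puts "uri"
  rels_io.puts %w[start end type].join("\t")

  id_for = lambda do |iri|
    ids[iri] ||= begin
      nodes_io.puts iri
      ids.size + 1  # row number of the node just written
    end
  end

  lines.each do |line|
    m = IRI_TRIPLE.match(line.strip)
    next unless m  # skips literals, blank nodes, comments and blank lines

    s, p, o = m.captures
    rels_io.puts [id_for.call(s), id_for.call(o), rel_type(p)].join("\t")
  end
  ids.size
end
```

For a real dump, pass `File.foreach("/path/to/dbpedia-dump.nt")` as `lines` and two files opened with `File.open(path, "w")`. The `ids` hash keeps every resource IRI in memory, which can be substantial for a full DBpedia dump.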