How to set up Neo4j with DBpedia on top of a Ruby on Rails application?

Posted by 守給你的承諾、 on 2019-12-20 19:57:29

Question


I am trying to use DBpedia with Neo4j on top of Ruby on Rails.

Assume I have installed Neo4j and downloaded one of the DBpedia datasets.

How do I import the DBpedia dataset into Neo4j?


Answer 1:


The simplest way to load dbpedia into Neo4j is to use the dbpedia4neo library. This is a Java library, but you don't need to know any Java because all you need to do is run the executable.

You could rewrite this in JRuby if you want, but regular Ruby won't work because it relies on Blueprints, a Java library with no Ruby equivalent.

Here are the two key files, which provide the loading procedure.

  1. https://github.com/oleiade/dbpedia4neo/blob/master/src/main/java/org/acaro/dbpedia4neo/inserter/DBpediaLoader.java
  2. https://github.com/oleiade/dbpedia4neo/blob/master/src/main/java/org/acaro/dbpedia4neo/inserter/TripleHandler.java

Here is a description of what's involved.

Blueprints translates the RDF data into a graph representation. To understand what's going on under the hood, see the Blueprints "Sail Ouplementation" documentation.

After you download the dbpedia dump files, you should be able to build the dbpedia4neo Java library and run it without modifying the Java code.

First, clone oleiade's fork of the GitHub repository and change to the dbpedia4neo directory:

$ git clone https://github.com/oleiade/dbpedia4neo.git
$ cd dbpedia4neo

(Oleiade's fork includes a minor Blueprints update that calls sail.initialize(); see https://groups.google.com/d/msg/gremlin-users/lfpNcOwZ49Y/WI91ae-UzKQJ.)

Before you build it, you will need to update the pom.xml to use more current Blueprints versions and the current Blueprints repository (Sonatype).

To do this, open pom.xml and at the top of the dependencies section, change all of the TinkerPop Blueprints versions from 0.6 to 0.9.

While you are in the file, add the Sonatype repository to the repositories section at the end of the file:

<repository>
  <id>sonatype-nexus-snapshots</id>
  <name>Sonatype Nexus Snapshots</name>
  <url>https://oss.sonatype.org/content/repositories/releases</url>
</repository>

Save the file and then build it using maven:

$ mvn clean install

This will download and install all the dependencies for you and create a jar file in the target directory.

To load dbpedia, use maven to run the executable:

$ mvn exec:java \
  -Dexec.mainClass=org.acaro.dbpedia4neo.inserter.DBpediaLoader \
  -Dexec.args="/path/to/dbpedia-dump.nt"

The DBpedia dump is large, so this will take a while to load.

Now that the data is loaded, you can access the graph in one of two ways:

  1. Use JRuby and the Blueprints-Neo4j API directly.
  2. Use regular Ruby and the Rexster REST server, which is similar to Neo4j Server except that it supports multiple graph databases.

For an example of how to create a Rexster client, see Bulbs, a Python framework I wrote that supports both Neo4j Server and Rexster.

  • http://bulbflow.com/
  • https://github.com/espeed/bulbs
  • https://github.com/espeed/bulbs/tree/master/bulbs/rexster
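For the regular-Ruby option, a Rexster client can start out as a thin wrapper around HTTP GET requests. The following is a minimal sketch, not a port of Bulbs: the helper names, the base URL `http://localhost:8182`, and the graph name `dbpedia` are all assumptions for illustration and must match whatever your Rexster configuration actually exposes.

```ruby
require "net/http"
require "json"
require "uri"

# Assumed defaults; change them to match your Rexster configuration.
DEFAULT_BASE  = "http://localhost:8182"
DEFAULT_GRAPH = "dbpedia"

# Hypothetical helper: builds the REST URI for a single vertex,
# following the /graphs/<graph>/vertices/<id> layout.
def rexster_vertex_uri(vertex_id, base: DEFAULT_BASE, graph: DEFAULT_GRAPH)
  URI("#{base}/graphs/#{graph}/vertices/#{vertex_id}")
end

# Hypothetical helper: fetches one vertex and returns the parsed JSON body.
# Raises if the server does not answer with a 2xx status.
def fetch_vertex(vertex_id, base: DEFAULT_BASE, graph: DEFAULT_GRAPH)
  uri = rexster_vertex_uri(vertex_id, base: base, graph: graph)
  response = Net::HTTP.get_response(uri)
  unless response.is_a?(Net::HTTPSuccess)
    raise "Rexster returned HTTP #{response.code} for #{uri}"
  end
  JSON.parse(response.body)
end
```

In a Rails app, a call such as `fetch_vertex(1)` would typically live in a small service class, so controllers and models do not depend on the HTTP details.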

Another approach to all this would be to process the dbpedia RDF dump file in Ruby, write out the nodes and relationships to a CSV file, and use the Neo4j batch importer to load it. But this will require that you manually translate the RDF data into Neo4j relationships.
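The manual translation could be sketched roughly as follows, assuming the dump is in N-Triples format (one triple per line). This sketch keeps only triples whose subject, predicate, and object are all URIs, and turns each one into a relationship between two nodes. Triples with literal objects, which would normally become node properties, are skipped. The helper names and the CSV column names are hypothetical, so check which columns your version of the batch importer expects.

```ruby
require "csv"

# Matches an N-Triples line whose three terms are all URIs.
# Lines with literals, blank nodes, or comments do not match and are skipped.
URI_TRIPLE = /\A\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*\z/

# Hypothetical helper: converts N-Triples lines into node ids and relationships.
# Returns [node_ids, rels], where node_ids maps URI => integer id and
# rels is a list of [start_id, end_id, predicate_uri] rows.
def triples_to_rows(lines)
  node_ids = {}
  rels = []
  lines.each do |line|
    match = URI_TRIPLE.match(line)
    next unless match
    subject, predicate, object = match.captures
    # Assign sequential ids the first time each URI is seen.
    start_id = (node_ids[subject] ||= node_ids.size + 1)
    end_id   = (node_ids[object]  ||= node_ids.size + 1)
    rels << [start_id, end_id, predicate]
  end
  [node_ids, rels]
end

# Hypothetical helper: writes the rows to two CSV files.
# The header names are placeholders, not a documented importer format.
def write_csv(node_ids, rels, nodes_path, rels_path)
  CSV.open(nodes_path, "w") do |csv|
    csv << %w[id uri]
    node_ids.each { |uri, id| csv << [id, uri] }
  end
  CSV.open(rels_path, "w") do |csv|
    csv << %w[start end type]
    rels.each { |row| csv << row }
  end
end

# Example usage (paths are placeholders):
#   nodes, rels = triples_to_rows(File.foreach("/path/to/dbpedia-dump.nt"))
#   write_csv(nodes, rels, "nodes.csv", "rels.csv")
```

Note that this keeps every URI in an in-memory Hash, which is fine for a small sample but probably not for a full DBpedia dump. For the full dataset, the id mapping would need to live on disk or in an external key-value store.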




Answer 2:


The way I see it, you have two options.

  1. Use an existing approach like this one as-is, or fork the repo behind it (or another like it) and extend or fix it to fit your purposes.

  2. Do it yourself, from scratch. Here's the general approach:

Parse your DBpedia dataset into a format suitable for Neo4j's insertion methods. Libraries such as openRDF exist to process RDF data. Unless you plan to take the time to research which one would suit your needs best, note that the existing solution I linked above already uses this library.

Then insert the formatted data into your Neo4j database. One way to do this is through Neo4j's Batch Insertion component. Note that this facility, as its documentation states, is intended for initial imports, because it is not thread-safe and is non-transactional (in other words, not ACID-compliant). Whether it fits really depends on your use case.

My 2 cents: use something already out there unless this functionality is the core of what you're developing. An importer is a pain to build, and one that runs efficiently is even more of a pain.



Source: https://stackoverflow.com/questions/12212015/how-to-setup-neo4j-with-dbpedia-ontop-of-ruby-on-rails-application
