Stardog data loading and Jena

Submitted by 有些话、适合烂在心里 on 2019-12-25 06:27:10

Question


I am using Stardog to store a bunch of triples that come from different sources. I use Jena to collect and merge the data in a single Jena graph. All these triples are part of ABoxes.

  1. I am not sure whether Stardog requires the TBox to be merged with the ABox graphs. I assume it does, because otherwise I cannot see how Stardog would reason over the data. I have not seen any option to store and use the TBox separately, as in some other triple stores. Do I need to include the TBox in the Jena graph, or is there a way to store the TBox in another Stardog database so that it is also taken into account when querying the database of ABoxes?

  2. I am considering options to load the Jena graph (varies between 1 and 7 million triples) into Stardog:

    • One option, which I don't really like, is to write the graph to a file and run the client to load it into Stardog. Once the data is in a Jena graph, I would prefer a direct solution.
    • Another option is to load the triples one by one (example of stardog sparql insert query in java), which I dislike because it is likely to be inefficient.

Is there any elegant way to load the whole graph from Jena?

EDIT

Code attempt based on the example in the distribution:

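// Start a Stardog server bound to a real socket and port.
Server aServer = Stardog.buildServer()
        .bind(new InetSocketAddress("10.0.0.1", 5820))
        .start();

// Connect to that server as an administrator.
AdminConnection aAdminConnection = AdminConnectionConfiguration.toServer("...")
        .credentials("admin", "admin")
        .connect();

// Drop any existing "test" database so it can be recreated.
if (aAdminConnection.list().contains("test")) {
    aAdminConnection.drop("test");
}

// Create an in-memory database from the file and connect to it.
Connection aConn = aAdminConnection.memory("test").create(file).connect();

// Wrap the Stardog connection in a Jena Model.
Model aModel = SDJenaFactory.createModel(aConn);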

EDIT 2: Corrected some bits of my code.

Additional information in the Stardog documentation


Answer 1:


1) It does not matter where you store your TBox as long as it's in Stardog. By default, Stardog will look in the default graph for your TBox and extract it automatically. But this can be configured using the reasoning.schema.graphs configuration option as noted in the documentation. Generally, you may find the chapter on how reasoning is implemented in Stardog a useful read.

2) Don't load triples one by one; it's not very efficient. The fastest way to get data into Stardog is to load it when the database is created, because the bulk loader is used at that point and achieves optimal write speed. Once the database exists, you can use the SNARL API, the CLI, or the Jena API to load a file, which is the next fastest way to get data into the database. If you are using the Jena API, you have to use Jena's BulkUpdateHandler directly, or load RDF/XML, whose reader seems to use the bulk updater behind the scenes.
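To make the cost of one-by-one loading concrete, here is a minimal sketch in plain Java of grouping statements into a few SPARQL INSERT DATA updates instead of sending one update per triple. This is not the approach recommended above (loading a file at database creation remains the fastest route), and it uses neither the Stardog nor the Jena API. The class BatchedInsert, the method buildUpdates, and the urn:ex: IRIs are hypothetical names chosen for illustration, and the input is assumed to be already serialized as N-Triples statements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Stardog or Jena.
class BatchedInsert {

    // Groups N-Triples statements into SPARQL "INSERT DATA" updates,
    // each holding at most batchSize statements.
    static List<String> buildUpdates(List<String> ntriples, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> updates = new ArrayList<>();
        for (int start = 0; start < ntriples.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ntriples.size());
            StringBuilder sb = new StringBuilder("INSERT DATA {\n");
            for (String statement : ntriples.subList(start, end)) {
                sb.append("  ").append(statement).append('\n');
            }
            sb.append('}');
            updates.add(sb.toString());
        }
        return updates;
    }

    public static void main(String[] args) {
        // Assumed sample data; real input would come from the merged graph.
        List<String> triples = Arrays.asList(
            "<urn:ex:a> <urn:ex:knows> <urn:ex:b> .",
            "<urn:ex:b> <urn:ex:knows> <urn:ex:c> .",
            "<urn:ex:c> <urn:ex:knows> <urn:ex:a> .");
        // Three statements with a batch size of 2 yield two updates.
        for (String update : buildUpdates(triples, 2)) {
            System.out.println(update);
        }
    }
}
```

Each returned string would be sent as a single update, so the number of round trips drops from one per triple to one per batch. Blank node labels are scoped to a single update, so statements that share a blank node would need to stay in the same batch.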

EDIT:

Your code is incorrect. You're binding a server on an actual socket & port, and then attempting to connect to the embedded server, which you are not running. You have to either modify your server start to use the embedded server as shown in the examples, or modify your initialization of your AdminConnectionConfiguration to specify the server URL using toServer.

Further, rather than using the convenience method createMemory you can call AdminConnection#memory which will return a DatabaseBuilder whose create method takes a list of files to bulk load into the new database.

You should also consider using a disk-based database for the storage of millions of triples.



Source: https://stackoverflow.com/questions/24332922/stardog-data-loading-and-jena
