Jena TDB: see how many triples are stored during TDB creation

Submitted by  ̄綄美尐妖づ on 2019-12-11 14:25:19

Question


Hi, is it possible to see the number of triples being stored during TDB creation with the Java API? I run TDBFactory on a rar file containing Turtle, but while the files are being created in my directory I can't see how many triples have been stored. How can I solve this problem?


Answer 1:


You can access the bulk loader from Java code (to view the triples as they are loaded) as follows; the final true argument asks the loader to show progress:

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
try( final InputStream in = /*get input stream for your large file*/) {
    TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , in, true);
}
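
If you want a count that is independent of the loader's own progress output, one option is to wrap the input stream before handing it to the loader. The following is a minimal sketch and not part of Jena: `LineCountingInputStream` and its `reportEvery` parameter are hypothetical names, and it assumes N-TRIPLES input, where each triple occupies one line, so blank lines or comments would inflate the figure.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena: counts newline bytes as they pass through.
class LineCountingInputStream extends FilterInputStream {
    private final long reportEvery; // print a message every N lines; 0 disables reporting
    private long lines = 0;

    LineCountingInputStream(InputStream in, long reportEvery) {
        super(in);
        this.reportEvery = reportEvery;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b == '\n') countLine();
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '\n') countLine();
        }
        return n;
    }

    @Override
    public boolean markSupported() {
        return false; // a reset would re-read bytes and count them twice
    }

    private void countLine() {
        lines++;
        if (reportEvery > 0 && lines % reportEvery == 0) {
            System.err.println("read " + lines + " lines so far");
        }
    }

    long getLines() {
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = ("<http://example.org/s> <http://example.org/p> \"a\" .\n"
                + "<http://example.org/s> <http://example.org/p> \"b\" .\n")
                .getBytes(StandardCharsets.UTF_8);
        try (LineCountingInputStream in =
                 new LineCountingInputStream(new ByteArrayInputStream(data), 0)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // stand-in for handing the stream to the loader
            }
            System.out.println("lines read: " + in.getLines());
        }
    }
}
```

In the snippet above, the wrapper would go around the stream that is passed as the `in` argument, and `getLines()` can be read once loading finishes.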

If you have multiple files in your archive (for simplicity, I'll use a zip rather than a rar), then, as per an answer to this question, you can get better performance by concatenating the files into a single stream before passing them to the bulk loader. The improvement comes from delaying index creation until all triples have been loaded. I'm sure other formats are supported as well, but I have only tested N-TRIPLES.

The following example utilizes IOUtils from commons-io for copying streams:

final Dataset tdbDataset = TDBFactory.createDataset( /*location*/ );
final PipedOutputStream concatOut = new PipedOutputStream();
final PipedInputStream concatIn = new PipedInputStream(concatOut);

final ExecutorService workers = Executors.newFixedThreadPool(2);
final Future<Long> submitter = workers.submit(new Callable<Long>(){
    @Override
    public Long call() throws Exception {
        long filesLoaded = 0;
        try( final ZipFile zipFile = new ZipFile( /* Archive Location */ ) ) {
            final Enumeration< ? extends ZipEntry> zipEntries = zipFile.entries();
            while( zipEntries.hasMoreElements() ) {
                final ZipEntry entry = zipEntries.nextElement();
                try( final InputStream singleIn = zipFile.getInputStream(entry) ) {
                    // If your file is in a supported format already
                    IOUtils.copy(singleIn, concatOut);
                    // Otherwise, convert it first ("lang" is a placeholder for the source format):
                    /*final Model m = ModelFactory.createDefaultModel();
                    m.read(singleIn, null, "lang");
                    m.write(concatOut, "N-TRIPLES");*/
                }
                filesLoaded++;
            }
        }
        concatOut.close();
        return filesLoaded;
    }});

final Future<Void> comitter = workers.submit(new Callable<Void>(){
    @Override
    public Void call() throws Exception {
        TDBLoader.load( ((DatasetGraphTransaction)tdbDataset.asDatasetGraph()).getBaseDatasetGraph() , concatIn, true);
        return null;
    }});

workers.shutdown();
System.out.println("submitted "+submitter.get()+" input files for processing");
comitter.get();
System.out.println("completed processing");
workers.awaitTermination(1, TimeUnit.SECONDS); // NOTE this wait is redundant
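
The same producer/consumer pattern can be tried out without Jena or commons-io. The sketch below is an illustration under stated assumptions, not the loader itself: `ZipConcatSketch` and `countLinesAcrossEntries` are hypothetical names, `InputStream.transferTo` (Java 9+) stands in for `IOUtils.copy`, and the consumer only counts lines where the real code would call `TDBLoader.load`. For N-TRIPLES input the line count approximates the number of triples.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipConcatSketch {

    // Streams every entry of the archive into one pipe and returns the
    // number of lines the consumer read from the other end.
    static long countLinesAcrossEntries(Path archive) throws Exception {
        PipedOutputStream concatOut = new PipedOutputStream();
        PipedInputStream concatIn = new PipedInputStream(concatOut);
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            // Producer: copy each zip entry into the pipe, then close the pipe.
            Future<Integer> producer = workers.submit(() -> {
                int entries = 0;
                try (ZipFile zip = new ZipFile(archive.toFile());
                     OutputStream out = concatOut) {
                    Enumeration<? extends ZipEntry> all = zip.entries();
                    while (all.hasMoreElements()) {
                        ZipEntry entry = all.nextElement();
                        if (entry.isDirectory()) continue;
                        try (InputStream in = zip.getInputStream(entry)) {
                            in.transferTo(out);
                        }
                        entries++;
                    }
                }
                return entries;
            });
            // Consumer: stand-in for the bulk loader; it only counts lines.
            Future<Long> consumer = workers.submit(() -> {
                long lines = 0;
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(concatIn, StandardCharsets.UTF_8))) {
                    while (reader.readLine() != null) lines++;
                }
                return lines;
            });
            producer.get();
            return consumer.get();
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a zip archive of N-TRIPLES files, supplied by the caller
        System.out.println("lines seen: " + countLinesAcrossEntries(Paths.get(args[0])));
    }
}
```

Note that plain concatenation joins the last line of an entry to the first line of the next one if the entry does not end with a newline, so each file in the archive should be newline-terminated.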


Source: https://stackoverflow.com/questions/24984934/jena-tdb-see-how-many-triple-stored-during-tdb-creation
