How to load millions of vertices from CSV into Titan 1.0.0 using BulkLoaderVertexProgram?

Question

I am trying to load millions of nodes from CSV files into Titan 1.0.0 with a Cassandra backend, using Java. How can I load them?

I checked and found that we can load them using BulkLoaderVertexProgram, but it loads the data from GraphSON format.

How do I start writing Java code to bulk load the data from CSV? Can you point me to a starting reference that I can look into?

Do I need Spark/Hadoop running on my system to use SparkGraphComputer, which BulkLoaderVertexProgram uses?

I have not been able to start writing code because I do not understand how to read data from CSV using BulkLoaderVertexProgram. Can you provide some starting links for the Java code?

Thanks.

Answer 1:

This was cross-posted on the Titan mailing list...

If you're looking to use Java code, check out Alex's and Matthew's Marvel graph example:

https://github.com/awslabs/dynamodb-titan-storage-backend/blob/1.0.0/src/main/java/com/amazon/titan/example/MarvelGraphFactory.java

It creates a Titan schema, parses a CSV, and then uses basic Gremlin addVertex() and addEdge() to build the graph. You'll notice that the TitanGraph isn't instantiated in the factory itself, so even though it is inside a Titan-DynamoDB example, you can use this with any Titan backend (Cassandra, HBase, Berkeley).
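The parse-then-add pattern described above can be sketched roughly as follows. This is only a minimal illustration, not the code from MarvelGraphFactory: VertexSink and CsvVertexLoader are hypothetical names, the "id,name" column layout is an assumption, and the sink stands in for the graph so the sketch runs without Titan on the classpath. Against a real TitanGraph the two sink methods would delegate to graph.addVertex(...) and graph.tx().commit().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for the graph; not part of Titan or TinkerPop. */
interface VertexSink {
    // With Titan this could call graph.addVertex("personId", id, "name", name).
    void addVertex(String id, String name);

    // With Titan this could call graph.tx().commit().
    void commit();
}

class CsvVertexLoader {
    /**
     * Reads "id,name" rows (assumed layout), skips the header row, and
     * commits after every batchSize vertices so that a single transaction
     * never holds the whole file. Returns the number of vertices added.
     */
    static int load(BufferedReader csv, VertexSink sink, int batchSize) {
        int loaded = 0;
        try {
            String line = csv.readLine(); // header row, ignored
            while ((line = csv.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: does not handle quoted fields containing commas.
                String[] cols = line.split(",", -1);
                if (cols.length < 2) {
                    continue; // skip malformed rows
                }
                sink.addVertex(cols[0].trim(), cols[1].trim());
                loaded++;
                if (loaded % batchSize == 0) {
                    sink.commit();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (loaded % batchSize != 0) {
            sink.commit(); // flush the final partial batch
        }
        return loaded;
    }
}
```

Committing in batches rather than per row or once at the end is a common choice for OLTP-style loading; the right batch size depends on your backend and is something to measure rather than assume.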

If your graph data is in the low millions, you could use a Titan-BerkeleyJE graph on your own machine, which might be an easier backend to use at first rather than a Cassandra cluster. I'd recommend that you do not get too caught up on loading a lot of data initially -- get comfortable with how to use Titan and TinkerPop with OLTP first and then move into OLAP approaches.

Answer 2:

You will probably need custom Java code to read your CSV files and load them into the graph.

If you want to use an OGM (object-graph mapper), meaning you create POJO classes as the data model for your data, you could use Peapod to define that model easily.

Here is an example data model:

@Vertex
public abstract class Person {
  public abstract String getName();
  public abstract void setName(String name);

  public abstract List<Knows> getKnows();
  public abstract Knows getKnows(Person person);
  public abstract Knows addKnows(Person person);
  public abstract Knows removeKnows(Person person);
}

@Edge
public abstract class Knows {
  public abstract void setYears(int years);
  public abstract int getYears();
}

To load data, for example:

// Wrap the Titan graph in Peapod's FramedGraph
FramedGraph g=new FramedGraph(TitanFactory.open("path_to_prop_file"));

// Create two Person vertices
Person person1=g.addVertex(Person.class);
person1.setName("M-T-A");

Person person2=g.addVertex(Person.class);
person2.setName("Amnesiac");

// Connect person1 to person2 with a Knows edge and set a property on it
Knows p1KnowsP2=person1.addKnows(person2);
p1KnowsP2.setYears(1);

Easier than you thought? Hope so.

Answer 3:

How about converting the CSV into GraphML and then loading it all at once from the Gremlin console?

gremlin> g = TitanFactory.open('bin/cassandra.local')
gremlin> g.loadGraphML('data/graph-of-the-gods.xml')
gremlin> g.commit()

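Wouldn't that be more performant than making a Gremlin call for each addVertex/addEdge? (Note that loadGraphML is the TinkerPop 2 / Titan 0.x call; with Titan 1.0.0 and TinkerPop 3 the equivalent is graph.io(IoCore.graphml()).readGraph(...).)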
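For the CSV-to-GraphML conversion step, a minimal sketch in Java might look like the following. CsvToGraphMl is a hypothetical helper, and the "id,name" vertex rows and "source,target,label" edge rows are assumed layouts. It uses only the JDK's XMLStreamWriter, which takes care of escaping. Storing the edge label under a "labelE" data key follows what I understand TinkerPop 3's GraphML output to look like, so verify it against a file exported from your own graph before relying on it.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class CsvToGraphMl {
    /** Converts "id,name" vertex rows and "source,target,label" edge rows to a GraphML string. */
    static String convert(List<String> vertexRows, List<String> edgeRows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("graphml");
            xml.writeDefaultNamespace("http://graphml.graphdrawing.org/xmlns");
            writeKey(xml, "name", "node");
            writeKey(xml, "labelE", "edge"); // assumption: edge label key used by TinkerPop 3
            xml.writeStartElement("graph");
            xml.writeAttribute("id", "G");
            xml.writeAttribute("edgedefault", "directed");
            for (String row : vertexRows) {
                String[] c = row.split(",", -1); // naive split, no quoted fields
                xml.writeStartElement("node");
                xml.writeAttribute("id", c[0].trim());
                writeData(xml, "name", c[1].trim());
                xml.writeEndElement(); // node
            }
            int edgeId = 0;
            for (String row : edgeRows) {
                String[] c = row.split(",", -1);
                xml.writeStartElement("edge");
                xml.writeAttribute("id", "e" + edgeId++);
                xml.writeAttribute("source", c[0].trim());
                xml.writeAttribute("target", c[1].trim());
                writeData(xml, "labelE", c[2].trim());
                xml.writeEndElement(); // edge
            }
            xml.writeEndElement(); // graph
            xml.writeEndElement(); // graphml
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write GraphML", e);
        }
        return out.toString();
    }

    // Declares a string-typed property key for nodes or edges.
    private static void writeKey(XMLStreamWriter xml, String id, String target) throws XMLStreamException {
        xml.writeEmptyElement("key");
        xml.writeAttribute("id", id);
        xml.writeAttribute("for", target);
        xml.writeAttribute("attr.name", id);
        xml.writeAttribute("attr.type", "string");
    }

    // Writes one <data key="...">value</data> element.
    private static void writeData(XMLStreamWriter xml, String key, String value) throws XMLStreamException {
        xml.writeStartElement("data");
        xml.writeAttribute("key", key);
        xml.writeCharacters(value);
        xml.writeEndElement();
    }
}
```

This builds the whole document in memory, which is fine for trying the idea out; for millions of rows you would want to stream to a file instead.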

Source: https://stackoverflow.com/questions/35187601/how-to-load-millions-of-vertices-from-csv-into-titan-1-0-0-using-bulkloaderverte

Tags

graph-databases

TITAN

bulkloader