Question
I have two CSV files:
The first contains ~500M records in the following format:
id,name
10000023432,Tom User
13943423235,Blah Person
The second contains ~1.5B friend relationships in the following format:
fromId,toId
10000023432,13943423235
I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them.
I have tried multiple configurations of the ETL JSON file so far, the latest being this one:
{
  "config": { "parallel": true },
  "source": { "file": { "path": "path_to_file" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "Person", "skipDuplicates": true } },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "from",
                "lookup": "Person.id",
                "unresolvedLinkAction": "SKIP",
                "targetVertexFields": {
                  "id": "${input.to}"
                },
                "direction": "out"
              }
    },
    { "code": { "language": "Javascript",
                "code": "print('Current record: ' + record); record;" }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:<DB connection string>",
      "dbType": "graph",
      "classes": [
        { "name": "FriendsWith", "extends": "E" }
      ],
      "indexes": [
        { "class": "Person", "fields": ["id:long"], "type": "UNIQUE" }
      ]
    }
  }
}
Unfortunately, in addition to creating the edge, this also creates a vertex with "from" and "to" properties.
When I remove the vertex transformer, the ETL process throws an error:
Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more
What am I missing here?
Answer 1:
You can import the edges with these ETL transformers:
"transformers": [
  { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
  { "vertex": { "class": "Person", "skipDuplicates": true } },
  { "edge": { "class": "FriendsWith",
              "joinFieldName": "toId",
              "lookup": "Person.id",
              "direction": "out"
            }
  },
  { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]
The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is necessary in order to associate fromId with the source person).
The "field" transformer removes the CSV fields that the merge step added to that person. You can also try the import without the "field" transformer to see the difference.
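To make the data flow of that transformer chain concrete, here is a toy, in-memory Java model of it. It is a hypothetical illustration only, not OrientDB ETL code: `EtlChainModel`, `processRow`, and the map-based stand-ins for the Person class and the FriendsWith edges are invented for this example, and the model assumes (as the answer describes) that "merge" copies the CSV fields onto the existing Person record before the edge lookup runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the merge -> vertex -> edge -> field chain.
// It only mimics the data flow described in the answer; it is not OrientDB ETL code.
public class EtlChainModel {

    // Stand-in for the Person class: id -> mutable property map.
    static final Map<Long, Map<String, Object>> persons = new HashMap<>();
    // Stand-in for FriendsWith edges, stored as [fromId, toId] pairs.
    static final List<long[]> edges = new ArrayList<>();

    static Map<String, Object> person(long id, String name) {
        Map<String, Object> p = new HashMap<>();
        p.put("id", id);
        p.put("name", name);
        return p;
    }

    static void processRow(Map<String, Object> row) {
        // "merge": look up Person.id = fromId and copy the CSV fields onto it,
        // so the record flowing down the pipeline is the existing person.
        Map<String, Object> current = persons.get((Long) row.get("fromId"));
        if (current == null) {
            return; // no source person in this model: skip the row
        }
        current.putAll(row);

        // "vertex" with skipDuplicates: the record already exists, so nothing new is created.

        // "edge": look up Person.id = toId and link current -> target (direction "out").
        Long toId = (Long) current.get("toId");
        if (persons.containsKey(toId)) {
            edges.add(new long[] {(Long) current.get("id"), toId});
        }

        // "field" remove: drop the CSV columns that the merge step copied onto the person.
        current.remove("fromId");
        current.remove("toId");
    }

    public static void main(String[] args) {
        persons.put(10000023432L, person(10000023432L, "Tom User"));
        persons.put(13943423235L, person(13943423235L, "Blah Person"));

        Map<String, Object> row = new HashMap<>();
        row.put("fromId", 10000023432L);
        row.put("toId", 13943423235L);
        processRow(row);

        System.out.println("edges: " + edges.size());
        System.out.println("person fields: " + persons.get(10000023432L).keySet());
    }
}
```

Running `main` creates one edge and leaves Tom's record with only `id` and `name`. In this model, deleting the two `remove` calls (the analogue of dropping the "field" transformer) would leave `fromId` and `toId` stored on the person.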
Answer 2:
With the Java API you can read the CSV and then create the edges:
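import com.orientechnologies.orient.client.remote.OServerAdmin;
import com.orientechnologies.orient.core.sql.OCommandSQL;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ImportFriendships {
    public static void main(String[] args) {
        String yourDbName = "yourDbName";
        OServerAdmin serverAdmin = null;
        try {
            serverAdmin = new OServerAdmin("remote:localhost/" + yourDbName).connect("root", "root");
            if (serverAdmin.existsDatabase()) {
                OrientGraph g = new OrientGraph("remote:localhost/" + yourDbName);
                String csvFile = "path_to_file";
                String csvSplitBy = ","; // your separator (the friendship file is comma-separated)
                try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                    br.readLine(); // skip the "fromId,toId" header line
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] ids = line.split(csvSplitBy);
                        // Person.id is indexed as long, so compare against numbers, not quoted strings
                        long fromId = Long.parseLong(ids[0].trim());
                        long toId = Long.parseLong(ids[1].trim());
                        String personFrom = "(select from Person where id = " + fromId + ")";
                        String personTo = "(select from Person where id = " + toId + ")";
                        String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                        g.command(new OCommandSQL(query)).execute();
                    }
                    g.commit();
                } finally {
                    g.shutdown(); // close the graph even if reading the file fails
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (serverAdmin != null) {
                serverAdmin.close();
            }
        }
    }
}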
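The loop above sends one CREATE EDGE command per CSV line, which for this dataset means roughly 1.5 billion separate commands. One possible refinement, sketched here under assumptions, is to read the file in fixed-size batches so that each batch can be grouped into a single transaction or script. `EdgeBatchReader`, `readBatches`, and `BATCH_SIZE` are hypothetical names, the batch size is a guess to tune, and whether batching actually speeds things up on a given OrientDB setup is not something this sketch shows; the callback is only where the per-batch edge creation would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: stream the fromId,toId file and hand over fixed-size batches.
// The onBatch callback is where edges for one batch would be created (not shown here).
public class EdgeBatchReader {

    static final int BATCH_SIZE = 10_000; // assumption: tune for your server

    // Assumes the input starts with a "fromId,toId" header line.
    static void readBatches(Reader source, int batchSize, Consumer<List<long[]>> onBatch)
            throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            br.readLine(); // skip the header
            List<long[]> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = br.readLine()) != null) {
                String[] ids = line.split(",");
                if (ids.length != 2) {
                    continue; // skip blank or malformed lines
                }
                batch.add(new long[] {Long.parseLong(ids[0].trim()), Long.parseLong(ids[1].trim())});
                if (batch.size() == batchSize) {
                    onBatch.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh list for the next batch
                }
            }
            if (!batch.isEmpty()) {
                onBatch.accept(batch); // flush the final partial batch
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "fromId,toId\n10000023432,13943423235\n13943423235,10000023432\n";
        readBatches(new StringReader(csv), BATCH_SIZE,
                batch -> System.out.println("batch of " + batch.size() + " edges"));
    }
}
```

With the two sample rows and the default batch size, `main` prints `batch of 2 edges`. Each batch is a new list, so a consumer can safely keep references to earlier batches.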
Source: https://stackoverflow.com/questions/33679571/how-to-use-orientdb-etl-to-create-edges-only