How to use OrientDB ETL to create edges only

Posted by 天大地大妈咪最大 on 2020-01-02 04:01:06

Question


I have two CSV files:

The first contains ~500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains ~1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them.

I have tried multiple configurations of the ETL JSON file so far, the latest being this one:

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

Unfortunately, in addition to creating the edge, this also creates a vertex with "from" and "to" properties.

When I try removing the vertex transformer, the ETL process throws an error:

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?


Answer 1:


You can import the edges with these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is necessary in order to associate fromId with the source person).

The "field" transformer removes the CSV fields added by the merge step. You can also try the import without the "field" transformer to see the difference.
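To make the lookup logic easier to follow, here is a rough plain-Java model of what the merge and edge steps amount to. It is only a sketch, not OrientDB code: the class EdgeJoinSketch, the Edge type, and the in-memory map standing in for the Person.id index are all hypothetical, and skipping unresolved ids is an assumption borrowed from the "unresolvedLinkAction": "SKIP" setting in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the lookup-and-join steps; hypothetical, not OrientDB code. */
final class EdgeJoinSketch {

    /** Hypothetical stand-in for a FriendsWith edge between two existing vertices. */
    static final class Edge {
        final String from;
        final String to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    /**
     * personIndex plays the role of the Person.id index (id -> vertex key).
     * Each row holds {fromId, toId} from the relationships CSV.
     */
    static List<Edge> joinEdges(Map<Long, String> personIndex, List<long[]> rows) {
        List<Edge> edges = new ArrayList<>();
        for (long[] row : rows) {
            // "merge" step: resolve fromId to an existing Person
            String source = personIndex.get(row[0]);
            // "edge" step: resolve toId to an existing Person
            String target = personIndex.get(row[1]);
            if (source == null || target == null) {
                continue; // assumption: unresolved ids are skipped, no vertex is created
            }
            edges.add(new Edge(source, target));
        }
        return edges;
    }
}
```

The point of the model is that both ends of each edge are resolved against vertices that already exist, so no new Person record with fromId and toId properties is needed.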




Answer 2:


With the Java API, you could read the CSV file and then create the edges:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;

        import com.orientechnologies.orient.client.remote.OServerAdmin;
        import com.orientechnologies.orient.core.sql.OCommandSQL;
        import com.tinkerpop.blueprints.impls.orient.OrientGraph;

        public class CreateFriendEdges {
            public static void main(String[] args) throws IOException {
                String dbUrl = "remote:localhost/nomeYourDb";   // your database
                String csvFile = "path_to_file";
                String csvSplitBy = ",";                        // your separator

                // Check that the database exists before opening it.
                OServerAdmin serverAdmin = new OServerAdmin(dbUrl).connect("root", "root");
                boolean exists;
                try {
                    exists = serverAdmin.existsDatabase();
                } finally {
                    serverAdmin.close();
                }
                if (!exists) {
                    return;
                }

                OrientGraph g = new OrientGraph(dbUrl);
                try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                    br.readLine();   // skip the header line (fromId,toId)
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] ids = line.split(csvSplitBy);
                        // Look up both endpoints by id and connect them.
                        String personFrom = "(select from Person where id='" + ids[0] + "')";
                        String personTo = "(select from Person where id='" + ids[1] + "')";
                        String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                        g.command(new OCommandSQL(query)).execute();
                    }
                } finally {
                    g.shutdown();   // always release the graph connection
                }
            }
        }
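Because the query above is built by string concatenation, a malformed or hostile CSV line could end up in the SQL text. As a minimal sketch of one way to guard against that, the helper below builds the same "create edge" statement only when both fields are plain digit strings. The class name EdgeQuerySketch and its methods are hypothetical, and the comma separator is an assumption based on the sample file in the question.

```java
import java.util.Optional;

/** Hypothetical helper: builds the CREATE EDGE statement only for well-formed lines. */
final class EdgeQuerySketch {

    /** Returns the SQL for one CSV line, or empty if the line is not "digits,digits". */
    static Optional<String> buildEdgeQuery(String csvLine) {
        String[] ids = csvLine.split(",");   // assumption: comma-separated, as in the sample
        if (ids.length != 2) {
            return Optional.empty();
        }
        String from = ids[0].trim();
        String to = ids[1].trim();
        if (!isNumericId(from) || !isNumericId(to)) {
            return Optional.empty();         // also rejects the "fromId,toId" header
        }
        return Optional.of(
            "create edge FriendsWith from (select from Person where id='" + from + "')"
                + " to (select from Person where id='" + to + "')");
    }

    /** True only for non-empty strings made up entirely of digits. */
    static boolean isNumericId(String s) {
        return s.matches("\\d+");
    }
}
```

Inside the read loop, the returned Optional could be checked with isPresent() before passing the statement to g.command(...), which also makes the explicit header skip unnecessary.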


Source: https://stackoverflow.com/questions/33679571/how-to-use-orientdb-etl-to-create-edges-only
