Question
I am experimenting with Apache Giraph. I need to create a simple graph from a CSV file residing in HDFS that shows the relationship between two columns (victim related to store name). My data is a CSV file of more than 1 GB. I initially tried Neo4j from Java with a local file, but it could only load small data sets and could not import data directly from HDFS. My data may grow, so I thought of using Apache Giraph.
But how can I achieve the same thing with Giraph?
As far as I understand, Apache Giraph only takes input in a vertex format, and my data is in CSV format. Is there any tool to convert my CSV to a graph format and supply it as input to Giraph for graph computations?
Answer 1:
I had the same doubts. While a lot of responses seem to suggest rewriting the graph into a standard format outside of Giraph, this is not necessary.
You should check out the implementation of the standard class:
https://apache.googlesource.com/giraph/+/refs/heads/trunk/giraph-core/src/main/java/org/apache/giraph/io/formats/IntNullTextEdgeInputFormat.java
This reads a TSV file (this is the "Text" part of the class name) containing pairs of integer vertex IDs (this is the "Int" part) of the form:
1 2
2 4
3 2
4 1
...
No edge metadata is considered, just a pair of vertices (this is the "Null" part).
This example can be readily adapted to CSV by changing the SEPARATOR, or to use string IDs by replacing IntWritable with Text (likewise for other types).
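As a rough illustration of the CSV adaptation, here is a minimal standalone sketch of the parsing step in plain Java. It does not depend on Giraph and is not the Giraph class itself: the name CsvEdgeSketch, its methods, and the sample rows are hypothetical. It assumes a plain comma separator with no quoted or escaped fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Standalone sketch of separator-based edge parsing; hypothetical, not part of Giraph. */
class CsvEdgeSketch {
    // Assumption: columns are separated by a plain comma, with no quoting or escaping.
    private static final Pattern SEPARATOR = Pattern.compile(",");

    /** Splits one "source,target" line into a two-element array of trimmed string ids. */
    static String[] parseEdge(String line) {
        String[] tokens = SEPARATOR.split(line);
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected two columns: " + line);
        }
        return new String[] {tokens[0].trim(), tokens[1].trim()};
    }

    /** Parses every non-blank line into a (source, target) pair. */
    static List<String[]> parseEdges(List<String> lines) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            edges.add(parseEdge(line));
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the "victim,store name" shape described in the question.
        List<String> lines = Arrays.asList("alice,StoreA", "bob,StoreB", "carol,StoreA");
        for (String[] edge : parseEdges(lines)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

In a Giraph input format, the same kind of split would go in the code that processes each input line, with the two ids wrapped in Text instead of IntWritable.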
The input format is selected later as a property you pass to the framework (giving the fully qualified name of the class you wish to use to parse the input data).
Source: https://stackoverflow.com/questions/41606341/convert-csv-data-to-graph-data