Question
I am trying to read a text file containing vertex information into Giraph. Each line has the form
vertex_id attribute_1 attribute_2 ... attribute_n
where each attribute is a string.
The goal is to create a vertex whose value holds all of these attributes.
Looking through the various input formats, I could not find anything that does this out of the box, so I assume I have to derive my vertex input class from VertexValueInputFormat (I have a separate reader for edges).
The problem is: how? I have created a Value class which contains a String[] array, but how do I hand it over to Giraph/Hadoop? Here is a reader for a single line:
https://giraph.apache.org/giraph-core/apidocs/org/apache/giraph/io/formats/TextVertexValueInputFormat.TextVertexValueReaderFromEachLine.html
protected abstract V getValue(org.apache.hadoop.io.Text line)
My thought was that V would be an ArrayWritable, but that does not seem to be accepted.
Any clue? Thanks
Answer 1:
If your vertex has a custom value (in your case, an array of strings), then you need a custom vertex value class and a custom vertex input format.
As an example, take a look at a very simple custom vertex class. This class has a double value, an int, and a long: https://gist.github.com/sar-vivek/df09cca17cc3f6b5ac60
Note: you must override readFields() and write() accordingly.
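For the String[] case in the question, a value class along the following lines might work. This is a minimal sketch, not the code from the gist: the name StringArrayValue is hypothetical, and the small Writable interface declared here is only a stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so that the example compiles without Hadoop on the classpath. In a real job you would delete the stand-in and implement the Hadoop interface instead.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Stand-in (assumption) mirroring org.apache.hadoop.io.Writable so this
 * sketch compiles on its own. Replace with the Hadoop interface in a real job.
 */
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical vertex value holding the string attributes of one vertex. */
class StringArrayValue implements Writable {
    private String[] attributes;

    /** No-argument constructor, needed when instances are created by reflection. */
    public StringArrayValue() {
        this(new String[0]);
    }

    public StringArrayValue(String[] attributes) {
        this.attributes = attributes.clone();
    }

    /** Returns a copy so callers cannot modify the stored array. */
    public String[] get() {
        return attributes.clone();
    }

    /** Serializes the length first, then each attribute. */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(attributes.length);
        for (String attribute : attributes) {
            // writeUTF is limited to 65535 encoded bytes per string.
            out.writeUTF(attribute);
        }
    }

    /** Reads the fields back in exactly the order write() emitted them. */
    @Override
    public void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        attributes = new String[length];
        for (int i = 0; i < length; i++) {
            attributes[i] = in.readUTF();
        }
    }
}
```

The important part is that write() and readFields() agree on the field order, and that the class has a no-argument constructor. The missing no-argument constructor is a likely reason a bare ArrayWritable is rejected as V: it can only be constructed with the element class, so the usual workaround there is a small subclass that supplies it.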
Then you need a custom vertex input format. For the vertex class above, I modified the built-in JSON vertex reader slightly. Here is the example: https://gist.github.com/sar-vivek/f39edacec6d9a43c3717 (notice how the value of a vertex is set on line 68).
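For the whitespace-separated format in the question, the per-line parsing could be sketched as below. This is not taken from the gist. VertexLineParser is a hypothetical helper written against plain Java strings, and it assumes that vertex ids are long integers and that no attribute contains whitespace. In a reader derived from TextVertexValueInputFormat.TextVertexValueReaderFromEachLineProcessed, the three methods should map onto preprocessLine(), getId() and getValue(), with the Text line converted via toString() and the results wrapped in your id and value classes.

```java
import java.util.Arrays;

/** Hypothetical helper: splits one input line into a vertex id and its attributes. */
final class VertexLineParser {
    private VertexLineParser() {
        // Static helper, not meant to be instantiated.
    }

    /** Splits on runs of whitespace. Assumes no attribute contains whitespace. */
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty vertex line");
        }
        return trimmed.split("\\s+");
    }

    /** The first token is the vertex id (assumed here to be a long). */
    static long vertexId(String[] tokens) {
        return Long.parseLong(tokens[0]);
    }

    /** All remaining tokens are the attributes. May be empty. */
    static String[] attributes(String[] tokens) {
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }
}
```

Tokenizing once and passing the tokens to both accessors avoids splitting each line twice, which is the reason for preferring a preprocessing step over parsing separately in the id and value methods.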
Source: https://stackoverflow.com/questions/24800957/vertices-with-complex-values-in-apache-giraph