I'm new to Cloud Dataflow and Java, so I'm hoping this is the right question to ask.
I have a csv file with n number of columns and rows that could be a string, intege
This example will create a collection containing 1 String
per line in the file, e.g. if the file is:
Alex,28,111-222-3344
Sam,30,555-666-7788
Drew,19,123-45-6789
then the collection will logically contain "Alex,28,111-222-3344", "Sam,30,555-666-7788", and "Drew,19,123-45-6789". You can apply further parsing code in Java by piping the collection through a ParDo or MapElements transform, e.g.:
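// Simple value class for one CSV row. It implements Serializable so that
// Dataflow can fall back to SerializableCoder when encoding PCollection<User>.
class User implements java.io.Serializable {
  public String name;
  public int age;
  public String phone;
}

// One String element per line of the file.
PCollection<String> lines = p.apply(TextIO.Read.from("gs://abc/def.csv"));

// Split each line on commas and copy the fields into a User.
PCollection<User> users = lines.apply(MapElements.via((String line) -> {
  User user = new User();
  String[] parts = line.split(",");
  user.name = parts[0];
  user.age = Integer.parseInt(parts[1]);
  user.phone = parts[2];
  return user;
}).withOutputType(new TypeDescriptor<User>() {}));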
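The lambda above assumes every line has exactly three fields and a numeric age, so a header row or a malformed line would throw at runtime. As a rough sketch, the parsing step could be pulled into a plain Java helper that rejects bad lines instead. The names CsvUserParser and parseLine are hypothetical and not part of the Dataflow SDK, and the nested User class just mirrors the one above so the snippet compiles on its own:

```java
import java.util.Optional;

// Hypothetical helper, not part of the Dataflow SDK.
final class CsvUserParser {

    // Mirrors the User class above so this sketch is self-contained.
    static class User implements java.io.Serializable {
        public String name;
        public int age;
        public String phone;
    }

    // Parses one "name,age,phone" line. Returns Optional.empty() when the
    // line has the wrong number of fields or the age is not an integer.
    static Optional<User> parseLine(String line) {
        // The -1 limit keeps trailing empty fields, so "Alex,28," still has 3 parts.
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            return Optional.empty();
        }
        User user = new User();
        user.name = parts[0].trim();
        try {
            user.age = Integer.parseInt(parts[1].trim());
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        user.phone = parts[2].trim();
        return Optional.of(user);
    }
}
```

Inside a ParDo you could call parseLine on each element and only output the value when it is present, which drops bad rows instead of failing the job. Note that a plain split on commas does not handle quoted fields that themselves contain commas; for files like that a real CSV parsing library is probably the safer choice.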