How to skip carriage returns in a CSV file while reading from Cloud Storage using Google Cloud Dataflow in Java

梦想与她 submitted on 2019-12-02 13:03:43

TextIO cannot do this: it always splits input at line breaks (carriage returns and line feeds) and is not aware that CSV quoting makes some of those line breaks part of a field.
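
To illustrate the problem, here is a minimal sketch that uses plain java.io line reading as a stand-in for line-based splitting. It is not TextIO itself, and the class and method names (LineSplitDemo, splitLines) are made up for this example. A quoted field containing a line break ends up cut in two:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical demo class: line-based splitting, with no knowledge of CSV quoting. */
final class LineSplitDemo {
  private LineSplitDemo() {}

  /** Splits text at line terminators (\n, \r, or \r\n), the way a line-oriented reader does. */
  static List<String> splitLines(String text) {
    return new BufferedReader(new StringReader(text)).lines().collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Two logical CSV records; the first has a line break inside a quoted field.
    String csv = "1,\"line one\nline two\"\n2,ok\n";
    // The quoted field is broken across two "lines", so three pieces come back.
    for (String line : splitLines(csv)) {
      System.out.println("[" + line + "]");
    }
  }
}
```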

However, Beam 2.2 includes a transform, FileIO, that makes it easy to write the CSV-specific (or any other format-specific) reading code yourself. Do something like this:

p.apply(FileIO.match().filepattern("gs://..."))
 .apply(FileIO.readMatches())
 .apply(ParDo.of(new DoFn<ReadableFile, TableRow>() {
   @ProcessElement
   public void process(ProcessContext c) throws IOException {
     try (InputStream is = Channels.newInputStream(c.element().open())) {
       // Parse `is` with your favorite Java CSV library (one that understands quoted line breaks),
       // then call c.output(...) once for each parsed CSV record, converted to a TableRow.
     }
   }
 }))
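
If you do not want to pull in a CSV library, the parsing step could look roughly like the sketch below. This is a hand-rolled stand-in for "your favorite Java CSV library", not part of Beam: the class QuotedCsvReader and its readRecords method are hypothetical names, and the code assumes UTF-8 input, comma separators, and RFC 4180-style double-quote escaping. Inside the DoFn you would call it on the InputStream and emit one output per returned record.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a minimal RFC 4180-style CSV reader (assumes UTF-8 and commas). */
final class QuotedCsvReader {
  private QuotedCsvReader() {}

  /** Reads all records; line breaks inside double-quoted fields stay part of the field. */
  static List<List<String>> readRecords(InputStream in) throws IOException {
    PushbackReader r = new PushbackReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    List<List<String>> records = new ArrayList<>();
    List<String> fields = new ArrayList<>();
    StringBuilder field = new StringBuilder();
    boolean inQuotes = false;
    boolean recordStarted = false; // false on blank lines, which are skipped
    int ch;
    while ((ch = r.read()) != -1) {
      char c = (char) ch;
      if (inQuotes) {
        if (c == '"') {
          int next = r.read();
          if (next == '"') {
            field.append('"'); // "" inside quotes is an escaped quote
          } else {
            inQuotes = false; // closing quote; put the lookahead character back
            if (next != -1) r.unread(next);
          }
        } else {
          field.append(c); // includes \r and \n inside quoted fields
        }
      } else if (c == '"') {
        inQuotes = true;
        recordStarted = true;
      } else if (c == ',') {
        fields.add(field.toString());
        field.setLength(0);
        recordStarted = true;
      } else if (c == '\r' || c == '\n') {
        if (c == '\r') { // treat \r\n as a single record terminator
          int next = r.read();
          if (next != '\n' && next != -1) r.unread(next);
        }
        if (recordStarted) {
          fields.add(field.toString());
          records.add(fields);
          fields = new ArrayList<>();
          field.setLength(0);
          recordStarted = false;
        }
      } else {
        field.append(c);
        recordStarted = true;
      }
    }
    if (recordStarted) { // last record without a trailing line break
      fields.add(field.toString());
      records.add(fields);
    }
    return records;
  }
}
```

Note that this sketch collects every record of a file in memory before returning. For large files, a streaming CSV library that yields one record at a time is probably the better choice.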