Writing TFRecords in Apache Beam with Java

Submitted by 牧云@^-^@ on 2020-04-18 01:06:00

Question


How can I write the following code in Java? If I have a list of records (dicts) in Java, how can I write Beam code that writes them to TFRecord files in which each record is a serialized tf.train.Example? There are many examples of doing this in Python; one is shown below. How can I write the same logic in Java?

import tensorflow as tf
import apache_beam as beam
from apache_beam.runners.interactive import interactive_runner
from apache_beam.coders import ProtoCoder

class Foo(beam.DoFn):
  def process(self, element, *args, **kwargs):
    import tensorflow as tf

    foo = element.get('foo')
    bar = element.get('bar')

    feature = {
      "foo":
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[foo.encode('utf-8')])),
      "bar":
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[bar.encode('utf-8')]))
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    yield example_proto

p = beam.Pipeline(runner=interactive_runner.InteractiveRunner())

records = p | "Create records" >> beam.Create([{'foo': 'abc', 'bar': 'pqr'} for _ in range(10)])
tf_examples = records | "Convert to tf examples" >> beam.ParDo(Foo())
tf_examples | "Dump Records" >> beam.io.WriteToTFRecord(file_path_prefix="./output/data-",
                                                    coder=ProtoCoder(tf.train.Example),  # ProtoCoder takes the message class
                                                    file_name_suffix='.tfrecord', num_shards=2)

p.run()

Answer 1:


I have attempted this in Java, but I am still running into some issues. The link to the new question is here: Write tfrecords from beam pipeline?
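
For reference, a minimal Java sketch of the Python pipeline in the question might look like the following. It is not taken from the linked question and rests on several assumptions: the Beam Java SDK and a runner (for example the direct runner) are on the classpath, and the TensorFlow protobuf classes come from the `org.tensorflow:proto` artifact, which places them under `org.tensorflow.example` (other TensorFlow Java artifacts use different package names). The class name `WriteTfRecords` and the helper `toSerializedExample` are made up for illustration. Beam's `TFRecordIO.write()` consumes a `PCollection<byte[]>`, so each `Example` is serialized with `toByteArray()` before it is written.

```java
import com.google.protobuf.ByteString;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
// Assumption: these classes come from the org.tensorflow:proto artifact.
import org.tensorflow.example.BytesList;
import org.tensorflow.example.Example;
import org.tensorflow.example.Feature;
import org.tensorflow.example.Features;

public class WriteTfRecords {

  // Hypothetical helper: builds a tf.train.Example with the bytes features
  // "foo" and "bar" and returns its serialized form (mirrors Foo.process).
  static byte[] toSerializedExample(Map<String, String> record) {
    Features.Builder features = Features.newBuilder();
    for (String key : new String[] {"foo", "bar"}) {
      BytesList bytes =
          BytesList.newBuilder().addValue(ByteString.copyFromUtf8(record.get(key))).build();
      features.putFeature(key, Feature.newBuilder().setBytesList(bytes).build());
    }
    return Example.newBuilder().setFeatures(features).build().toByteArray();
  }

  // DoFn that turns each input record into serialized Example bytes.
  static class ToExampleFn extends DoFn<Map<String, String>, byte[]> {
    @ProcessElement
    public void processElement(@Element Map<String, String> record, OutputReceiver<byte[]> out) {
      out.output(toSerializedExample(record));
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Same toy input as the Python version: ten identical records.
    List<Map<String, String>> records = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      Map<String, String> r = new HashMap<>();
      r.put("foo", "abc");
      r.put("bar", "pqr");
      records.add(r);
    }

    p.apply("Create records",
            Create.of(records).withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
        .apply("Convert to tf examples", ParDo.of(new ToExampleFn()))
        .apply("Dump Records",
            TFRecordIO.write().to("./output/data-").withSuffix(".tfrecord").withNumShards(2));

    p.run().waitUntilFinish();
  }
}
```

The explicit `MapCoder` is there because Beam may not be able to infer a coder for `Map<String, String>` elements created in memory. The `byte[]` output of the `DoFn` is handled by Beam's default byte-array coder.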



Source: https://stackoverflow.com/questions/61247661/writing-tfrecords-in-apche-beam-with-java
