tf.data with multiple inputs / outputs in Keras

名媛妹妹 2020-12-01 11:51

For applications such as pairwise text similarity, the input data looks like: pair_1, pair_2. In these problems, we usually have multiple in

2 Answers
  • 2020-12-01 12:23

    I'm not using Keras, but I would go with tf.data.Dataset.from_generator(), like this:

    import numpy as np
    import tensorflow as tf

    def _input_fn():
      # Toy data: one word index per sentence, for eight sentence pairs.
      sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
      sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
      sent1 = np.reshape(sent1, (8, 1, 1))
      sent2 = np.reshape(sent2, (8, 1, 1))

      labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
      labels = np.reshape(labels, (8, 1))

      # Yield one (features, label) example per step; the dict keys name the model inputs.
      def generator():
        for s1, s2, l in zip(sent1, sent2, labels):
          yield {"input_1": s1, "input_2": s2}, l

      dataset = tf.data.Dataset.from_generator(
          generator,
          output_types=({"input_1": tf.int64, "input_2": tf.int64}, tf.int64))
      # Repeat so that 10 epochs of 4 steps do not exhaust the 4 available batches.
      dataset = dataset.batch(2).repeat()
      return dataset
    
    ...
    
    model.fit(_input_fn(), epochs=10, steps_per_epoch=4)
    

    This generator can iterate over, e.g., your text files or NumPy arrays and yield one example on every call. In this example, I assume that the words of the sentences have already been converted to their indices in the vocabulary.
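
The text-file case mentioned above could look roughly like the following. This is a minimal sketch that uses only the standard library: the tab-separated line format, the toy `VOCAB` mapping, and the names `encode` and `pair_generator` are assumptions made for illustration and are not part of the original answer.

```python
import io

# Hypothetical toy vocabulary mapping words to integer indices (0 = unknown word).
VOCAB = {"how": 1, "are": 2, "you": 3, "doing": 4, "hello": 5, "world": 6}


def encode(sentence):
    """Convert a whitespace-separated sentence into a list of vocabulary indices."""
    return [VOCAB.get(word, 0) for word in sentence.lower().split()]


def pair_generator(lines):
    """Yield ({"input_1": ..., "input_2": ...}, label) for each tab-separated line.

    Each line is assumed to look like: sentence_1<TAB>sentence_2<TAB>label
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sent1, sent2, label = line.split("\t")
        yield {"input_1": encode(sent1), "input_2": encode(sent2)}, float(label)


# In-memory stand-in for an open text file; a real file object is iterated the same way.
fake_file = io.StringIO(
    "how are you\thow are you doing\t1\n"
    "hello world\thow are you\t0\n"
)
examples = list(pair_generator(fake_file))
```

A generator of this shape yields the same `(features, label)` structure as the one passed to `from_generator()` above. Since the sentences here have different lengths, they would still need padding (for example with `padded_batch`) before being batched.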

    Edit: Since the OP asked, this should also be possible with Dataset.from_tensor_slices():

    def _input_fn():
      sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64)
      sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64)
      sent1 = np.reshape(sent1, (8, 1))
      sent2 = np.reshape(sent2, (8, 1))
    
      labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.int64)
      labels = np.reshape(labels, (8,))
    
      dataset = tf.data.Dataset.from_tensor_slices(({"input_1": sent1, "input_2": sent2}, labels))
      dataset = dataset.batch(2, drop_remainder=True)
      return dataset
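
For completeness, here is a rough sketch of a Keras model that could consume a dataset keyed this way. The architecture (shared embedding, concatenation, a single regression unit), the vocabulary size of 1000, and the float32 labels are assumptions chosen only to make the example run; the point it illustrates is that the `name` of each `tf.keras.Input` matches a dictionary key produced by the dataset.

```python
import numpy as np
import tensorflow as tf

# Same toy data as above; labels are float32 here to match the regression output.
sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.int64).reshape(8, 1)
sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.int64).reshape(8, 1)
labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.float32).reshape(8, 1)

dataset = tf.data.Dataset.from_tensor_slices(
    ({"input_1": sent1, "input_2": sent2}, labels))
dataset = dataset.batch(2, drop_remainder=True).repeat()

# The input names match the dictionary keys yielded by the dataset.
in1 = tf.keras.Input(shape=(1,), dtype="int64", name="input_1")
in2 = tf.keras.Input(shape=(1,), dtype="int64", name="input_2")

# Hypothetical toy architecture: shared embedding, concatenate, one output unit.
embed = tf.keras.layers.Embedding(input_dim=1000, output_dim=4)
flat = tf.keras.layers.Flatten()
merged = tf.keras.layers.Concatenate()([flat(embed(in1)), flat(embed(in2))])
out = tf.keras.layers.Dense(1)(merged)

model = tf.keras.Model(inputs={"input_1": in1, "input_2": in2}, outputs=out)
model.compile(optimizer="adam", loss="mse")

# The dataset repeats indefinitely, so steps_per_epoch defines the epoch length.
history = model.fit(dataset, epochs=2, steps_per_epoch=4, verbose=0)
```

Because the dataset is repeated, `steps_per_epoch=4` covers the eight examples once per epoch at a batch size of 2.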
    
  • 2020-12-01 12:31

    One way to solve your issue is to use tf.data.Dataset.zip to combine your inputs and labels:

    import numpy as np
    import tensorflow as tf

    # Toy data: eight sentence pairs, one value per sentence.
    sent1 = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=np.float32)
    sent2 = np.array([20, 25, 35, 40, 600, 30, 20, 30], dtype=np.float32)
    sent1 = np.reshape(sent1, (8, 1, 1))
    sent2 = np.reshape(sent2, (8, 1, 1))

    labels = np.array([40, 30, 20, 10, 80, 70, 50, 60], dtype=np.float32)
    labels = np.reshape(labels, (8, 1))

    # One dataset for the pair of inputs, one for the labels.
    dataset_12 = tf.data.Dataset.from_tensor_slices((sent1, sent2))
    dataset_label = tf.data.Dataset.from_tensor_slices(labels)

    # Zip into ((input_1, input_2), label) elements, then batch and repeat.
    dataset = tf.data.Dataset.zip((dataset_12, dataset_label)).batch(2).repeat()
    model.fit(dataset, epochs=10, steps_per_epoch=4)
    

    This will print: Epoch 1/10 4/4 [==============================] - 2s 503ms/step...
