Dataset API 'flat_map' method producing error for same code which works with 'map' method

春和景丽 2021-01-01 07:27

I am trying to create a pipeline to read multiple CSV files using the TensorFlow Dataset API and Pandas. However, using the flat_map method produces an error, while the same code works with the map method.

1 Answer
  •  伪装坚强ぢ
    2021-01-01 07:54

    As mikkola points out in the comments, Dataset.map() and Dataset.flat_map() expect functions with different signatures: Dataset.map() takes a function that maps a single element of the input dataset to a single new element, whereas Dataset.flat_map() takes a function that maps a single element of the input dataset to a Dataset of elements.
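
    To see the difference, here is a minimal sketch with a toy in-memory dataset (not part of the original question; the element values are arbitrary):

    import tensorflow as tf

    # Two elements, each a vector of three floats.
    toy = tf.data.Dataset.from_tensor_slices([[1., 2., 3.], [4., 5., 6.]])

    # `map()` is one-to-one: each vector stays a single element.
    doubled = toy.map(lambda x: x * 2)  # still 2 elements, each of shape (3,)

    # `flat_map()` must return a Dataset; its elements are spliced into the result.
    flattened = toy.flat_map(
        lambda x: tf.data.Dataset.from_tensor_slices(x))  # 6 scalar elements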

    If you want each row of the array returned by _get_data_for_dataset() to become a separate element, you should use Dataset.flat_map() and convert the output of tf.py_func() to a Dataset, using Dataset.from_tensor_slices():

    import os

    import pandas as pd
    import tensorflow as tf

    folder_name = './data/power_data/'
    file_names = os.listdir(folder_name)

    def _get_data_for_dataset(file_name, rows=100):
        # `file_name` arrives as bytes from `tf.py_func()`, so decode it first.
        df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                               usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
        # `.values` replaces the deprecated `DataFrame.as_matrix()`.
        X_data = df_input.values
        return X_data.astype('float32', copy=False)

    dataset = tf.data.Dataset.from_tensor_slices(file_names)

    # Use `Dataset.from_tensor_slices()` to make a `Dataset` from the output of
    # the `tf.py_func()` op, so that each row becomes a separate element.
    dataset = dataset.flat_map(lambda file_name: tf.data.Dataset.from_tensor_slices(
        tf.py_func(_get_data_for_dataset, [file_name], tf.float32)))

    dataset = dataset.batch(2)

    # `iterator` avoids shadowing the built-in `iter()`.
    iterator = dataset.make_one_shot_iterator()
    get_batch = iterator.get_next()
    
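    To consume the batches in TensorFlow 1.x graph mode, evaluate `get_batch` in a session. A minimal sketch (the printed values depend on whatever the CSV files contain):

    with tf.Session() as sess:
        try:
            while True:
                # Each batch stacks 2 rows with the 2 selected columns.
                batch = sess.run(get_batch)
                print(batch.shape)  # e.g. (2, 2)
        except tf.errors.OutOfRangeError:
            pass  # the one-shot iterator is exhausted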
