Data Normalization with TensorFlow tf.Transform

Submitted by 不想你离开。 on 2019-12-02 19:48:39

First of all, you don't really need tf.transform for this. All you need to do is to write a function that you call from both the training/eval input_fn and from your serving input_fn.

For example, assuming that you've used Pandas on your whole dataset to find the min and max of x (22 and 43 here):

```
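def add_engineered(features):
  # Bounds computed offline over the whole training set
  min_x = 22.0
  max_x = 43.0
  # Min-max scale x into [0, 1]; note that this updates the dict in place
  features['x'] = (features['x'] - min_x) / (max_x - min_x)
  return features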
```
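The constants 22 and 43 are meant to come from a single offline pass over the training data. Below is a minimal sketch of that pass, assuming plain Python in place of Pandas and dicts of lists in place of tensors so it runs without TensorFlow. `compute_min_max` and `make_add_engineered` are hypothetical helpers, not part of any library. The returned closure freezes the bounds, so the same function can be called from every input function.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, List[float]]


def compute_min_max(values: List[float]) -> Tuple[float, float]:
    """Analysis step: one full pass over the training column."""
    return min(values), max(values)


def make_add_engineered(min_x: float, max_x: float) -> Callable[[Features], Features]:
    """Freeze the bounds into a scaling function shared by training and serving."""
    span = max_x - min_x
    if span == 0:
        raise ValueError("min_x equals max_x; min-max scaling is undefined")

    def add_engineered(features: Features) -> Features:
        out = dict(features)  # shallow copy: the caller's dict is left unchanged
        out['x'] = [(v - min_x) / span for v in features['x']]
        return out

    return add_engineered


training_x = [22.0, 30.0, 43.0, 35.5]  # made-up training column
min_x, max_x = compute_min_max(training_x)
add_engineered = make_add_engineered(min_x, max_x)
print(add_engineered({'x': [22.0, 32.5, 43.0]})['x'])  # [0.0, 0.5, 1.0]
```

Because this version returns a new dict instead of assigning into its argument, it cannot clobber the caller's inputs.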

Then, in your input_fn, wrap the features you return with a call to add_engineered:

```
def input_fn():
  features = ...  # dict of raw feature tensors
  label = ...
  # Apply the same scaling that serving will apply
  return add_engineered(features), label
```

and in your serving_input_fn, make sure to similarly wrap the returned features (NOT the feature_placeholders) in a call to add_engineered:

```
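import tensorflow as tf

def serving_input_fn():
  feature_placeholders = ...  # dict of raw input placeholders
  # Copy so that scaling does not overwrite the raw placeholders
  features = feature_placeholders.copy()
  return tf.estimator.export.ServingInputReceiver(
      add_engineered(features), feature_placeholders)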
```
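The `.copy()` matters because `add_engineered`, as written here, assigns into the dict it receives. The plain-Python sketch below, with floats standing in for placeholder tensors and a made-up input value, shows the aliasing that the copy avoids: without it, the dict meant to describe the raw inputs ends up holding the scaled value.

```python
def add_engineered(features):
    # Same in-place update as in the answer, applied to plain floats
    min_x, max_x = 22.0, 43.0
    features['x'] = (features['x'] - min_x) / (max_x - min_x)
    return features


# Without a copy: both names refer to one dict, so the raw entry is overwritten
placeholders = {'x': 32.5}
scaled = add_engineered(placeholders)
print(placeholders['x'])  # 0.5 (the raw value is gone)

# With a shallow copy: the raw entry survives next to the scaled one
placeholders = {'x': 32.5}
scaled = add_engineered(placeholders.copy())
print(placeholders['x'], scaled['x'])  # 32.5 0.5
```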

Now, your JSON input at prediction time would only need to contain the original, unscaled values.

Here's a complete working example of this approach.

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/feateng/taxifare/trainer/model.py#L130

tf.transform provides a two-phase process: an analysis step that computes statistics such as the min and max over the full dataset, and a graph-modification step that inserts the scaling into your TensorFlow graph for you. So, to use tf.transform, you first need to write a Dataflow (Apache Beam) pipeline that does the analysis, and then plug in calls to tft.scale_to_0_1 inside your preprocessing_fn. Here's an example of doing this:

https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/criteo_tft

The add_engineered() approach is simpler and is what I would suggest. The tf.transform approach is needed if your data distribution will shift over time, so that hard-coded bounds like 22 and 43 go stale and you want to automate the entire pipeline (e.g., for continuous training).
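To see why hard-coded bounds become a liability under drift, the plain-Python sketch below (no tf.transform involved; `scale` and the batch values are made up for illustration) scales a newer batch with the original constants 22 and 43. Values beyond the old range land outside [0, 1]. Re-running the analysis on current data, which is the part the tf.transform pipeline automates, restores the intended range, though the model then has to be retrained on the new scaling.

```python
def scale(values, min_x, max_x):
    """Min-max scaling with fixed bounds, as in add_engineered."""
    return [(v - min_x) / (max_x - min_x) for v in values]


old_min, old_max = 22.0, 43.0   # bounds computed once and hard-coded
new_batch = [22.0, 43.0, 53.5]  # hypothetical later data; 53.5 exceeds the old max

stale = scale(new_batch, old_min, old_max)
print(stale)  # [0.0, 1.0, 1.5] -> no longer within [0, 1]

# Re-run the analysis step on the current data and rebuild the bounds
new_min, new_max = min(new_batch), max(new_batch)
fresh = scale(new_batch, new_min, new_max)
print([round(v, 3) for v in fresh])  # [0.0, 0.667, 1.0]
```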
