Cloud ML Engine batch predictions - How to simply match returned predictions with input data?

Question

According to the ML Engine documentation, an instance key is required to match the returned predictions with the input data. For simplicity, I would like to use a DNNClassifier, but canned estimators apparently do not support instance keys yet (only custom or TensorFlow core estimators do).

So I looked at the Census code examples for Custom/TensorflowCore Estimators, but they look quite complex for what I am trying to achieve.

I would prefer an approach similar to the one described in this Stack Overflow answer (wrapping a DNNClassifier in a custom estimator), but I cannot make it work: I get an error saying that the 'DNNClassifier' object has no attribute 'model_fn'.

How can I achieve this in a simple manner?

Answer 1:

In version 1.2, the contrib estimators (tf.contrib.learn.DNNClassifier, for example) were changed to inherit from the core estimator class tf.estimator.Estimator, which, unlike its predecessor, hides the model function as a private class member.

Try estimator._model_fn rather than estimator.model_fn. You should be able to leave everything else in my previous answer the same.

EDIT: I've updated my original answer here: https://stackoverflow.com/a/44443380/3597868 to reflect the necessary changes with version 1.2
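As a rough illustration of the rename described above, here is a minimal sketch of a lookup that works with either attribute name. The helper `get_model_fn` and the two stand-in classes are hypothetical and are not part of TensorFlow; they only mimic "public `model_fn`" versus "private `_model_fn`". Note that reaching into a private attribute is not a stable API, so this may break in other versions.

```python
def get_model_fn(estimator):
    """Return the estimator's model function, whichever attribute holds it."""
    # Per the answer above: from 1.2 on the function lives in `_model_fn`;
    # the earlier wrapper code read it from `model_fn`.
    for name in ("_model_fn", "model_fn"):
        fn = getattr(estimator, name, None)
        if callable(fn):
            return fn
    raise AttributeError("estimator exposes neither _model_fn nor model_fn")


# Hypothetical stand-ins for the two estimator generations (NOT TensorFlow classes).
class OldStyleEstimator:
    def model_fn(self, features, labels, mode):
        return "old"


class NewStyleEstimator:
    def _model_fn(self, features, labels, mode):
        return "new"


print(get_model_fn(OldStyleEstimator())(None, None, None))
print(get_model_fn(NewStyleEstimator())(None, None, None))
```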

Answer 2:

My code as per Eli's example:

def key_model_fn_gen(estimator):
    def _model_fn(feature_columns, labels, mode):
        # Remove the instance key so it is not passed to the wrapped model.
        key = feature_columns.pop(KEY)
        params = estimator.params

        model_fn_ops = estimator._model_fn(features=feature_columns,
                                           labels=labels,
                                           mode=mode,
                                           params=params)
        # Attach the key to the predictions dictionary.
        model_fn_ops.predictions[KEY] = key

        return model_fn_ops
    return _model_fn

but I am still unable to get the instance key to appear in the results of ML Engine batch predictions. What do I need to change in the Experiment (or maybe in the export strategy) to make it work?

Answer 3:

System/Version Info

Canned census example committed on 2017_06_22_15:06:37.
TensorFlow 1.2.
Python 3
GCP ML Engine 1.2

Approach

Fabrice, I had the same question as you and it took me a while to figure this one out (with the generous help of Eli). I took a slightly different approach. Instead of trying to create an instance key, I made the assumption that the instance key would be in the data (training, evaluation, and prediction).

Here, I use the gender field as the instance key. Obviously, I would not use the gender field as an instance key in reality; I'm only using it here for illustration purposes.

Other than the changes described here, I am not making any updates to the functions or constants of the original script, apart from converting a few things from Python 2 to Python 3, e.g., changing dict.iteritems() to dict.items().

Here is a gist of my modified model.py file. I did not make any changes to the task.py file.

Updating the `key_model_fn_gen()` function

This code relies on guidance I got from Eli. The insight for me was that I need to modify the output_alternatives dictionary in order to return the key and that I do not need to modify the predictions dictionary. (Additionally, I learned that I could get the params as an attribute of the estimator from your (Fabrice's) example, thanks for that.)

KEY = 'gender'
def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode):
        key = features.pop(KEY)
        params = estimator.params
        model_fn_ops = estimator._model_fn(features=features, labels=labels, mode=mode, params=params)
        model_fn_ops.output_alternatives[None][1]['key'] = key
        return model_fn_ops
    return _model_fn
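To make the effect of that wrapper easier to see without TensorFlow installed, here is a minimal sketch that runs the same function against stand-ins. `FakeModelFnOps` and `FakeEstimator` are hypothetical and only mimic the two fields the wrapper touches; the `"classification"` string is a placeholder for the problem type, and the layout `{head_name: (problem_type, {output_name: value})}` is my assumption about what `output_alternatives[None][1]` indexes into.

```python
from collections import namedtuple

KEY = "gender"

# Hypothetical stand-in for the object returned by the model function.
FakeModelFnOps = namedtuple("FakeModelFnOps", ["predictions", "output_alternatives"])


class FakeEstimator:
    """Hypothetical stand-in for a canned estimator with a private _model_fn."""
    params = {}

    def _model_fn(self, features, labels, mode, params):
        # The wrapped model must never see the key column.
        assert KEY not in features
        outputs = {"probabilities": [0.9, 0.1], "classes": ["0", "1"]}
        return FakeModelFnOps(
            predictions=dict(outputs),
            # Assumed layout: {head_name: (problem_type, {output_name: value})}
            output_alternatives={None: ("classification", dict(outputs))},
        )


def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode):
        key = features.pop(KEY)
        ops = estimator._model_fn(features=features, labels=labels,
                                  mode=mode, params=estimator.params)
        # Only the exported outputs are changed; predictions stay untouched.
        ops.output_alternatives[None][1]["key"] = key
        return ops
    return _model_fn


ops = key_model_fn_gen(FakeEstimator())({"gender": [" Male"], "age": [44]}, None, "infer")
exported = ops.output_alternatives[None][1]
print(sorted(exported))  # ['classes', 'key', 'probabilities']
```

In this sketch the key shows up only in the exported outputs, while the predictions dictionary is left as the wrapped estimator produced it.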

Updating the `build_estimator()` function

I remove gender from deep_columns list and wide_columns list so that it is not used as a feature for training and evaluation.
I modify the return to include the key wrapper per Eli's guidance.
I get the model_dir as an attribute of config.

Here is the full code:

def build_estimator(config, embedding_size=8, hidden_units=None):
  """Build an estimator."""
  (gender, race, education, marital_status, relationship,
   workclass, occupation, native_country, age,
   education_num, capital_gain, capital_loss, hours_per_week) = INPUT_COLUMNS

  # Reused Transformations.
  # Continuous columns can be converted to categorical via bucketization
  age_buckets = tf.feature_column.bucketized_column(
      age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])

  # Wide columns and deep columns.
  wide_columns = [
      # Interactions between different categorical features can also
      # be added as new virtual features.
      tf.feature_column.crossed_column(
          ['education', 'occupation'], hash_bucket_size=int(1e4)),
      tf.feature_column.crossed_column(
          [age_buckets, race, 'occupation'], hash_bucket_size=int(1e6)),
      tf.feature_column.crossed_column(
          ['native_country', 'occupation'], hash_bucket_size=int(1e4)),
      native_country,
      education,
      occupation,
      workclass,
      marital_status,
      relationship,
      age_buckets,
  ]

  deep_columns = [
      # Use indicator columns for low dimensional vocabularies
      tf.feature_column.indicator_column(workclass),
      tf.feature_column.indicator_column(education),
      tf.feature_column.indicator_column(marital_status),
      tf.feature_column.indicator_column(relationship),
      tf.feature_column.indicator_column(race),

      # Use embedding columns for high dimensional vocabularies
      tf.feature_column.embedding_column(
          native_country, dimension=embedding_size),
      tf.feature_column.embedding_column(occupation, dimension=embedding_size),
      age,
      education_num,
      capital_gain,
      capital_loss,
      hours_per_week,
  ]

  return tf.contrib.learn.Estimator(
      model_fn=key_model_fn_gen(
          tf.contrib.learn.DNNLinearCombinedClassifier(
              config=config,
              linear_feature_columns=wide_columns,
              dnn_feature_columns=deep_columns,
              dnn_hidden_units=hidden_units or [100, 70, 50, 25],
              fix_global_step_increment_bug=True)
      ),
      model_dir=config.model_dir
  )

Input format for batch predictions

After the version has been uploaded to ML Engine, the prediction input takes the following form:

{"native_country":" United-States","race":" Black","age":"44","relationship":" Other-relative","gender":" Male","marital_status":" Never-married","hours_per_week":"32","capital_gain":"0","education_num":"9","education":" HS-grad","occupation":" Other-service","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"35","relationship":" Not-in-family","gender":" Male","marital_status":" Divorced","hours_per_week":"40","capital_gain":"0","education_num":"9","education":" HS-grad","occupation":" Craft-repair","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"20","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"40","capital_gain":"0","education_num":"10","education":" Some-college","occupation":" Craft-repair","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"43","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"50","capital_gain":"0","education_num":"10","education":" Some-college","occupation":" Farming-fishing","capital_loss":"0","workclass":" Self-emp-not-inc"}
{"native_country":" England","race":" White","age":"33","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"40","capital_gain":"0","education_num":"13","education":" Bachelors","occupation":" Farming-fishing","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"38","relationship":" Unmarried","gender":" Female","marital_status":" Divorced","hours_per_week":"56","capital_gain":"0","education_num":"13","education":" Bachelors","occupation":" Prof-specialty","capital_loss":"0","workclass":" Private"}
{"native_country":" United-States","race":" White","age":"53","relationship":" Not-in-family","gender":" Female","marital_status":" Never-married","hours_per_week":"35","capital_gain":"8614","education_num":"14","education":" Masters","occupation":" ?","capital_loss":"0","workclass":" ?"}
{"native_country":" China","race":" Asian-Pac-Islander","age":"64","relationship":" Husband","gender":" Male","marital_status":" Married-civ-spouse","hours_per_week":"60","capital_gain":"0","education_num":"14","education":" Masters","occupation":" Prof-specialty","capital_loss":"2057","workclass":" Private"}
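The input above is one JSON object per line. As a minimal sketch of how such a file could be produced, the helper below serializes a list of dictionaries in that shape and can optionally stamp each instance with a unique key. The function `to_json_lines` and the field name `row_id` are hypothetical; for a stamped field to be usable as the instance key, the exported model would have to accept it as an input and `KEY` would have to name it, which is not shown here.

```python
import json


def to_json_lines(records, key_field=None):
    """Serialize records as newline-delimited JSON, one instance per line."""
    lines = []
    for index, record in enumerate(records):
        instance = dict(record)  # copy so the caller's data is not modified
        if key_field is not None:
            # Hypothetical unique key: the position of the row, as a string.
            instance[key_field] = str(index)
        lines.append(json.dumps(instance, sort_keys=True))
    return "\n".join(lines) + "\n"


records = [
    {"age": "44", "gender": " Male", "education": " HS-grad"},
    {"age": "38", "gender": " Female", "education": " Bachelors"},
]
print(to_json_lines(records, key_field="row_id"), end="")
```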

Output format of batch prediction

After completing the batch prediction job, I get the following output:

{"probabilities": [0.9633187055587769, 0.036681365221738815], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.9452069997787476, 0.05479296296834946], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.8586776852607727, 0.1413223296403885], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.7370017170906067, 0.2629982531070709], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.48797568678855896, 0.5120242238044739], "classes": ["0", "1"], "key": [" Male"]}
{"probabilities": [0.8111950755119324, 0.18880495429039001], "classes": ["0", "1"], "key": [" Female"]}
{"probabilities": [0.5560402274131775, 0.4439597725868225], "classes": ["0", "1"], "key": [" Female"]}
{"probabilities": [0.3235422968864441, 0.6764576435089111], "classes": ["0", "1"], "key": [" Male"]}
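Once the key comes back in the output, matching predictions to inputs is a plain lookup. Here is a minimal sketch, assuming the key values are unique (which gender obviously is not across the full listing, so the sample below uses only the first and sixth instances, abridged to two fields). The helper names `_unwrap` and `match_predictions` are my own.

```python
import json


def _unwrap(value):
    # The batch output above wraps the key in a single-element list.
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def match_predictions(input_lines, output_lines, input_key_field, output_key_field="key"):
    """Pair each input instance with its prediction by comparing key values."""
    predictions = {}
    for line in output_lines:
        if not line.strip():
            continue
        prediction = json.loads(line)
        key = _unwrap(prediction[output_key_field])
        if key in predictions:
            raise ValueError("duplicate key in predictions: %r" % (key,))
        predictions[key] = prediction

    matched = []
    for line in input_lines:
        if not line.strip():
            continue
        instance = json.loads(line)
        # None is returned for an instance whose key has no prediction.
        matched.append((instance, predictions.get(instance[input_key_field])))
    return matched


# Abridged from the listings above; the outputs are deliberately in reverse order.
inputs = [
    '{"gender": " Male", "age": "44"}',
    '{"gender": " Female", "age": "38"}',
]
outputs = [
    '{"probabilities": [0.8111950755119324, 0.18880495429039001], "classes": ["0", "1"], "key": [" Female"]}',
    '{"probabilities": [0.9633187055587769, 0.036681365221738815], "classes": ["0", "1"], "key": [" Male"]}',
]

for instance, prediction in match_predictions(inputs, outputs, input_key_field="gender"):
    print(instance["age"], prediction["probabilities"])
```

Raising on duplicate keys is a deliberate choice in this sketch: a key that repeats cannot identify a single instance, so it seems safer to fail loudly than to pair rows silently.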

Source: https://stackoverflow.com/questions/45433969/cloud-ml-engine-batch-predictions-how-to-simply-match-returned-predictions-wit

Tags

tensorflow

tensorflow-serving

google-cloud-ml

google-cloud-ml-engine