How does the tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter work?

心不动则不痛 submitted on 2021-01-27 06:40:31

Question


I am trying to wrap my head around ML and AI using TensorFlow. There is an example problem on the website that walks through processing .csv data. The .csv data is said to come from the Titanic and contains categorical and numerical features that are used to label a passenger as dead or alive.

First of all, if anyone knows of any resources or references that discuss that example in more detail than the TensorFlow website does, could you kindly point me to them?

Secondly, my more important question. The example uses a class to pack the individual numeric columns into a single 'numeric' key while leaving the categorical columns as they are.

The class used to do this is as below:

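import tensorflow as tf

class PackNumericFeatures(object):
    def __init__(self, names):
        # Names of the numeric columns to pack together.
        self.names = names

    def __call__(self, features, labels):
        # Pull each numeric column out of the features dict.
        numeric_features = [features.pop(name) for name in self.names]
        # Cast every column to float32 so the columns can be stacked.
        numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
        # Stack the columns along the last axis: one row per example.
        numeric_features = tf.stack(numeric_features, axis=-1)
        features['numeric'] = numeric_features

        return features, labels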

The above class is used like this:


NUMERIC_FEATURES = ['age','n_siblings_spouses','parch', 'fare']

packed_train_data = raw_train_data.map(PackNumericFeatures(NUMERIC_FEATURES))

A batch from packed_train_data looks something like this:

sex                 : [b'male' b'female' b'female' b'female' b'male']
class               : [b'Third' b'First' b'Third' b'First' b'Third']
deck                : [b'unknown' b'C' b'unknown' b'C' b'unknown']
embark_town         : [b'Southampton' b'Cherbourg' b'Southampton' b'Southampton' b'Queenstown']
alone               : [b'n' b'n' b'y' b'n' b'y']
numeric             : [[22.      1.      0.      7.25  ]
 [38.      1.      0.     71.2833]
 [26.      0.      0.      7.925 ]
 [35.      1.      0.     53.1   ]
 [28.      0.      0.      8.4583]]

The above output is produced by passing the dataset (i.e. packed_train_data) to the following function, which prints a single batch:


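def show_batch(dataset):
    # Each element of the dataset is a (features, labels) pair; take one batch.
    for batch, label in dataset.take(1):
        # batch is a dict mapping column name to a tensor of values.
        for key, value in batch.items():
            print("{:20s}: {}".format(key, value.numpy()))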

like this:

show_batch(packed_train_data)

What I do not understand is how the map function works to generate that output. More generally, I don't get how the call to map interacts with the PackNumericFeatures class: the class is instantiated with the column names, but its __call__ method expects features and labels, and I never pass those myself.

I know this problem is specific but any help would be appreciated. Cheers.
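
One way to picture the interaction asked about above is a plain-Python analogy. This is only a sketch under stated assumptions: it uses lists instead of tensors, and toy_map is a hypothetical stand-in for Dataset.map, not TensorFlow's implementation (TensorFlow traces the function into a graph rather than running it in Python for every element). The point it illustrates is that PackNumericFeatures(NUMERIC_FEATURES) builds an object, and because the class defines __call__, that object can be called like a function with each (features, labels) pair.

```python
class PackNumericFeatures:
    """Plain-Python stand-in for the class in the question (lists, not tensors)."""

    def __init__(self, names):
        self.names = names

    def __call__(self, features, labels):
        # Remove the numeric columns from the feature dict, one list per column.
        columns = [features.pop(name) for name in self.names]
        # Rough analogue of tf.cast(..., tf.float32) plus tf.stack(..., axis=-1):
        # regroup the per-column lists into one row of floats per example.
        features["numeric"] = [[float(v) for v in row] for row in zip(*columns)]
        return features, labels


def toy_map(batches, fn):
    """Hypothetical stand-in for Dataset.map: call fn on every (features, labels) pair."""
    for features, labels in batches:
        yield fn(features, labels)


# Instantiating the class only stores the column names; nothing is packed yet.
packer = PackNumericFeatures(["age", "n_siblings_spouses", "parch", "fare"])

batch = (
    {
        "sex": ["male", "female"],
        "age": [22, 38],
        "n_siblings_spouses": [1, 1],
        "parch": [0, 0],
        "fare": [7.25, 71.2833],
    },
    [0, 1],  # placeholder labels, not taken from the post
)

# toy_map calls packer(features, labels), which runs PackNumericFeatures.__call__.
packed_features, packed_labels = next(toy_map([batch], packer))
print(packed_features["numeric"])
# [[22.0, 1.0, 0.0, 7.25], [38.0, 1.0, 0.0, 71.2833]]
```

In this picture, raw_train_data.map(PackNumericFeatures(NUMERIC_FEATURES)) plays the role of toy_map(batches, packer): map supplies the features and labels arguments, so the caller never passes them explicitly.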

Source: https://stackoverflow.com/questions/58106929/how-does-the-tensorflow-python-data-ops-dataset-ops-datasetv1adapter-work
