Question
I am trying to wrap my head around ML and AI using TensorFlow. There is an example problem on the TensorFlow website that covers processing .CSV data. The .CSV data is said to come from the Titanic and essentially contains categorical and numerical features that are used to label each passenger as dead or alive.
First of all, if anyone knows of any resources or references that discuss that example in more detail than the TensorFlow website does, could you kindly point me to them?
Secondly, my more important question. The example uses a method that packs the individual numeric features into a single 'numeric' key while leaving the categorical columns as they are.
The class used to do this is shown below:
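import tensorflow as tf

class PackNumericFeatures(object):
    def __init__(self, names):
        # Names of the numeric columns to pack together.
        self.names = names

    def __call__(self, features, labels):
        # Remove each numeric column from the features dict.
        numeric_features = [features.pop(name) for name in self.names]
        # Cast every column to float32 so they can be stacked.
        numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
        # Stack the columns along the last axis: one row per passenger.
        numeric_features = tf.stack(numeric_features, axis=-1)
        features['numeric'] = numeric_features
        return features, labels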
The above class is used like this:
NUMERIC_FEATURES = ['age','n_siblings_spouses','parch', 'fare']
packed_train_data = raw_train_data.map(PackNumericFeatures(NUMERIC_FEATURES))
The output, packed_train_data, looks something like this:
sex : [b'male' b'female' b'female' b'female' b'male']
class : [b'Third' b'First' b'Third' b'First' b'Third']
deck : [b'unknown' b'C' b'unknown' b'C' b'unknown']
embark_town : [b'Southampton' b'Cherbourg' b'Southampton' b'Southampton' b'Queenstown']
alone : [b'n' b'n' b'y' b'n' b'y']
numeric : [[22. 1. 0. 7.25 ]
[38. 1. 0. 71.2833]
[26. 0. 0. 7.925 ]
[35. 1. 0. 53.1 ]
[28. 0. 0. 8.4583]]
The above output is produced by passing packed_train_data to the following function, which prints a single batch of data:
def show_batch(dataset):
    for batch, head in dataset.take(1):
        for labels, value in batch.items():
            print("{:20s}: {}".format(labels, value.numpy()))
like this:
show_batch(packed_train_data)
What I do not understand is how the map function works to generate that output. More generally, I don't get how the call to map interacts with the PackNumericFeatures(object) class.
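To make the question concrete, here is a rough plain-Python analogy of the interaction being asked about. It is only a sketch under stated assumptions, not TensorFlow's implementation: PackNumericColumns and toy_map are hypothetical stand-ins for PackNumericFeatures and Dataset.map, lists replace tensors, and the labels are made up for illustration. The numeric values are the first two rows of the output above.

```python
class PackNumericColumns:
    """Hypothetical plain-Python stand-in for PackNumericFeatures (no TensorFlow)."""

    def __init__(self, names):
        # Runs once, when PackNumericColumns(NUMERIC_FEATURES) is evaluated.
        self.names = names

    def __call__(self, features, labels):
        # Runs each time the instance is called like a function.
        columns = [features.pop(name) for name in self.names]
        columns = [[float(v) for v in col] for col in columns]
        # Transposing the columns plays the role of tf.stack(..., axis=-1):
        # one row per passenger, one entry per numeric feature.
        features['numeric'] = [list(row) for row in zip(*columns)]
        return features, labels


def toy_map(batches, fn):
    """Hypothetical stand-in for Dataset.map: apply fn to every (features, labels) pair."""
    for features, labels in batches:
        yield fn(features, labels)


NUMERIC_FEATURES = ['age', 'n_siblings_spouses', 'parch', 'fare']

# One batch of two passengers; the labels are illustrative only.
batch = (
    {
        'sex': ['male', 'female'],
        'age': [22, 38],
        'n_siblings_spouses': [1, 1],
        'parch': [0, 0],
        'fare': [7.25, 71.2833],
    },
    [0, 1],
)

packer = PackNumericColumns(NUMERIC_FEATURES)  # __init__ runs here
packed = list(toy_map([batch], packer))        # __call__ runs here, once per batch
packed_features, packed_labels = packed[0]
print(packed_features)
```

In this analogy, map never sees a class, only an object that can be called with (features, labels). Because the class defines __call__, an instance of it behaves like a function that remembers the column names it was constructed with. One caveat: as far as I understand, the real Dataset.map traces the callable into a graph rather than calling it eagerly per element, so the analogy covers only the calling convention.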
I know this problem is specific but any help would be appreciated. Cheers.
Source: https://stackoverflow.com/questions/58106929/how-does-the-tensorflow-python-data-ops-dataset-ops-datasetv1adapter-work