Tensorflow embedding for categorical feature

问题

In machine learning, it is common to represent a categorical (specifically: nominal) feature with one-hot-encoding. I am trying to learn how to use tensorflow's embedding layer to represent a categorical feature in a classification problem. I have got tensorflow version 1.01 installed and I am using Python 3.6.

I am aware of the tensorflow tutorial for word2vec, but it is not very instructive for my case. While building the tf.Graph, it uses NCE-specific weights and tf.nn.nce_loss.

I just want a simple feed-forward net as below, and the input layer to be an embedding. My attempt is below. It complains when I try to matrix multiply the embedding with the hidden layer due to shape incompatibility. Any ideas how I can fix this?

from __future__ import print_function
import pandas as pd; 
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import LabelEncoder

if __name__  == '__main__':

    # 1 categorical input feature and a binary output
    df = pd.DataFrame({'cat2': np.array(['o', 'm', 'm', 'c', 'c', 'c', 'o', 'm', 'm', 'm']),
                       'label': np.array([0, 0, 1, 1, 0, 0, 1, 0, 1, 1])})

    encoder = LabelEncoder()
    encoder.fit(df.cat2.values)
    X = encoder.transform(df.cat2.values)

    Y = np.zeros((len(df), 2))
    Y[np.arange(len(df)), df.label.values] = 1

    # Neural net parameters
    training_epochs = 5
    learning_rate = 1e-3
    cardinality = len(np.unique(X))
    embedding_size = 2
    input_X_size = 1
    n_labels = len(np.unique(Y))
    n_hidden = 10

    # Placeholders for input, output
    x = tf.placeholder(tf.int32, [None, 1], name="input_x")
    y = tf.placeholder(tf.float32, [None, 2], name="input_y")

    # Neural network weights
    embeddings = tf.Variable(tf.random_uniform([cardinality, embedding_size], -1.0, 1.0))
    h = tf.get_variable(name='h2', shape=[embedding_size, n_hidden],
                        initializer=tf.contrib.layers.xavier_initializer())
    W_out = tf.get_variable(name='out_w', shape=[n_hidden, n_labels],
                            initializer=tf.contrib.layers.xavier_initializer())

    # Neural network operations
    embedded_chars = tf.nn.embedding_lookup(embeddings, x)

    layer_1 = tf.matmul(embedded_chars,h)
    layer_1 = tf.nn.relu(layer_1)
    out_layer = tf.matmul(layer_1, W_out)

    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

    # Initializing the variables
    init = tf.global_variables_initializer()

    # Launch the graph
    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(training_epochs):
            avg_cost = 0.

            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost],
                             feed_dict={x: X, y: Y})
    print("Optimization Finished!")

EDIT:

Please see below the error message:

Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/common_shapes.py", line 671, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "/home/anaconda3/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/home/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape must be rank 2 but is rank 3 for 'MatMul' (op: 'MatMul') with input shapes: [?,1,2], [2,10].

回答1:

Just make your x placeholder be size [None] instead of [None, 1]

来源：https://stackoverflow.com/questions/43574889/tensorflow-embedding-for-categorical-feature

标签

tensorflow

classification

embedding

categorical-data