Tensorflow One Hot Encoder?

一生所求 2020-12-04 13:46

Does TensorFlow have something similar to scikit-learn's one-hot encoder for processing categorical data? Would a placeholder of type tf.string behave as categorical data?

15 answers
  • 2020-12-04 14:24

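    There are a couple of ways to do it. The first sums rows of an identity matrix:

    ans = tf.constant([[5, 6, 0, 0], [5, 6, 7, 0]])  # batch_size * max_seq_len
    # Each id selects a row of the identity matrix; summing over the sequence axis counts ids per class.
    # Index 0 is counted like any other id, so the trailing zeros in ans land in column 0.
    labels = tf.reduce_sum(tf.nn.embedding_lookup(np.identity(10), ans), 1)
    
    >>> [[ 2.  0.  0.  0.  0.  1.  1.  0.  0.  0.]
    >>> [ 1.  0.  0.  0.  0.  1.  1.  1.  0.  0.]]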
    

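    The other way is to use tf.one_hot:

    # Index 0 is counted too, so the trailing zeros in ans land in column 0.
    # Padding with -1 instead would leave it empty, because tf.one_hot maps -1 to an all-zero vector.
    labels2 = tf.reduce_sum(tf.one_hot(ans, depth=10, on_value=1, off_value=0, axis=1), 2)
    
    >>> [[2 0 0 0 0 1 1 0 0 0]
    >>> [1 0 0 0 0 1 1 1 0 0]]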
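
If the zeros in `ans` are meant as padding rather than as class 0, they have to be skipped explicitly. Below is a minimal NumPy sketch of that variant. The helper name `multi_hot` and its `pad_id` argument are hypothetical and not part of the answer above.

```python
import numpy as np

def multi_hot(ids, num_classes, pad_id=0):
    """Count class ids per row, ignoring positions equal to pad_id (hypothetical helper)."""
    ids = np.asarray(ids)
    out = np.zeros((ids.shape[0], num_classes), dtype=np.int64)
    # Positions (row, col) that hold a real id rather than padding.
    rows, cols = np.nonzero(ids != pad_id)
    # np.add.at accumulates correctly even when an id repeats within a row.
    np.add.at(out, (rows, ids[rows, cols]), 1)
    return out

ans = [[5, 6, 0, 0], [5, 6, 7, 0]]
multi_hot_labels = multi_hot(ans, num_classes=10)
print(multi_hot_labels)
```

With this convention class 0 can no longer be represented, so it only fits data whose real ids start at 1.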
    
  • 2020-12-04 14:27

    tf.one_hot() is available in TF and easy to use.

    Let's assume you have 4 possible categories (cat, dog, bird, human) and 2 instances (cat, human). Then depth=4 and indices=[0, 3]:

    import tensorflow as tf
    res = tf.one_hot(indices=[0, 3], depth=4)
    # TensorFlow 1.x graph mode; with eager execution (2.x), print(res.numpy()) works directly.
    with tf.Session() as sess:
        print(sess.run(res))
    

    Keep in mind that an index of -1 gives an all-zero one-hot vector.
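
The question starts from string categories, so the names first have to be mapped to integer indices. A rough sketch of that step in plain NumPy follows. The helper `one_hot_names` and the extra "fish" input are made up for illustration; unknown names are mapped to -1 so that they come out as all zeros, mirroring the behaviour described above.

```python
import numpy as np

categories = ["cat", "dog", "bird", "human"]
# Vocabulary: category name -> integer index.
index_of = {name: i for i, name in enumerate(categories)}

def one_hot_names(names, index_of):
    """One-hot encode category names; unknown names become all-zero rows (hypothetical helper)."""
    depth = len(index_of)
    idx = np.array([index_of.get(name, -1) for name in names])
    # Compare every index against 0..depth-1; -1 matches no column.
    return (idx[:, None] == np.arange(depth)).astype(np.float32)

encoded = one_hot_names(["cat", "human", "fish"], index_of)
print(encoded)
```

The same integer indices could be passed to tf.one_hot with depth=len(categories).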

    Old answer, when this function was not available.

    After looking through the Python documentation, I have not found anything similar. One thing that strengthens my belief that it does not exist is that in their own example they write one_hot manually:

    import numpy
    
    def dense_to_one_hot(labels_dense, num_classes=10):
      """Convert class labels from scalars to one-hot vectors."""
      num_labels = labels_dense.shape[0]
      index_offset = numpy.arange(num_labels) * num_classes
      labels_one_hot = numpy.zeros((num_labels, num_classes))
      labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
      return labels_one_hot
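
To see why the flat-index trick in that function works, here is a small worked example with three labels and three classes. The input values are chosen only for illustration.

```python
import numpy as np

labels_dense = np.array([2, 0, 1])  # three samples, classes 0..2
num_classes = 3

# Row i of a (num_labels, num_classes) matrix starts at flat position i * num_classes.
index_offset = np.arange(labels_dense.shape[0]) * num_classes
# Adding the label moves to the right column inside each row.
flat_positions = index_offset + labels_dense

labels_one_hot = np.zeros((labels_dense.shape[0], num_classes))
labels_one_hot.flat[flat_positions] = 1
print(flat_positions)
print(labels_one_hot)
```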
    

    You can also do this in scikit-learn.

  • 2020-12-04 14:29

    A simple and short way to one-hot encode any integer or list of integers:

    import numpy as np
    import tensorflow as tf
    
    a = 5
    b = [1, 2, 3]
    # one hot an integer
    one_hot_a = tf.nn.embedding_lookup(np.identity(10), a)
    # one hot a list of integers
    one_hot_b = tf.nn.embedding_lookup(np.identity(max(b)+1), b)
    
  • 2020-12-04 14:30

    As mentioned above by @dga, Tensorflow has tf.one_hot now:

    labels = tf.constant([5,3,2,4,1])
    highest_label = tf.reduce_max(labels)
    labels_one_hot = tf.one_hot(labels, highest_label + 1)
    
    array([[ 0.,  0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.,  0.],
           [ 0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)
    

    Set depth to at least highest_label + 1; with a smaller depth the one-hot tensor is pruned and cannot represent the larger labels.
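
One practical consequence is that a depth derived from the largest label in each batch makes the width depend on which labels happen to be present. The NumPy sketch below shows the difference. It assumes a constant `NUM_CLASSES` known in advance, and both helper names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 6  # assumption: known from the dataset, not from any single batch

def one_hot_from_batch_max(labels):
    """Depth taken from the largest label in this batch (width varies per batch)."""
    labels = np.asarray(labels)
    return np.eye(labels.max() + 1, dtype=np.float32)[labels]

def one_hot_fixed(labels, depth=NUM_CLASSES):
    """Depth fixed up front (width is the same for every batch)."""
    return np.eye(depth, dtype=np.float32)[np.asarray(labels)]

batch_a = [5, 3, 2, 4, 1]
batch_b = [1, 0, 2]

widths_from_max = (one_hot_from_batch_max(batch_a).shape[1],
                   one_hot_from_batch_max(batch_b).shape[1])
widths_fixed = (one_hot_fixed(batch_a).shape[1],
                one_hot_fixed(batch_b).shape[1])
print(widths_from_max, widths_fixed)
```

A fixed depth is usually the safer choice when the encoded labels feed a layer with a fixed output size.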

    If you would rather do it manually:

    labels = tf.constant([5,3,2,4,1])
    size = tf.shape(labels)[0]
    highest_label = tf.reduce_max(labels)
    labels_t = tf.reshape(labels, [-1, 1])
    indices = tf.reshape(tf.range(size), [-1, 1])
    idx_with_labels = tf.concat([indices, labels_t], 1)
    labels_one_hot = tf.sparse_to_dense(idx_with_labels, [size, highest_label + 1], 1.0)
    
    array([[ 0.,  0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  0.,  1.,  0.,  0.],
           [ 0.,  0.,  1.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.,  0.],
           [ 0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)
    

    Note the argument order in tf.concat(): since TensorFlow 1.0 it is (values, axis); earlier releases took the axis first.
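
The manual version pairs each row index with its label and writes 1.0 at those positions of a zero matrix. The same idea can be sketched in NumPy with integer-array indexing. This is an analogue for illustration, not the TensorFlow call itself.

```python
import numpy as np

labels = np.array([5, 3, 2, 4, 1])
size = labels.shape[0]
depth = labels.max() + 1

one_hot = np.zeros((size, depth), dtype=np.float32)
# (row, label) pairs: row i gets a 1.0 in column labels[i].
one_hot[np.arange(size), labels] = 1.0
print(one_hot)
```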

  • 2020-12-04 14:31

    numpy does it!

    import numpy as np
    np.eye(n_labels)[target_vector]
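
A runnable version of the snippet above, with example values filled in for `n_labels` and `target_vector` (both values are assumptions for illustration). It also shows that the indexing works for arrays of any shape and that `argmax` recovers the original labels.

```python
import numpy as np

n_labels = 3
# Must be an integer array; float values cannot be used as indices.
target_vector = np.array([[0, 2], [1, 1]])

# Each label selects one row of the identity matrix.
encoded = np.eye(n_labels)[target_vector]  # shape (2, 2, 3)
# The position of the 1 along the last axis is the original label.
decoded = encoded.argmax(axis=-1)
print(encoded.shape)
print(decoded)
```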
    
  • 2020-12-04 14:35

    TensorFlow 2.0 compatible answer: you can do this efficiently with TensorFlow Transform.

    Code for performing one-hot encoding with TensorFlow Transform is shown below:

    def get_feature_columns(tf_transform_output):
      """Returns the FeatureColumns for the model.
    
      Args:
        tf_transform_output: A `TFTransformOutput` object.
    
      Returns:
        A list of FeatureColumns.
      """
      # Wrap scalars as real valued columns.
      real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                             for key in NUMERIC_FEATURE_KEYS]
    
      # Wrap categorical columns.
      one_hot_columns = [
          tf.feature_column.categorical_column_with_vocabulary_file(
              key=key,
              vocabulary_file=tf_transform_output.vocabulary_file_by_name(
                  vocab_filename=key))
          for key in CATEGORICAL_FEATURE_KEYS]
    
      return real_valued_columns + one_hot_columns
    

    For more information, refer to the TensorFlow Transform tutorial.
