What does tf.nn.embedding_lookup function do?

深忆病人 2020-12-02 04:18
tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None)

I cannot understand what this function does. Is it like a lookup table?

8 answers
  • 2020-12-02 04:27

    Since I was also intrigued by this function, I'll give my two cents.

    The way I see it in the 2D case is just as a matrix multiplication (it's easy to generalize to other dimensions).

    Consider a vocabulary with N symbols. Then, you can represent a symbol x as a vector of dimensions Nx1, one-hot-encoded.

    But you want a representation of this symbol not as a vector of Nx1, but as one with dimensions Mx1, called y.

    So, to transform x into y, you can use an embedding matrix E, with dimensions MxN:

    y = E x.

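    This is essentially what tf.nn.embedding_lookup(params, ids, ...) is doing, with the nuance that each id is just a single number: the position of the 1 in the one-hot-encoded vector x. So instead of performing the full multiplication, the function simply picks out the corresponding embedding.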
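    A minimal NumPy sketch of this equivalence (the sizes N=5, M=3 and the random matrix are arbitrary choices for illustration). One detail to keep in mind: tf.nn.embedding_lookup returns *rows* of params, so in this notation params plays the role of E transposed (shape N x M).

```python
import numpy as np

N, M = 5, 3                      # vocabulary size, embedding size (arbitrary)
rng = np.random.default_rng(0)
E = rng.standard_normal((M, N))  # embedding matrix E with shape M x N, as in y = E x

symbol_id = 2                    # the single number passed as an "id"
x = np.zeros(N)
x[symbol_id] = 1.0               # one-hot encoding of the symbol, shape (N,)

y_matmul = E @ x                 # y = E x: selects column `symbol_id` of E
params = E.T                     # a row lookup needs the N x M layout, i.e. E transposed
y_lookup = params[symbol_id]     # what a row lookup returns, no multiplication needed
print(np.allclose(y_matmul, y_lookup))   # True

# The same holds for several ids at once: stack the one-hot vectors as columns.
ids = np.array([0, 2, 4])
X = np.eye(N)[:, ids]            # one-hot columns, shape N x 3
print(np.allclose(E @ X, params[ids].T))  # True
```

    The lookup avoids multiplying by a mostly-zero vector, which is why it is preferred over the matrix product for large vocabularies.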

  • 2020-12-02 04:29

    The embedding_lookup function retrieves rows of the params tensor. Its behavior is similar to indexing an array with an array of indices in NumPy, e.g.:

    import numpy as np
    
    matrix = np.random.random([1024, 64])  # 1024 rows of 64-dimensional embeddings
    ids = np.array([0, 5, 17, 33])
    print(matrix[ids])  # prints a matrix of shape [4, 64]
    

    The params argument can also be a list of tensors, in which case the ids are distributed among the tensors. For example, given a list of 3 tensors of shape [2, 64], the default behavior is that they represent ids [0, 3], [1, 4] and [2, 5], respectively.

    partition_strategy controls how the ids are distributed among the tensors in the list. Partitioning is useful for larger-scale problems, where the matrix might be too large to keep in one piece.
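    To make the two strategies concrete, here is a pure-Python sketch of the id-to-tensor assignment rule as I understand it from the TF 1.x documentation: 'mod' sends id i to tensor i % n, while 'div' assigns contiguous blocks, giving the first few tensors one extra id when the ids do not divide evenly. The helper `partition` is hypothetical, not a TensorFlow API.

```python
def partition(num_ids, num_shards, strategy):
    """Hypothetical helper: return, for each shard, the list of ids it stores."""
    shards = [[] for _ in range(num_shards)]
    base, extras = divmod(num_ids, num_shards)  # the first `extras` shards get base + 1 ids
    boundary = extras * (base + 1)              # ids below this live in the larger shards
    for i in range(num_ids):
        if strategy == "mod":
            p = i % num_shards                  # round-robin assignment
        elif strategy == "div":
            p = i // (base + 1) if i < boundary else extras + (i - boundary) // base
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        shards[p].append(i)
    return shards

print(partition(6, 3, "mod"))   # [[0, 3], [1, 4], [2, 5]]
print(partition(6, 3, "div"))   # [[0, 1], [2, 3], [4, 5]]
print(partition(13, 5, "mod"))  # [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]
print(partition(13, 5, "div"))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]
```

    The first line reproduces the 3 x [2, 64] example above: with 'mod', the first tensor holds ids 0 and 3.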

  • 2020-12-02 04:34

    Yes, this function is hard to understand, until you get the point.

    In its simplest form, it is similar to tf.gather. It returns the elements of params according to the indexes specified by ids.

    For example (assuming you are inside tf.InteractiveSession())

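    import tensorflow as tf  # TensorFlow 1.x API (InteractiveSession, .eval())
    sess = tf.InteractiveSession()
    params = tf.constant([10,20,30,40])
    ids = tf.constant([0,1,2,3])
    print(tf.nn.embedding_lookup(params,ids).eval())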
    

    would return [10 20 30 40], because the first element (index 0) of params is 10, the second element of params (index 1) is 20, etc.

    Similarly,

    params = tf.constant([10,20,30,40])
    ids = tf.constant([1,1,3])
    print(tf.nn.embedding_lookup(params,ids).eval())
    

    would return [20 20 40].

    But embedding_lookup is more than that. The params argument can be a list of tensors, rather than a single tensor.

    params1 = tf.constant([1,2])
    params2 = tf.constant([10,20])
    ids = tf.constant([2,0,2,1,2,3])
    result = tf.nn.embedding_lookup([params1, params2], ids)
    

    In such a case, the indexes specified in ids correspond to elements of the tensors according to a partition strategy; the default partition strategy is 'mod'.

    In the 'mod' strategy, index 0 corresponds to the first element of the first tensor in the list, index 1 to the first element of the second tensor, index 2 to the first element of the third tensor, and so on. Simply put, index i corresponds to the first element of the (i+1)th tensor, for all indexes 0..(n-1), assuming params is a list of n tensors.

    Now, index n cannot correspond to tensor n+1, because the list params contains only n tensors. So index n corresponds to the second element of the first tensor. Similarly, index n+1 corresponds to the second element of the second tensor, etc.

    So, in the code

    params1 = tf.constant([1,2])
    params2 = tf.constant([10,20])
    ids = tf.constant([2,0,2,1,2,3])
    result = tf.nn.embedding_lookup([params1, params2], ids)
    

    index 0 corresponds to the first element of the first tensor: 1

    index 1 corresponds to the first element of the second tensor: 10

    index 2 corresponds to the second element of the first tensor: 2

    index 3 corresponds to the second element of the second tensor: 20

    Thus, the result would be:

    [ 2  1  2 10  2 20]
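    To double-check the walk-through without a TensorFlow session, here is a NumPy emulation of the 'mod' rule described above (index i goes to tensor i % n, at position i // n). `mod_lookup` is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np

def mod_lookup(shards, ids):
    """Hypothetical helper emulating 'mod' partitioning: id -> shards[id % n][id // n]."""
    n = len(shards)
    return np.array([shards[i % n][i // n] for i in ids])

params1 = np.array([1, 2])
params2 = np.array([10, 20])
ids = [2, 0, 2, 1, 2, 3]
result = mod_lookup([params1, params2], ids)
print(result)  # [ 2  1  2 10  2 20]
```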
    
  • 2020-12-02 04:39

    Adding to Asher Stern's answer: params is interpreted as a partitioning of a large embedding tensor. It can be a single tensor representing the complete embedding tensor, or a list of X tensors, all of the same shape except for the first dimension, representing sharded embedding tensors.

    The function tf.nn.embedding_lookup is designed with the expectation that the embedding (params) will be large, which is why it takes a partition_strategy argument.
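    A small NumPy sketch of why the shards only need to agree beyond the first dimension: a 10-row table split into 3 shards under a 'div'-style layout (contiguous blocks, the first shard one row larger) gives shapes (4, 4), (3, 4), (3, 4), and a lookup through the shards returns the same rows as a lookup in the full table. The helpers `shard_div` and `div_lookup` are hypothetical illustrations, not TensorFlow functions.

```python
import numpy as np

def shard_div(table, num_shards):
    # Contiguous split; np.array_split gives the first (rows % num_shards) pieces one extra row.
    return np.array_split(table, num_shards, axis=0)

def div_lookup(shards, ids):
    sizes = [s.shape[0] for s in shards]
    starts = np.cumsum([0] + sizes[:-1])                   # first global id held by each shard
    out = []
    for i in ids:
        p = np.searchsorted(starts, i, side="right") - 1   # last shard starting at or before i
        out.append(shards[p][i - starts[p]])               # local row inside that shard
    return np.stack(out)

table = np.arange(40, dtype=np.float32).reshape(10, 4)  # 10 ids, 4-dim embeddings
shards = shard_div(table, 3)
print([s.shape for s in shards])                         # [(4, 4), (3, 4), (3, 4)]

ids = np.array([9, 0, 4, 3, 7])
print(np.array_equal(div_lookup(shards, ids), table[ids]))  # True
```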

  • 2020-12-02 04:43

    Yes, the purpose of the tf.nn.embedding_lookup() function is to perform a lookup in the embedding matrix and return the embeddings (in simple terms, the vector representations) of words.

    A simple embedding matrix (of shape vocabulary_size x embedding_dimension) looks like the one below, i.e. each word is represented by a vector of numbers (hence the name word2vec).


    Embedding Matrix

    the 0.418 0.24968 -0.41242 0.1217 0.34527 -0.044457 -0.49688 -0.17862
    like 0.36808 0.20834 -0.22319 0.046283 0.20098 0.27515 -0.77127 -0.76804
    between 0.7503 0.71623 -0.27033 0.20059 -0.17008 0.68568 -0.061672 -0.054638
    did 0.042523 -0.21172 0.044739 -0.19248 0.26224 0.0043991 -0.88195 0.55184
    just 0.17698 0.065221 0.28548 -0.4243 0.7499 -0.14892 -0.66786 0.11788
    national -1.1105 0.94945 -0.17078 0.93037 -0.2477 -0.70633 -0.8649 -0.56118
    day 0.11626 0.53897 -0.39514 -0.26027 0.57706 -0.79198 -0.88374 0.30119
    country -0.13531 0.15485 -0.07309 0.034013 -0.054457 -0.20541 -0.60086 -0.22407
    under 0.13721 -0.295 -0.05916 -0.59235 0.02301 0.21884 -0.34254 -0.70213
    such 0.61012 0.33512 -0.53499 0.36139 -0.39866 0.70627 -0.18699 -0.77246
    second -0.29809 0.28069 0.087102 0.54455 0.70003 0.44778 -0.72565 0.62309 
    

    I split the above embedding matrix, loading only the words into vocab (which will be our vocabulary) and the corresponding vectors into the emb array.

    import numpy as np
    
    vocab = ['the','like','between','did','just','national','day','country','under','such','second']
    
    emb = np.array([[0.418, 0.24968, -0.41242, 0.1217, 0.34527, -0.044457, -0.49688, -0.17862],
       [0.36808, 0.20834, -0.22319, 0.046283, 0.20098, 0.27515, -0.77127, -0.76804],
       [0.7503, 0.71623, -0.27033, 0.20059, -0.17008, 0.68568, -0.061672, -0.054638],
       [0.042523, -0.21172, 0.044739, -0.19248, 0.26224, 0.0043991, -0.88195, 0.55184],
       [0.17698, 0.065221, 0.28548, -0.4243, 0.7499, -0.14892, -0.66786, 0.11788],
       [-1.1105, 0.94945, -0.17078, 0.93037, -0.2477, -0.70633, -0.8649, -0.56118],
       [0.11626, 0.53897, -0.39514, -0.26027, 0.57706, -0.79198, -0.88374, 0.30119],
       [-0.13531, 0.15485, -0.07309, 0.034013, -0.054457, -0.20541, -0.60086, -0.22407],
       [ 0.13721, -0.295, -0.05916, -0.59235, 0.02301, 0.21884, -0.34254, -0.70213],
       [ 0.61012, 0.33512, -0.53499, 0.36139, -0.39866, 0.70627, -0.18699, -0.77246 ],
       [ -0.29809, 0.28069, 0.087102, 0.54455, 0.70003, 0.44778, -0.72565, 0.62309 ]])
    
    
    emb.shape
    # (11, 8)
    

    Embedding Lookup in TensorFlow

    Now let's see how we can perform an embedding lookup for an arbitrary input sentence.

    In [54]: from collections import OrderedDict
    
    # embedding as TF tensor (for now constant; could be tf.Variable() during training)
    In [55]: tf_embedding = tf.constant(emb, dtype=tf.float32)
    
    # input for which we need the embedding
    In [56]: input_str = "like the country"
    
    # build index based on our `vocabulary`
    In [57]: word_to_idx = OrderedDict((w, vocab.index(w)) for w in input_str.split() if w in vocab)
    
    # lookup in embedding matrix & return the vectors for the input words
    In [58]: tf.nn.embedding_lookup(tf_embedding, list(word_to_idx.values())).eval()
    Out[58]: 
    array([[ 0.36807999,  0.20834   , -0.22318999,  0.046283  ,  0.20097999,
             0.27515   , -0.77126998, -0.76804   ],
           [ 0.41800001,  0.24968   , -0.41242   ,  0.1217    ,  0.34527001,
            -0.044457  , -0.49687999, -0.17862   ],
           [-0.13530999,  0.15485001, -0.07309   ,  0.034013  , -0.054457  ,
            -0.20541   , -0.60086   , -0.22407   ]], dtype=float32)
    

    Observe how we got the embeddings from our original embedding matrix (with words) using the indices of words in our vocabulary.

    Usually, such an embedding lookup is performed by the first layer (called the Embedding layer), which then passes these embeddings to RNN/LSTM/GRU layers for further processing.
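    In an Embedding layer the ids usually arrive as a [batch, sequence_length] matrix rather than a flat list, and the lookup returns a tensor of shape ids.shape + params.shape[1:]. A NumPy sketch of that shape behaviour (random embedding values and an arbitrary batch of ids, for illustration only; the downstream RNN layer is not shown):

```python
import numpy as np

vocab_size, embedding_dim = 11, 8
rng = np.random.default_rng(42)
embedding = rng.standard_normal((vocab_size, embedding_dim)).astype(np.float32)

# A batch of 2 sentences, each padded/truncated to 4 token ids (arbitrary example ids).
batch_ids = np.array([[1, 0, 7, 0],
                      [3, 4, 5, 6]])

# Indexing with the id matrix: (batch, seq_len) -> (batch, seq_len, embedding_dim)
batch_vectors = embedding[batch_ids]
print(batch_vectors.shape)  # (2, 4, 8)
```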


    Side note: usually the vocabulary also has a special unk token. If a token from the input sentence is not present in the vocabulary, the index corresponding to unk is looked up in the embedding matrix instead.
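    A minimal sketch of that fallback when converting words to ids. The token spelling '<unk>', its position at index 0, and the tiny vocabulary are assumptions for illustration; real tokenizers choose their own conventions.

```python
vocab = ['<unk>', 'the', 'like', 'country']     # hypothetical vocabulary; index 0 reserved for <unk>
word_to_idx = {w: i for i, w in enumerate(vocab)}
UNK_ID = word_to_idx['<unk>']

def to_ids(sentence):
    # Out-of-vocabulary words map to the <unk> row of the embedding matrix.
    return [word_to_idx.get(w, UNK_ID) for w in sentence.split()]

print(to_ids("like the quaint country"))  # [2, 1, 0, 3]
```

    The resulting id list is what would be passed as ids to the lookup.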


    P.S. Note that embedding_dimension is a hyperparameter that has to be tuned for each application, but popular models like Word2Vec and GloVe commonly use 300-dimensional vectors to represent each word.

    Bonus reading: word2vec skip-gram model

  • 2020-12-02 04:46

    Another way to look at it: assume that you flatten the tensors into a one-dimensional array (interleaving their elements), and then perform a lookup in that array.

    (eg) Tensor0=[1,2,3], Tensor1=[4,5,6], Tensor2=[7,8,9]

    The flattened out tensor will be as follows [1,4,7,2,5,8,3,6,9]

    Now a lookup of [0,3,4,1,7] will yield [1,2,5,4,6].

    I.e., if the lookup value is, for example, 7 and we have 3 tensors (or a tensor with 3 rows), then
    
    7 / 3: remainder is 1, quotient is 2. So the element at index 2 (the third element) of Tensor1 is returned, which is 6.
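    A short NumPy sketch of this flattened view, using the same three tensors: stacking them as columns and flattening produces the interleaved array, and the quotient/remainder rule picks the same element as indexing the flat array.

```python
import numpy as np

tensors = [np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]

# Interleave: position k of the flat array holds tensors[k % 3][k // 3].
flat = np.stack(tensors, axis=1).reshape(-1)
print(flat)              # [1 4 7 2 5 8 3 6 9]

lookup = [0, 3, 4, 1, 7]
print(flat[lookup])      # [1 2 5 4 6]

# Same answer via quotient/remainder: divmod(7, 3) == (2, 1) -> tensors[1][2]
q, r = divmod(7, 3)
print(tensors[r][q])     # 6
```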
