How to convert a predicted sequence back to text in Keras?

花落未央 2021-01-01 13:39

I have a sequence-to-sequence learning model which works fine and is able to predict some outputs. The problem is that I have no idea how to convert the output back to a text sequence.

5 Answers
  • 2021-01-01 14:20

    Here is a solution I found:

    # Invert word_index: map each integer index back to its word
    reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))
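
    For example, a quick sketch of decoding a predicted sequence with this map (pred_seq is a made-up list of word indices, used only for illustration):

    pred_seq = [1, 2, 3]  # hypothetical predicted word indices
    decoded = ' '.join(reverse_word_map.get(idx, '') for idx in pred_seq)
    print(decoded)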
    
  • 2021-01-01 14:20
    import numpy as np

    # Predicted class for each test example
    p_test = model.predict(data_test).argmax(axis=1)

    # Show some misclassified examples
    misclassified_idx = np.where(p_test != Ytest)[0]
    print(len(misclassified_idx))
    i = np.random.choice(misclassified_idx)
    print(i)
    print(df_test[i])
    print('True label: %s, Predicted label: %s' % (Ytest[i], p_test[i]))
    
    df_test contains the original texts; data_test contains the corresponding integer sequences.
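
    If you also want to inspect the decoded model input (rather than the original text), a reverse word map, as in the other answers, can be applied to data_test[i]. A sketch, assuming the tokenizer that produced data_test is available:

    # Map the integer sequence of a misclassified example back to words
    reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))
    decoded_input = ' '.join(reverse_word_map.get(idx, '') for idx in data_test[i])
    print(decoded_input)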
    
  • 2021-01-01 14:22

    You can directly use the inverse function, tokenizer.sequences_to_texts.

    text = tokenizer.sequences_to_texts(<list of the integer equivalent encodings>)

    I have tested the above and it works as expected.

    PS: Take extra care to make sure the argument is the list of integer encodings, not the one-hot ones.
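
    For example, a minimal sketch (the sample texts here are illustrative, not from the question):

    from keras.preprocessing.text import Tokenizer

    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(['the cat sat', 'the dog ran'])

    seqs = tokenizer.texts_to_sequences(['the cat ran'])  # [[1, 2, 5]]
    print(tokenizer.sequences_to_texts(seqs))             # ['the cat ran']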

  • 2021-01-01 14:28

    You can make a dictionary that maps each index back to its character.

    # Build the reverse mapping: index -> character
    index_word = {v: k for k, v in tk.word_index.items()}
    seqs = tk.texts_to_sequences(txt1)
    words = []
    for seq in seqs:
        if len(seq):
            words.append(index_word.get(seq[0]))
        else:
            words.append(' ')  # characters the tokenizer drops (e.g. spaces) give an empty sequence
    print(''.join(words))  # output
    
    >>> 'what makes this problem difficult is that the sequences can vary in length  
    >>> be comprised of a very large vocabulary of input symbols and may require the model  
    >>> to learn the long term context or dependencies between symbols in the input sequence '
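
    For context, here is a minimal setup under which the snippet above runs; tk, txt1, and the sample text are assumptions for illustration, not taken from the question:

    from keras.preprocessing.text import Tokenizer

    text = ('what makes this problem difficult is that the sequences '
            'can vary in length')
    txt1 = list(text)  # one character per element

    tk = Tokenizer()   # word_index is built over the individual characters
    tk.fit_on_texts(txt1)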
    

    However, in the question you're trying to use a sequence of characters to predict one of 10 output classes, which is not a sequence-to-sequence model. In that case, you cannot simply turn the prediction (or pred.argmax(axis=1)) back into a sequence of characters.

  • 2021-01-01 14:30

    I had to solve the same problem, so here is how I ended up doing it (inspired by @Ben Usemans' reversed dictionary).

    # Importing library
    from keras.preprocessing.text import Tokenizer
    
    # My texts
    texts = ['These are two crazy sentences', 'that I want to convert back and forth']
    
    # Creating a tokenizer
    tokenizer = Tokenizer(lower=True)
    
    # Building word indices
    tokenizer.fit_on_texts(texts)
    
    # Tokenizing sentences
    sentences = tokenizer.texts_to_sequences(texts)
    
    >sentences
    >[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11, 12, 13]]
    
    # Creating a reverse dictionary
    reverse_word_map = dict(map(reversed, tokenizer.word_index.items()))
    
    # Function that takes a tokenized sentence and returns the words
    def sequence_to_text(list_of_indices):
        # Look up each index in the reverse dictionary
        words = [reverse_word_map.get(idx) for idx in list_of_indices]
        return words
    
    # Creating texts 
    my_texts = list(map(sequence_to_text, sentences))
    
    >my_texts
    >[['these', 'are', 'two', 'crazy', 'sentences'], ['that', 'i', 'want', 'to', 'convert', 'back', 'and', 'forth']]
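
    To get plain strings back, you can join the words (or call tokenizer.sequences_to_texts(sentences) directly, as noted in another answer):

    rebuilt = [' '.join(words) for words in my_texts]

    >rebuilt
    >['these are two crazy sentences', 'that i want to convert back and forth']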
    