What does Keras Tokenizer method exactly do?

别跟我提以往 2021-01-30 00:35

On occasion, circumstances require us to do the following:

from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=my_max)
3 Answers

无人及你  2021-01-30 01:10

    Let's see what this line of code does.

    tokenizer.fit_on_texts(text)

    For example, consider the sentence "The earth is an awesome place live".

    tokenizer.fit_on_texts(["The earth is an awesome place live"]) builds the vocabulary index, assigning an integer (1 through 7 here) to every word, where 3 -> "is", 6 -> "place", and so on.

    sequences = tokenizer.texts_to_sequences(["The earth is an great place live"])
    

    returns [[1,2,3,4,6,7]].

    You see what happened here. The word "great" was not seen during fitting, so the tokenizer does not recognize it. This means fit_on_texts can be applied independently to the training data, and the fitted vocabulary index can then be used to represent a completely new set of word sequences. These are two different processes, hence the two lines of code.
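
    Below is a minimal end-to-end sketch of the two steps, assuming the Tokenizer bundled with tensorflow.keras; the texts and variable names are illustrative, not from the original question.

    from tensorflow.keras.preprocessing.text import Tokenizer

    train_texts = ["The earth is an awesome place live"]   # illustrative training corpus

    tokenizer = Tokenizer()                # num_words=my_max would cap texts_to_sequences to the most frequent words
    tokenizer.fit_on_texts(train_texts)    # builds the word -> index vocabulary from the training texts

    print(tokenizer.word_index)
    # {'the': 1, 'earth': 2, 'is': 3, 'an': 4, 'awesome': 5, 'place': 6, 'live': 7}

    new_texts = ["The earth is an great place live"]
    print(tokenizer.texts_to_sequences(new_texts))
    # [[1, 2, 3, 4, 6, 7]] -- "great" is not in the fitted vocabulary, so it is dropped

    Note that both methods expect a list of texts; if you pass a single raw string, the tokenizer iterates over its characters rather than its words.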
