Sometimes we need to do the following:
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=my_max)
Let's see what the next line of code does.
tokenizer.fit_on_texts(texts)  # texts must be a list of strings, not a single string
For example, consider the sentence "The earth is an awesome place live".
tokenizer.fit_on_texts(["The earth is an awesome place live"])
builds the word index {'the': 1, 'earth': 2, 'is': 3, 'an': 4, 'awesome': 5, 'place': 6, 'live': 7}, so 3 -> "is", 6 -> "place", and so on. Note that fit_on_texts returns nothing; it fills in tokenizer.word_index in place.
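A quick way to confirm the mapping is to print the fitted index yourself (a minimal, self-contained sketch; num_words=100 is an arbitrary stand-in for my_max):

from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(["The earth is an awesome place live"])
print(tokenizer.word_index)
# {'the': 1, 'earth': 2, 'is': 3, 'an': 4, 'awesome': 5, 'place': 6, 'live': 7}

Note that the Tokenizer lowercases by default, which is why "The" shows up as 'the'.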
sequences = tokenizer.texts_to_sequences(["The earth is an great place live"])
returns [[1,2,3,4,6,7]].
You can see what happened here: the word "great" was not seen during fitting, so the tokenizer does not recognize it and simply drops it. In other words, fit_on_texts is applied once to the training data, and the fitted vocabulary index can then be used to encode a completely new set of word sequences. Fitting and encoding are two different processes, hence the two lines of code.
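Putting it all together, here is a minimal end-to-end sketch of the two processes (the sentences are the toy examples from above; in practice you would fit on your full training corpus, and depending on your Keras version the import may live under tensorflow.keras.preprocessing.text):

from keras.preprocessing.text import Tokenizer

train_texts = ["The earth is an awesome place live"]
new_texts = ["The earth is an great place live"]

# Process 1: learn the vocabulary from the training data.
tokenizer = Tokenizer(num_words=100)  # keep only the 99 most frequent words when encoding
tokenizer.fit_on_texts(train_texts)

# Process 2: encode unseen text with the already-fitted vocabulary.
sequences = tokenizer.texts_to_sequences(new_texts)
print(sequences)  # [[1, 2, 3, 4, 6, 7]] -- "great" was never fitted, so it is dropped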