Use string as input in Keras IMDB example

不羁岁月 提交于 2020-01-04 05:51:30

问题


I was looking at the Keras IMDB Movie reviews sentiment classification example (and the corresponding model on github), which learns to decide whether a review is positive or negative.

The data has been preprocessed such that each review is encoded as a sequence of integers, e.g. the review "This movie is awesome!" would be [11, 17, 6, 1187] and for this input the model gives the output 'positive'.

The dataset also makes available the word index used for encoding the sequences, i.e. I know the map

This: 11
movie: 17
is: 6
awesome: 1187
...

Can I somehow include this knowledge into the model such that its input is a string, i.e. it gives a prediction based on the input "This movie is awesome!"?


回答1:


First up, the input to the neural network is never a string, it's exactly a list of indices of words (or characters) in a vocabulary. And the first thing the model usually does is embedding transformation (see the example) which further converts these indices into the (trainable) float vectors.

What you really mean is data pre-processing step that transforms the raw input from the user (can be text, image pixels, sound recording, etc) into a format that is suitable and convenient for the model. Data pre-processing is an essential part of the machine-learning application just like the model itself, and should be stored separately. If you intend to work with imdb dataset, the vocabulary is already pre-processed. You can call imdb.get_word_index() in keras to get the word index or you can work with the vocabulary json file directly.



来源:https://stackoverflow.com/questions/50250731/use-string-as-input-in-keras-imdb-example

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!