Use brain.js neural network to do text analysis

青春壹個敷衍的年華, submitted on 2019-12-03 03:03:45
Use new brain.recurrent.LSTM(); it does the trick for you.

Example:

// Requires the brain.js package (npm install brain.js).
const brain = require('brain.js');

// An LSTM network can be trained directly on strings.
const net = new brain.recurrent.LSTM();

net.train([
  {input: "my unit-tests failed.", output: "software"},
  {input: "tried the program, but it was buggy.", output: "software"},
  {input: "i need a new power supply.", output: "hardware"},
  {input: "the drive has a 2TB capacity.", output: "hardware"},
  {input: "unit-tests", output: "software"},
  {input: "program", output: "software"},
  {input: "power supply", output: "hardware"},
  {input: "drive", output: "hardware"},
]);

// Ask the trained network to label a new input.
console.log("output = " + net.run("drive"));


output = hardware

See https://github.com/BrainJS/brain.js/issues/65 for a clear explanation of brain.recurrent.LSTM() and how to use it.

You need to come up with a model that converts your data into a list of tuples [input, expected_output], where input is a list of numbers between 0 and 1 representing the words of the sentence, and expected_output is a single number between 0 and 1 representing how closely the sentence matches what you are analysing for (here, being political). For example, the sentence "The quick brown cat jumped over the lazy dog" might get a score of zero, while a sentence like "President shakes off corruption scandal" might get a score very close to one.

As you can see, your biggest challenge is actually obtaining the data and cleaning it. Converting it to the training format is easy: you could just hash words into numbers between 0 and 1. Make sure to handle different casing and punctuation, and you might want to stem words to get the best results.
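Here is a rough sketch of that encoding step, in plain JavaScript with no library. The helper names (tokenize, hashWord, encode), the FNV-1a hash and the fixed input size of 8 are all my own assumptions for illustration, not part of brain.js. Stemming is left out to keep it short. The resulting objects use the {input: [...], output: [...]} shape that a numeric network such as brain.NeuralNetwork is trained on.

```javascript
// Hypothetical helpers; none of these names come from brain.js.

// Lowercase, replace punctuation with spaces, and split on whitespace.
function tokenize(sentence) {
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Map a word to a number in [0, 1) using a 32-bit FNV-1a hash.
function hashWord(word) {
  let h = 0x811c9dc5;
  for (let i = 0; i < word.length; i++) {
    h ^= word.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 0x100000000;
}

// Encode a sentence as exactly `size` numbers: truncate long sentences,
// pad short ones with 0.
function encode(sentence, size) {
  const values = tokenize(sentence).slice(0, size).map(hashWord);
  while (values.length < size) values.push(0);
  return values;
}

// Scores are the hand-assigned labels from the example sentences above.
const trainingData = [
  { input: encode('The quick brown cat jumped over the lazy dog', 8), output: [0] },
  { input: encode('President shakes off corruption scandal', 8), output: [1] },
];

console.log(trainingData);
```

Keep in mind that a hash gives unrelated words arbitrary numbers, so two values being close says nothing about the words being similar. It only gets the text into a numeric form.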

One more thing: you can use a term-relevance algorithm to rank the importance of words in your training data set, so that you can keep only the top k most relevant words in each sentence, since every sentence has to produce an input of the same size.
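As one possible way to do that ranking, here is a small TF-IDF sketch. TF-IDF is my pick of term-relevance algorithm, and the function names (splitWords, documentFrequency, topKWords) and the three-sentence corpus are made up for the example.

```javascript
// Hypothetical helper names; plain JavaScript, no library required.

// Lowercase, replace punctuation with spaces, and split on whitespace.
function splitWords(sentence) {
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Count how many sentences each word appears in (document frequency).
function documentFrequency(sentences) {
  const df = new Map();
  for (const sentence of sentences) {
    for (const word of new Set(splitWords(sentence))) {
      df.set(word, (df.get(word) || 0) + 1);
    }
  }
  return df;
}

// Return up to k distinct words of `sentence`, highest TF-IDF score first.
function topKWords(sentence, sentences, k) {
  const df = documentFrequency(sentences);
  const words = splitWords(sentence);

  // Term frequency: how often each word occurs in this sentence.
  const tf = new Map();
  for (const word of words) tf.set(word, (tf.get(word) || 0) + 1);

  const scored = [...tf.entries()].map(([word, count]) => {
    // Smoothed IDF: words found in every sentence score 0,
    // and unseen words do not cause a division by zero.
    const idf = Math.log((1 + sentences.length) / (1 + (df.get(word) || 0)));
    return { word, score: (count / words.length) * idf };
  });

  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k).map((entry) => entry.word);
}

const corpus = [
  'the president shakes off a corruption scandal',
  'the quick brown cat jumped over the lazy dog',
  'the senate passed the budget',
];

console.log(topKWords(corpus[0], corpus, 3));
```

A word like "the" that shows up in every sentence ends up with a score of 0 and drops out, which is the point of the ranking. The k words you keep can then be turned into numbers, giving every sentence an input of the same length.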

So apparently text doesn't coerce very well to NN input.

A Naive Bayes Classifier looks like exactly what I want. https://github.com/harthur/classifier
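To show the idea without depending on any package, here is a minimal multinomial Naive Bayes sketch with add-one smoothing, trained on the sentences from the LSTM example. The NaiveBayes class and its train/classify methods are hypothetical names of my own and are not the API of the linked classifier library.

```javascript
// Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
// Hypothetical class; this is not the API of harthur/classifier.
class NaiveBayes {
  constructor() {
    this.docCounts = new Map();  // label -> number of training texts
    this.wordCounts = new Map(); // label -> Map(word -> count)
    this.totalWords = new Map(); // label -> total number of words seen
    this.vocabulary = new Set(); // every distinct word seen in training
    this.totalDocs = 0;
  }

  // Lowercase, replace punctuation with spaces, and split on whitespace.
  static tokenize(text) {
    return text
      .toLowerCase()
      .replace(/[^a-z0-9\s]/g, ' ')
      .split(/\s+/)
      .filter(Boolean);
  }

  train(text, label) {
    if (!this.wordCounts.has(label)) {
      this.docCounts.set(label, 0);
      this.wordCounts.set(label, new Map());
      this.totalWords.set(label, 0);
    }
    this.docCounts.set(label, this.docCounts.get(label) + 1);
    this.totalDocs += 1;

    const counts = this.wordCounts.get(label);
    for (const word of NaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords.set(label, this.totalWords.get(label) + 1);
      this.vocabulary.add(word);
    }
  }

  // Return the label with the highest log-probability, or null if untrained.
  classify(text) {
    const words = NaiveBayes.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const [label, counts] of this.wordCounts) {
      // log P(label) + sum over words of log P(word | label)
      let score = Math.log(this.docCounts.get(label) / this.totalDocs);
      const denominator = this.totalWords.get(label) + this.vocabulary.size;
      for (const word of words) {
        score += Math.log(((counts.get(word) || 0) + 1) / denominator);
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const classifier = new NaiveBayes();
classifier.train('my unit-tests failed.', 'software');
classifier.train('tried the program, but it was buggy.', 'software');
classifier.train('i need a new power supply.', 'hardware');
classifier.train('the drive has a 2TB capacity.', 'hardware');

console.log(classifier.classify('the program failed'));
```

The nice part compared to the NN route is that the text goes in as words and counts, so there is no need to squeeze sentences into fixed-size numeric vectors first.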
