I'm trying to do some text analysis to determine if a given string is... talking about politics. I'm thinking I could create a neural network where the input is either a string or a list of words (ordering might matter?) and the output is whether the string is about politics.
However, the brain.js library only accepts inputs that are a number between 0 and 1 or an array of numbers between 0 and 1. How can I encode my data in such a way that I can achieve the task?
new brain.recurrent.LSTM() does the trick for you, since it can be trained on strings directly.
Example:
// Requires the brain.js package (npm install brain.js).
const brain = require('brain.js');
const net = new brain.recurrent.LSTM();

// Train on raw strings: each sample maps an input text to a category label.
net.train([
  {input: "my unit-tests failed.", output: "software"},
  {input: "tried the program, but it was buggy.", output: "software"},
  {input: "i need a new power supply.", output: "hardware"},
  {input: "the drive has a 2TB capacity.", output: "hardware"},
  {input: "unit-tests", output: "software"},
  {input: "program", output: "software"},
  {input: "power supply", output: "hardware"},
  {input: "drive", output: "hardware"},
]);

// Ask the trained network to label a new string.
console.log("output = " + net.run("drive"));
output = hardware
For a clear explanation and usage of brain.recurrent.LSTM(), see https://github.com/BrainJS/brain.js/issues/65
You need to come up with a model that converts your data into a list of tuples [input, expected_output], where input is a list of numbers between 0 and 1 representing the given words, and output is a single number between 0 and 1 representing how close the sentence is to your objective (being political). For example, the sentence "The quick brown cat jumped over the lazy dog" should get a score of zero, while a sentence like "President shakes off corruption scandal" should get a score very close to one.
As you can see, your biggest challenge is actually obtaining the data and cleaning it. Converting it to the training format is easy: you could just hash words into numbers between 0 and 1, making sure to handle different casing and punctuation, and you might want to stem words to get the best results.
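The conversion described above could be sketched roughly as follows. This is only a minimal illustration: the helper names (tokenize, hashWord, encode), the choice of a 32-bit FNV-1a hash, the fixed size of 8, and padding with 0 are all my own assumptions, not part of brain.js, and stemming is left out.

```javascript
// Hypothetical helpers for illustration; none of these names come from brain.js.

// Lowercase the text, replace punctuation with spaces, and split into words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Map a word to a number in [0, 1) using a 32-bit FNV-1a hash.
function hashWord(word) {
  let h = 0x811c9dc5;
  for (let i = 0; i < word.length; i++) {
    h ^= word.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h / 4294967296; // divide by 2^32
}

// Encode a sentence as exactly `size` numbers: truncate long inputs, pad short ones with 0.
function encode(text, size) {
  const values = tokenize(text).slice(0, size).map(hashWord);
  while (values.length < size) values.push(0);
  return values;
}

// Training tuples in the {input, output} shape, scored 0 (not political) or 1 (political).
const trainingData = [
  { input: encode('The quick brown cat jumped over the lazy dog', 8), output: [0] },
  { input: encode('President shakes off corruption scandal', 8), output: [1] },
];
```

Keep in mind that a hash gives similar words unrelated numbers, so this only solves the formatting problem; whether a network learns anything useful from such an encoding is a separate question.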
One more thing: you can use a term relevance algorithm to rank the importance of words in your training data set, so that you can keep only the top k most relevant words in each sentence, since you need a uniform data size for each sentence.
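As a rough sketch of that idea, here is one way to pick the top k words, assuming TF-IDF as the relevance measure (the answer does not name a specific algorithm). The function names and the tiny corpus are made up for illustration.

```javascript
// Hypothetical helper names; TF-IDF is just one possible term relevance measure.

function tokenize(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean);
}

// Count in how many documents each word appears.
function documentFrequencies(docs) {
  const df = new Map();
  for (const doc of docs) {
    for (const word of new Set(tokenize(doc))) {
      df.set(word, (df.get(word) || 0) + 1);
    }
  }
  return df;
}

// Return the k words of `sentence` with the highest TF-IDF score.
function topKWords(sentence, df, totalDocs, k) {
  const tf = new Map();
  for (const word of tokenize(sentence)) {
    tf.set(word, (tf.get(word) || 0) + 1);
  }
  const scored = [...tf].map(([word, count]) => {
    // Smoothed IDF so that words missing from the corpus do not divide by zero.
    const idf = Math.log((1 + totalDocs) / (1 + (df.get(word) || 0))) + 1;
    return { word, score: count * idf };
  });
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k).map((entry) => entry.word);
}

const corpus = [
  'the president signed the bill',
  'the cat sat on the mat',
  'the senate passed the bill',
];
const df = documentFrequencies(corpus);
const top = topKWords('president vetoed the bill', df, corpus.length, 2);
console.log(top);
```

Common words such as "the" appear in every document and therefore score low, so they are the first to be dropped when a sentence has more than k words.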
So apparently text doesn't coerce very well to NN input.
A Naive Bayes Classifier looks like exactly what I want. https://github.com/harthur/classifier
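To show why this fits the task, here is a small from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is not the API of the linked library; the class name, method names, and training sentences are assumptions made up for illustration.

```javascript
// A from-scratch sketch; the class and its methods are hypothetical
// and do not mirror the API of any particular library.
class NaiveBayes {
  constructor() {
    this.wordCounts = {};        // label -> Map(word -> count)
    this.totalWords = {};        // label -> total number of words seen
    this.docCounts = {};         // label -> number of training texts
    this.vocabulary = new Set(); // every distinct word seen in training
    this.totalDocs = 0;
  }

  static tokenize(text) {
    return text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean);
  }

  train(text, label) {
    if (!this.wordCounts[label]) {
      this.wordCounts[label] = new Map();
      this.totalWords[label] = 0;
      this.docCounts[label] = 0;
    }
    this.docCounts[label] += 1;
    this.totalDocs += 1;
    const counts = this.wordCounts[label];
    for (const word of NaiveBayes.tokenize(text)) {
      counts.set(word, (counts.get(word) || 0) + 1);
      this.totalWords[label] += 1;
      this.vocabulary.add(word);
    }
  }

  classify(text) {
    const words = NaiveBayes.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const label of Object.keys(this.docCounts)) {
      // log P(label) + sum of log P(word | label), with add-one smoothing.
      let score = Math.log(this.docCounts[label] / this.totalDocs);
      for (const word of words) {
        const count = this.wordCounts[label].get(word) || 0;
        score += Math.log((count + 1) / (this.totalWords[label] + this.vocabulary.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

const classifier = new NaiveBayes();
classifier.train('President shakes off corruption scandal', 'politics');
classifier.train('Senate votes on the new tax bill', 'politics');
classifier.train('The quick brown cat jumped over the lazy dog', 'other');
classifier.train('My cat sleeps on the sofa all day', 'other');

const label = classifier.classify('corruption scandal hits the senate');
console.log(label);
```

Unlike the neural network, this works on word counts directly, so sentences of any length can be trained and classified without converting them to fixed-size arrays of numbers.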
Source: https://stackoverflow.com/questions/37043598/use-brain-js-neural-network-to-do-text-analysis