I don't have time to post code right now, but please take a look at the OpenNLP chunker and document categorizer. I think you could, very creatively, use the doccat (document categorizer) to establish your "key" and the chunker to extract noun and verb phrases (not single tokens, but actual multi-word phrases), then combine the results. At query time you would categorize the sentence to get its key, chunk the sentence, and run a query that joins to the keys table and then fuzzily matches (perhaps through a full-text index) against the phrases table. Just a thought; if it's interesting I'll post code as an edit. Note that you would have to build the doccat model yourself from sample texts.
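To make the idea a bit more concrete, here is a plain-Java sketch of how the two pieces combine. It is hypothetical: the categorizer and chunker are passed in as functions (in practice they would wrap DocumentCategorizerME and ChunkerME), the "keys" and "phrases" tables are in-memory maps, and a simple shared-word test stands in for the full-text match. The class name KeyedPhraseIndex is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Hypothetical in-memory sketch of "key + phrases": each text is classified
 * to a key, its phrases are stored under that key, and a query only looks at
 * phrases stored under the query's own key.
 */
class KeyedPhraseIndex {
    // Stand-ins for the doccat (text -> best category) and the chunker (text -> phrases).
    private final Function<String, String> classify;
    private final Function<String, List<String>> chunk;
    private final Map<String, Set<String>> phrasesByKey = new HashMap<>();

    KeyedPhraseIndex(Function<String, String> classify, Function<String, List<String>> chunk) {
        this.classify = classify;
        this.chunk = chunk;
    }

    /** Indexing time: classify the text and store its phrases under that key. */
    void add(String text) {
        String key = classify.apply(text);
        phrasesByKey.computeIfAbsent(key, k -> new LinkedHashSet<>()).addAll(chunk.apply(text));
    }

    /**
     * Query time: classify and chunk the query, then return the stored phrases
     * under the same key that share at least one word with a query phrase.
     * The word-overlap test is only a crude stand-in for a full-text index.
     */
    List<String> query(String text) {
        Set<String> queryWords = new HashSet<>();
        for (String phrase : chunk.apply(text)) {
            queryWords.addAll(words(phrase));
        }
        List<String> hits = new ArrayList<>();
        Set<String> candidates = phrasesByKey.getOrDefault(classify.apply(text), Collections.emptySet());
        for (String stored : candidates) {
            if (!Collections.disjoint(words(stored), queryWords)) {
                hits.add(stored);
            }
        }
        return hits;
    }

    // Lower-cased, whitespace-separated words of a phrase.
    private static List<String> words(String phrase) {
        return Arrays.asList(phrase.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }
}
```

A real version would also drop stop words before matching, since a shared "the" counts as a hit in this sketch.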
EDIT
Here is how to get a probability distribution over a set of categories using the OpenNLP document categorizer. You'll need to supply a properties file that has the path to the doccat model:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;

/**
 * Returns a probability distribution over the categories of a doccat model.
 */
public class SentimentFinder {

    private DoccatModel doccatModel;
    private DocumentCategorizerME documentCategorizerME;
    private final Properties props = new Properties();

    /**
     * @param propsFile path to a properties file whose
     *                  "opennlp.sentiment.model.generic" entry is the path
     *                  to the doccat model
     */
    public SentimentFinder(String propsFile) throws IOException {
        try (InputStream in = new FileInputStream(propsFile)) {
            props.load(in);
        }
    }

    public void init() {
        try {
            if (doccatModel == null) {
                doccatModel = new DoccatModel(new File(props.getProperty("opennlp.sentiment.model.generic")));
                documentCategorizerME = new DocumentCategorizerME(doccatModel);
            }
        } catch (IOException ex) {
            // fail loudly instead of leaving documentCategorizerME null
            throw new IllegalStateException("Could not load the doccat model", ex);
        }
    }

    /**
     * Classifies text via a maxent model. Try to keep chunks of text small, or
     * typically all scores will be low with little difference between them.
     *
     * @param text the string to be classified
     * @return a map from each category name to its probability
     */
    public Map<String, Double> probDist(String text) {
        Map<String, Double> probDist = new HashMap<String, Double>();
        if (doccatModel == null) {
            init();
        }
        double[] categorize = documentCategorizerME.categorize(text);
        int catSize = documentCategorizerME.getNumberOfCategories();
        // categorize[i] is the probability of the category at index i
        for (int i = 0; i < catSize; i++) {
            probDist.put(documentCategorizerME.getCategory(i), categorize[i]);
        }
        return probDist;
    }
}
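The "key" is then just the most probable category in that distribution. A small sketch, assuming you also want to reject inputs whose best score is too low (the minProb cut-off is a hypothetical parameter you would tune, in line with the note about small chunks giving low, flat scores). If you only need the argmax over the raw categorize output, DocumentCategorizerME also offers getBestCategory(double[]).

```java
import java.util.Map;

class CategoryKey {
    /**
     * Returns the category with the highest probability, or null if the map is
     * empty or the best probability is below minProb (a hypothetical cut-off).
     */
    static String bestCategory(Map<String, Double> probDist, double minProb) {
        String best = null;
        double bestProb = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : probDist.entrySet()) {
            if (e.getValue() > bestProb) {
                best = e.getKey();
                bestProb = e.getValue();
            }
        }
        // an empty map leaves bestProb at -infinity, so this also returns null
        return bestProb >= minProb ? best : null;
    }
}
```

A null result is a signal to fall back to searching all categories rather than trusting a weak key.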
And here's how to chunk a sentence with the OpenNLP chunker and get the noun phrases:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

/**
 * Extracts noun phrases from a sentence. To split text into sentences with
 * OpenNLP, use the SentenceDetectorME class first.
 */
public class OpenNLPNounPhraseExtractor {

    public static void main(String[] args) {
        String modelPath = "c:\\temp\\opennlpmodels\\";
        // try-with-resources closes the model streams once the models are loaded
        try (InputStream tokenIn = new FileInputStream(modelPath + "en-token.zip");
             InputStream posIn = new FileInputStream(modelPath + "en-pos-maxent.zip");
             InputStream chunkIn = new FileInputStream(modelPath + "en-chunker.zip")) {
            TokenizerME wordBreaker = new TokenizerME(new TokenizerModel(tokenIn));
            POSTaggerME posme = new POSTaggerME(new POSModel(posIn));
            ChunkerME chunkerME = new ChunkerME(new ChunkerModel(chunkIn));

            // this is your sentence
            String sentence = "Barack Hussein Obama II is the 44th awesome President of the United States, and the first African American to hold the office.";
            // words is the tokenized sentence
            String[] words = wordBreaker.tokenize(sentence);
            // posTags are the parts of speech of every word in the sentence (the chunker needs them)
            String[] posTags = posme.tag(words);
            // chunks are the start/end "spans" (token indices) of each chunk in the words array
            Span[] chunks = chunkerME.chunkAsSpans(words, posTags);
            // chunkStrings are the actual chunk texts
            String[] chunkStrings = Span.spansToStrings(chunks, words);

            Map<String, Integer> termFrequencies = new HashMap<>();
            for (int i = 0; i < chunks.length; i++) {
                // keep only noun phrases and count how often each one occurs
                if ("NP".equals(chunks[i].getType())) {
                    termFrequencies.merge(chunkStrings[i], 1, Integer::sum);
                }
            }
            System.out.println(termFrequencies);
        } catch (IOException e) {
            // don't swallow the error: a missing or corrupt model file ends up here
            e.printStackTrace();
        }
    }
}
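Under the hood the chunker labels each token with a BIO tag (ChunkerME.chunk(words, posTags) returns these directly, e.g. B-NP, I-NP, B-VP, O), and a noun phrase is a B-NP token followed by any I-NP tokens. chunkAsSpans and Span.spansToStrings already do this grouping for you; the plain-Java sketch below (class BioChunks is hypothetical) only illustrates what the spans mean.

```java
import java.util.ArrayList;
import java.util.List;

class BioChunks {
    /**
     * Groups tokens into phrases of the given type (e.g. "NP") from BIO chunk
     * tags such as "B-NP", "I-NP", "O". An "I-" tag that does not continue a
     * phrase of the same type is treated as starting a new one.
     */
    static List<String> phrases(String[] tokens, String[] tags, String type) {
        List<String> out = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            boolean begin = tags[i].equals("B-" + type);
            boolean inside = tags[i].equals("I-" + type);
            if (inside && current != null) {
                // continue the open phrase
                current.append(' ').append(tokens[i]);
                continue;
            }
            if (current != null) {
                // any other tag closes the open phrase
                out.add(current.toString());
                current = null;
            }
            if (begin || inside) {
                current = new StringBuilder(tokens[i]);
            }
        }
        if (current != null) {
            out.add(current.toString());
        }
        return out;
    }
}
```

For example, tokens "Barack Obama is the 44th President" tagged B-NP I-NP B-VP B-NP I-NP I-NP give the noun phrases "Barack Obama" and "the 44th President".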
So what I was thinking is: classify the input text, then extract and store its noun phrases. At query time, classify the input to get a category, then do something like this in SQL:
select *
from categories a
inner join nounphrases b on a.id = b.catid
where catname = @thecatIjustgotfromtheclassifier
and contains(text, 'search term')
or something like that
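If you run that query from Java, a PreparedStatement keeps the classifier's category and the search term out of the SQL string itself. A minimal JDBC sketch, assuming the tables above live in SQL Server with a full-text index on a text column of nounphrases; the class PhraseSearch and method search are hypothetical, and the search term must be a valid CONTAINS condition (for a multi-word phrase, wrap it in double quotes).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PhraseSearch {
    // Assumed schema: categories(id, catname), nounphrases(catid, text),
    // with a full-text index on nounphrases.text.
    static final String SQL =
        "select b.text from categories a "
        + "inner join nounphrases b on a.id = b.catid "
        + "where a.catname = ? and contains(b.text, ?)";

    /** Returns the stored noun phrases under the given category that match the search term. */
    static List<String> search(Connection conn, String category, String searchTerm) throws SQLException {
        List<String> hits = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, category);   // the category you just got from the classifier
            ps.setString(2, searchTerm); // e.g. "\"awesome president\""
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    hits.add(rs.getString(1));
                }
            }
        }
        return hits;
    }
}
```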