Is there a way to add a document to the index by supplying terms and term frequencies directly, rather than via Analysis and/or TokenStream? I ask because I want to model some data where I know the term frequencies, but there is no underlying text document to be analyzed. I could create one by repeating the same term many times (I don't care about positions or highlighting in this case, either, just scoring), but that seems a bit perverse (and probably slower than just supplying the counts directly).
(also asked on the mailing list)
At any rate, you don't need to pass everything through an Analyzer in order to create the document. I'm not aware of any way to pass in terms and frequencies directly, as you've asked (though I'd be interested to know if you find a good approach to it), but you can certainly pass in IndexableFields one term at a time. That would still require you to add each term multiple times, like:
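// StringField's built-in types omit term frequencies, so copy one and enable them.
// IndexOptions is FieldInfo.IndexOptions in Lucene 4.x, org.apache.lucene.index.IndexOptions in 5+.
FieldType type = new FieldType(StringField.TYPE_NOT_STORED);
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
IndexableField field = new Field(fieldName, myTerm, type);
for (int i = 0; i < frequency; i++) {
    document.add(field);
}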
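If the counts already live in a map, the repetition can be factored into a small helper. The sketch below is plain Java with no Lucene dependency; `TermExpander` and `expand` are hypothetical names for illustration, not part of the Lucene API. Each element of the returned list would then be wrapped in a field and added to the document, as in the loop above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Lucene): turns term counts into a repeated-term list. */
final class TermExpander {

    /** Returns each term repeated as many times as its count, in map iteration order. */
    static List<String> expand(Map<String, Integer> termCounts) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : termCounts.entrySet()) {
            int frequency = entry.getValue();
            if (frequency < 0) {
                throw new IllegalArgumentException("negative frequency for " + entry.getKey());
            }
            // Emit the term once per occurrence.
            for (int i = 0; i < frequency; i++) {
                terms.add(entry.getKey());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("lucene", 3);
        counts.put("index", 1);
        // One list entry per occurrence; each would become one field instance.
        System.out.println(expand(counts));
    }
}
```

This only prepares the data. It does not avoid the repetition itself, so the indexing cost still grows with the total of all frequencies.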
You can also take a step further back and cut the Document class out entirely: IndexWriter.addDocument accepts any Iterable<IndexableField> (a simple List, for instance), which might be a more direct way of modelling your data.
Not sure if that gets you any closer to what you are looking for, but perhaps a step vaguely in the right direction.
Source: https://stackoverflow.com/questions/17432365/can-i-insert-a-document-into-lucene-without-generating-a-tokenstream