Can I insert a Document into Lucene without generating a TokenStream?

Submitted by 拜拜、爱过 on 2019-12-08 03:00:26

Question


Is there a way to add a document to the index by supplying terms and term frequencies directly, rather than via Analysis and/or TokenStream? I ask because I want to model some data where I know the term frequencies, but there is no underlying text document to be analyzed. I could create one by repeating the same term many times (I don't care about positions or highlighting in this case, either, just scoring), but that seems a bit perverse (and probably slower than just supplying the counts directly).

(also asked on the mailing list)


Answer 1:


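You don't need to pass everything through an Analyzer in order to create the document. I'm not aware of any way to pass in terms and frequencies directly, as you've asked (though I'd be interested to know if you find a good approach to it), but you can certainly pass in IndexableFields one term at a time. That would still require you to add each term multiple times. Note that StringField itself indexes document IDs only and omits term frequencies, so the field needs an untokenized type that keeps them, like:

// Untokenized like StringField, but keeps term frequencies for scoring.
FieldType type = new FieldType(StringField.TYPE_NOT_STORED);
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
type.freeze();
IndexableField field = new Field(fieldName, myTerm, type);
for (int i = 0; i < frequency; i++) {
    document.add(field);
}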
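
The loop above handles a single term. To apply it to a whole set of known counts, one option is to first expand a term-to-frequency map into a flat list in which each term is repeated as often as its count. The sketch below uses only the JDK, so it can be tried without Lucene on the classpath. TermExpander and expand are hypothetical names chosen for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): turns known term counts
// into the repeated-term form that the per-term loop expects.
final class TermExpander {

    private TermExpander() {}

    /** Returns each term repeated once per unit of its frequency, in map iteration order. */
    static List<String> expand(Map<String, Integer> termFrequencies) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : termFrequencies.entrySet()) {
            int frequency = entry.getValue();
            if (frequency < 0) {
                throw new IllegalArgumentException(
                        "negative frequency for term: " + entry.getKey());
            }
            // A frequency of zero contributes nothing to the result.
            for (int i = 0; i < frequency; i++) {
                terms.add(entry.getKey());
            }
        }
        return terms;
    }
}
```

For example, a LinkedHashMap holding lucene=3 and index=1 (in that insertion order) expands to lucene, lucene, lucene, index. Each entry would then be wrapped in a field and added to the document as shown above.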

You can also go a step further and cut the Document class out entirely: IndexWriter.addDocument accepts any Iterable<IndexableField> (a simple List, for instance), which might be a more direct way of modelling your data.

Not sure if that gets you any closer to what you are looking for, but perhaps a step vaguely in the right direction.



Source: https://stackoverflow.com/questions/17432365/can-i-insert-a-document-into-lucene-without-generating-a-tokenstream
