Best approach for doing full-text search with list-of-integers documents

悲&欢浪女 2021-01-16 00:27

I'm working on a C++/Qt similarity-based image retrieval system that works as follows (I'll try to avoid irrelevant or off-topic details):

I take a collection o

2 answers
  •  北恋 (OP)
    2021-01-16 00:48

    It sounds to me like you have a vector space model, so Lucene or a similar product may work well for you. In general, an inverted-index model is a good fit if:

    1. You don't know the number of classes in advance
    2. There are a lot of classes relative to the number of images

    If your problem doesn't fit these criteria, a normal relational DB might work better, as Thomas suggested. If it meets #1 but not #2, you could investigate one of the "column-oriented" non-relational databases. I'm not familiar enough with these to tell you how well they would work, but my intuition is that you would have to reimplement much of the functionality of an IR toolkit yourself.
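
    To make the inverted-index model concrete, here is a minimal sketch for documents that are lists of integers. It is an illustration only, not how Lucene or Lemur are implemented: the names `InvertedIndex`, `add`, and `search` are hypothetical, and the score is a plain dot product of raw term counts, whereas a real IR toolkit would typically apply tf-idf weighting and length normalization. It assumes a C++17 compiler.

```cpp
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration: a document is a list of integer terms
// (for example class or feature ids extracted from an image).
using DocId = int;
using Term = int;

class InvertedIndex {
public:
    // Index one document: for every term, record how often it occurs in doc.
    void add(DocId doc, const std::vector<Term>& terms) {
        for (Term t : terms) {
            ++postings_[t][doc];
        }
    }

    // Rank documents by the dot product of raw term counts with the query.
    // Only documents sharing at least one term with the query are visited.
    // Ties are broken by ascending document id.
    std::vector<std::pair<DocId, int>> search(const std::vector<Term>& query) const {
        std::unordered_map<Term, int> queryCounts;
        for (Term t : query) {
            ++queryCounts[t];
        }

        std::unordered_map<DocId, int> scores;
        for (const auto& [term, queryCount] : queryCounts) {
            auto it = postings_.find(term);
            if (it == postings_.end()) {
                continue;  // term never seen in any indexed document
            }
            for (const auto& [doc, docCount] : it->second) {
                scores[doc] += queryCount * docCount;
            }
        }

        std::vector<std::pair<DocId, int>> ranked(scores.begin(), scores.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const auto& a, const auto& b) {
                      return a.second != b.second ? a.second > b.second
                                                  : a.first < b.first;
                  });
        return ranked;
    }

private:
    // term -> (document -> number of occurrences of the term in that document)
    std::unordered_map<Term, std::unordered_map<DocId, int>> postings_;
};
```

    The index is keyed by term, so new classes can appear at any time without a schema change (criterion 1), and a query only touches the posting lists of its own terms, which pays off when each image contains a small fraction of all classes (criterion 2).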

    Lucene is written in Java, and I don't know of any C++ ports. Solr exposes Lucene as a web service, so it's easy enough to access it that way from whatever language you choose.

    I don't know much about Lemur, but it looks like it has a similar vector space model, and it's written in C++, so it might be easier for you to use.
