What are docValues in Solr? When should I use them?

天涯浪人 2021-02-07 12:38

So, I have read multiple sources that try to explain what 'docValues' are in Solr, but I don't seem to understand when I should use them, especially in relation to indexed vs

3 answers
  •  借酒劲吻你
    2021-02-07 13:01

    The use cases for docValues are already explained clearly by @Persimmonium: they are good for faceting, sorting, and similar operations in the IR world.

    What are docValues, and why do they exist? DocValues are a way to build a forward index, so that documents point to values (the inverted index goes the other way, from terms to documents). They were introduced to overcome the limitations of FieldCache, which had to build that document-to-value mapping in heap memory at search time by un-inverting the index. DocValues build the mapping at index time and store the values in a column-oriented fashion, so the heavy lifting happens during document indexing rather than at query time.
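    To make the forward-index idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Solr APIs; the class name `ForwardIndexSketch`, the method names, and the sample data are all hypothetical. It only illustrates why a document-to-value column suits sorting and faceting, while a term-to-documents map suits searching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java arrays standing in for docValues columns.
public class ForwardIndexSketch {

    // Inverted index: term -> ids of the documents containing it (good for searching).
    public static Map<String, List<Integer>> buildInvertedIndex(String[] colorByDoc) {
        Map<String, List<Integer>> inverted = new TreeMap<>();
        for (int docId = 0; docId < colorByDoc.length; docId++) {
            inverted.computeIfAbsent(colorByDoc[docId], k -> new ArrayList<>()).add(docId);
        }
        return inverted;
    }

    // Forward column: position docId holds that document's value.
    // Sorting only needs direct lookups into the column.
    public static Integer[] sortDocsByValue(long[] priceByDoc) {
        Integer[] docIds = new Integer[priceByDoc.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, (a, b) -> Long.compare(priceByDoc[a], priceByDoc[b]));
        return docIds;
    }

    // Faceting: for each matching document, look up its value and count it.
    public static Map<String, Integer> facetCounts(String[] colorByDoc, int[] matchingDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int docId : matchingDocs) {
            counts.merge(colorByDoc[docId], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] priceByDoc = {30L, 10L, 20L};        // doc 0, doc 1, doc 2
        String[] colorByDoc = {"red", "blue", "red"};

        System.out.println(buildInvertedIndex(colorByDoc));                 // {blue=[1], red=[0, 2]}
        System.out.println(Arrays.toString(sortDocsByValue(priceByDoc)));   // [1, 2, 0]
        System.out.println(facetCounts(colorByDoc, new int[] {0, 2}));      // {red=2}
    }
}
```

    Answering "which colors do the matching documents have?" from the inverted map alone would mean walking every term's posting list, which is roughly the un-inverting work that FieldCache did at search time.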

    What docvalues are:

    NRT-compatible: These are per-segment data structures built at index time and designed to be efficient when data is changing rapidly.

    Basic query/filter support: You can run basic term, range, etc. queries on docValues fields without also indexing them, but these are constant-score only and typically slower. If you care about performance and scoring, index the field too.

    Better compression than fieldcache: Docvalues fields compress better than fieldcache, and "insanity" is impossible.

    Able to store data outside of heap memory: You can specify a different docValuesFormat on the fieldType (docValuesFormat="Disk") to only load minimal data on the heap, keeping other data structures on disk.
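    The "per-segment" point in the list above is what makes docValues friendly to near-real-time updates. The following is a rough sketch in plain Java, not Lucene code: the class `PerSegmentColumnsSketch`, its `flush` and `valueOf` methods, and the linear segment lookup are assumptions made for illustration. The idea it shows is that each flush writes a new immutable column, so existing columns never need rebuilding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: each segment owns its own value column.
public class PerSegmentColumnsSketch {

    // One immutable segment: a column addressed by segment-local document id.
    private static final class Segment {
        final int docBase;    // global id of this segment's first document
        final long[] values;  // column built once, when the segment is written

        Segment(int docBase, long[] values) {
            this.docBase = docBase;
            this.values = values.clone();
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private int maxDoc = 0;

    // A flush appends a new segment; columns of older segments are left untouched.
    public void flush(long[] newDocValues) {
        segments.add(new Segment(maxDoc, newDocValues));
        maxDoc += newDocValues.length;
    }

    // Resolve a global document id to (segment, local id) and read its value.
    public long valueOf(int globalDocId) {
        for (Segment s : segments) {
            int local = globalDocId - s.docBase;
            if (local >= 0 && local < s.values.length) {
                return s.values[local];
            }
        }
        throw new IllegalArgumentException("no such document: " + globalDocId);
    }

    public int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        PerSegmentColumnsSketch index = new PerSegmentColumnsSketch();
        index.flush(new long[] {5L, 7L});  // first segment holds docs 0 and 1
        index.flush(new long[] {9L});      // second segment holds doc 2
        System.out.println(index.valueOf(2));       // 9
        System.out.println(index.segmentCount());   // 2
    }
}
```

    By contrast, a single index-wide array would have to be rebuilt or extended whenever new documents became visible.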

    What docvalues are not:

    Not a replacement for stored fields: docValues are unrelated to stored fields in every way; they are data structures for search (sort/facet/group/join/scoring).

    As an example, here is how Lucene itself dispatches on a field's docValues type when working out which documents have a value for that field:

        // Returns a bit set marking which documents have a value for the given field.
        public Bits getDocsWithField(FieldInfo field) throws IOException {
          switch (field.getDocValuesType()) {
            // Sorted types: derive the bits from the docValues themselves.
            case SORTED_SET:
              return DocValues.docsWithValue(getSortedSet(field), maxDoc);
            case SORTED_NUMERIC:
              return DocValues.docsWithValue(getSortedNumeric(field), maxDoc);
            case SORTED:
              return DocValues.docsWithValue(getSorted(field), maxDoc);
            // Binary and numeric types: read the "missing" bits recorded for the field.
            case BINARY:
              BinaryEntry be = binaries.get(field.number);
              return getMissingBits(be.missingOffset);
            case NUMERIC:
              NumericEntry ne = numerics.get(field.number);
              return getMissingBits(ne.missingOffset);
            default:
              throw new AssertionError();
          }
        }
    
