How to get all results from a Solr query?

温柔的废话 2021-01-01 10:36

I executed a query like "Address:Jack*". It shows numFound = 5214 and displays 100 documents in the results page (I changed the default display results f

8 Answers
  • 2021-01-01 11:14

    First create a SolrQuery as shown below and set the number of documents you want to fetch per batch.

    int lastResult = 0; // last id seen so far; update it after each batch

    // Using id just for the sake of simplicity. The value of lastResult has to be
    // concatenated into the query string. The [ bound is inclusive, so the boundary
    // document comes back again in the next batch; use { to exclude it.
    String query = "id:[" + lastResult + " TO *]";

    SolrQuery solrQuery = new SolrQuery(query)
            .setRows(500)                         // batch size; change it to whatever you want
            .addSort("id", SolrQuery.ORDER.asc);  // batches only line up if results are ordered by id

    SolrDocumentList results = solrClient.query(solrQuery).getResults(); // execute the query
    

    Here I am using a search by id as the example; you can replace it with whichever field you want to search on.

    "lastResult" is the variable you update after the first 500 records have been processed (500 is the batch size): set it to the last id returned in the results.

    The next batch then starts from the last result of the previous batch.

    Hope this helps. Shoot up a comment below if you need any clarification.

  • 2021-01-01 11:14

    query.setRows(Integer.MAX_VALUE); works for me!!

  • 2021-01-01 11:18

    As the other answers pointed out, you can set rows to the maximum integer to get back all the results of a query. I would recommend using Solr's pagination instead, and building a function that returns all the results through the cursorMark API. The gist: set the cursorMark parameter to '*', set the page size (the rows parameter), and sort on a clause that includes the uniqueKey field. Each response then contains a cursorMark for the next page, so you run the same query again with only the cursorMark replaced by the one from the last response, until the cursorMark stops changing. This gives you more control over how many results you pull back, and performs much better.
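
    For reference, here is a minimal sketch of what the request parameters for the first cursor page could look like over plain HTTP. It assumes the uniqueKey field is called id, and the class and method names (CursorRequest, queryString) are made up for illustration; only q, rows, sort and cursorMark are actual Solr parameters.

    ```java
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.StringJoiner;

    // Hypothetical helper: builds the query string of a cursor-based Solr request.
    final class CursorRequest {

        static String queryString(String q, int rows, String sort, String cursorMark) {
            // LinkedHashMap keeps the parameters in insertion order.
            Map<String, String> params = new LinkedHashMap<>();
            params.put("q", q);
            params.put("rows", Integer.toString(rows));
            params.put("sort", sort);             // assumed to include the uniqueKey field
            params.put("cursorMark", cursorMark); // "*" for the first page

            StringJoiner joined = new StringJoiner("&");
            for (Map.Entry<String, String> e : params.entrySet()) {
                // Cursor marks are opaque strings, so every value is URL-encoded.
                joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            }
            return joined.toString();
        }

        public static void main(String[] args) {
            // prints q=Address%3AJack*&rows=100&sort=id+asc&cursorMark=*
            System.out.println(queryString("Address:Jack*", 100, "id asc", "*"));
        }
    }
    ```

    For each following page you would pass the nextCursorMark value from the previous response instead of "*", leaving q, rows and sort unchanged.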

  • 2021-01-01 11:22

    The way I dealt with the problem is by running the query twice:

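    // Start with your (usually small) default page size
    solrQuery.setRows(50);
    QueryResponse response = solrResponse(solrQuery); // solrResponse(...) is a helper (not shown) that executes the query
    long numFound = response.getResults().getNumFound();
    if (numFound > 50) {
        // getNumFound() returns a long, while setRows() expects an Integer
        solrQuery.setRows(Math.toIntExact(numFound));
        response = solrResponse(solrQuery);
    }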
    

    It calls Solr twice, but gets you all matching records, at the cost of a small performance penalty.

  • 2021-01-01 11:23

    I suggest using Deep Paging.

    Simple pagination is easy when you have only a few documents to read: all you have to do is play with the start and rows parameters. But it is not feasible when you have many documents, meaning hundreds of thousands or even millions.
    That is the kind of thing that can bring your Solr server to its knees.

    For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.

    This means that if you have a website that pages through search results, a real user will not go that far, but consider what can happen if a spider or a scraper tries to read every page of the website.
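
    To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes a simplified cost model in which a request with start=S and rows=R makes Solr rank the top S+R matches, while a cursor request only ranks about one page. The class and method names are hypothetical, and the figures are illustrative counts, not measurements.

    ```java
    // Hypothetical cost model comparing start/rows paging with cursor paging.
    final class PagingCost {

        // start/rows: every page re-ranks everything up to and including that page.
        static long startRowsCost(long numFound, int rows) {
            long total = 0;
            for (long start = 0; start < numFound; start += rows) {
                total += Math.min(start + rows, numFound);
            }
            return total;
        }

        // cursor: every page ranks roughly one page worth of documents.
        static long cursorCost(long numFound, int rows) {
            long total = 0;
            for (long fetched = 0; fetched < numFound; fetched += rows) {
                total += Math.min(rows, numFound - fetched);
            }
            return total;
        }

        public static void main(String[] args) {
            // 5214 matches read 100 at a time, as in the question
            System.out.println(startRowsCost(5214, 100)); // prints 143014
            System.out.println(cursorCost(5214, 100));    // prints 5214
        }
    }
    ```

    Under this model the start/rows total grows roughly with the square of the result count, while the cursor total grows linearly, which is why the gap only becomes painful for very large result sets.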

    Now we are talking about Deep Paging.

    I suggest reading this excellent post:

    https://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

    And take a look at this documentation page:

    https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results

    And here is an example that shows how to paginate using cursors.

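    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrQuery.ORDER;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.CursorMarkParams;

    SolrQuery solrQuery = new SolrQuery();
    solrQuery.setRows(500);                // page size
    solrQuery.setQuery("*:*");
    solrQuery.addSort("id", ORDER.asc);    // Pay attention to this line: cursors need a sort on the uniqueKey field
    String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*" marks the first page
    boolean done = false;
    while (!done) {
        solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = solrClient.query(solrQuery);
        String nextCursorMark = rsp.getNextCursorMark();
        for (SolrDocument d : rsp.getResults()) {
            // ... process document d
        }
        // An unchanged cursor mark means there are no more results
        if (cursorMark.equals(nextCursorMark)) {
            done = true;
        }
        cursorMark = nextCursorMark;
    }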
    
  • 2021-01-01 11:28

    Returning all the results is never a good option, as it would be very slow.
    Can you describe your use case?

    Also, Solr's rows parameter lets you tune the number of results returned.
    However, I don't think there is a way to make rows return all results; it doesn't accept -1 as a value.
    So you would need to set a high value for all the results to be returned.
