Hint HINT_PASS_DISTINCT_THROUGH reduces the number of entities returned per page for a PageRequest to below the configured page size (PostgreSQL)

隐瞒了意图╮ 2021-01-16 00:53

I'm setting up a JPA Specification-based repository implementation that uses JPA specifications (constructed from RSQL filter strings) to filter the results, define ...

2 Answers
  • 2021-01-16 01:40

    The problem you are experiencing has to do with the way you are using the HINT_PASS_DISTINCT_THROUGH hint.

    This hint lets you tell Hibernate that the DISTINCT keyword should not be passed through to the SQL SELECT statement issued against the database.

    You are taking advantage of this to sort your queries by a field that is not included in the DISTINCT column list, which PostgreSQL would otherwise reject (for SELECT DISTINCT, ORDER BY expressions must appear in the select list).

    But that is not how this hint should be used.

    This hint should only be used when you are sure that applying the DISTINCT keyword to the SQL SELECT statement would make no difference, because the statement already fetches only distinct rows by itself. The idea is to improve the performance of the query by avoiding an unnecessary DISTINCT.

    This is usually what happens when you use the query.distinct method in your criteria queries and you are join fetching child relationships. This great article by @VladMihalcea explains how the hint works in detail.

    On the other hand, when you use paging, Hibernate sets OFFSET and LIMIT (or something similar, depending on the underlying database) in the SQL SELECT statement issued against the database, limiting your query to a maximum number of rows.

    As stated, if you use the HINT_PASS_DISTINCT_THROUGH hint, the SELECT statement will not contain the DISTINCT keyword and, because of your joins, it can return duplicate rows for your main entity. LIMIT is applied to those duplicated rows. Hibernate then processes the rows and, because you are using query.distinct, removes the duplicates in memory. I think this is the reason why you may get fewer records than requested in your Pageable.
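
    The effect can be reproduced without a database. The following is only a plain-Java simulation under simplifying assumptions: the class name PageShrinkDemo and the sample ids are made up for illustration, and no Hibernate code is involved. It compares "LIMIT first, de-duplicate afterwards" (hint set to false) with "DISTINCT first, LIMIT afterwards" (no hint).

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.List;

    // Hypothetical demo class: simulates the order in which LIMIT and
    // de-duplication are applied. It does not call Hibernate.
    final class PageShrinkDemo {

        // Keeps at most `limit` rows starting at `offset`, like OFFSET/LIMIT in SQL.
        static List<Long> limit(List<Long> rows, int offset, int limit) {
            int from = Math.min(offset, rows.size());
            int to = Math.min(from + limit, rows.size());
            return new ArrayList<>(rows.subList(from, to));
        }

        // Removes duplicates while preserving the encounter order.
        static List<Long> distinct(List<Long> rows) {
            return new ArrayList<>(new LinkedHashSet<>(rows));
        }

        public static void main(String[] args) {
            // Parent ids produced by a parent-child join:
            // parent 1 matches three children, parent 2 matches two, the rest one each.
            List<Long> joinedRows = Arrays.asList(1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L);
            int pageSize = 5;

            // Hint set to false: no DISTINCT in SQL, so LIMIT cuts the duplicated
            // rows and the duplicates are removed in memory afterwards.
            List<Long> withHint = distinct(limit(joinedRows, 0, pageSize));

            // No hint: DISTINCT is evaluated in the database before LIMIT.
            List<Long> withoutHint = limit(distinct(joinedRows), 0, pageSize);

            System.out.println(withHint);    // prints [1, 2]
            System.out.println(withoutHint); // prints [1, 2, 3, 4, 5]
        }
    }
    ```

    With a page size of 5, the first variant returns only two parents, which matches the behaviour described in the question.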

    If you remove the hint, the DISTINCT keyword is passed in the SQL statement sent to the database. As long as you only project information of the main entity, the database de-duplicates first and then returns the number of rows indicated by LIMIT, which is why you always get the requested number of records.

    You can try to fetch join your child entities (instead of only joining them). The field you need to sort by then becomes part of the selected columns, so the DISTINCT restriction no longer gets in the way and, in addition, you can now apply the hint legitimately.

    But if you do so you will run into another problem: if you use join fetch together with pagination to return the main entities and their collections, Hibernate will no longer paginate at the database level. It will not include the OFFSET or LIMIT keywords in the SQL statement and will instead paginate the results in memory. This is the famous Hibernate HHH000104 warning:

    HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
    

    @VladMihalcea explains that in great detail in the last part of this article.

    He also proposes one possible solution to your problem: window functions.

    In your use case, instead of using Specifications, the idea is that you implement your own DAO. This DAO only needs access to the EntityManager, which is not a big deal because you can inject it with @PersistenceContext:

    @PersistenceContext
    protected EntityManager em;
    

    Once you have this EntityManager, you can create native queries and use window functions to build, based on the provided Pageable information, the right SQL statement to issue against the database. This gives you much more freedom in choosing which fields to sort by, or whatever else you need.
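
    As a rough sketch of what such a statement could look like, the helper below builds a DENSE_RANK-based query string. Everything in it is an assumption made for illustration: the class name WindowPagingSql, the tables parent and child, and the columns name, created_on and parent_id do not come from the question. It only sorts by columns of the parent table, and it expects a zero-based page number like the one a Pageable provides.

    ```java
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical helper: builds a native SQL string that pages over parents
    // (not over joined rows) by ranking the parents with DENSE_RANK.
    final class WindowPagingSql {

        // Sort columns cannot be bound as query parameters, so only
        // whitelisted names are ever concatenated into the SQL text.
        private static final Set<String> SORTABLE =
                new HashSet<>(Arrays.asList("p.name", "p.created_on"));

        static String build(String sortColumn, boolean ascending) {
            if (!SORTABLE.contains(sortColumn)) {
                throw new IllegalArgumentException("Unsupported sort column: " + sortColumn);
            }
            String direction = ascending ? "ASC" : "DESC";
            // All joined rows of one parent share the same sort value and id,
            // so they receive the same rank; p.id breaks ties between parents.
            return "SELECT * FROM ("
                 + " SELECT p.*, c.id AS child_id,"
                 + " DENSE_RANK() OVER (ORDER BY " + sortColumn + " " + direction + ", p.id ASC) AS parent_rank"
                 + " FROM parent p LEFT JOIN child c ON c.parent_id = p.id"
                 + ") ranked"
                 + " WHERE ranked.parent_rank BETWEEN ?1 AND ?2"
                 + " ORDER BY ranked.parent_rank";
        }

        // First and last parent rank (1-based, inclusive) of a zero-based page.
        static long firstRank(int pageNumber, int pageSize) {
            return (long) pageNumber * pageSize + 1;
        }

        static long lastRank(int pageNumber, int pageSize) {
            return (long) (pageNumber + 1) * pageSize;
        }
    }
    ```

    In the DAO, the resulting string would be passed to em.createNativeQuery, with firstRank and lastRank bound to the positional parameters 1 and 2. Because the rank is computed per parent, a page always covers the requested number of parents, no matter how many child rows each one has.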

    As the last cited article indicates, window functions are supported by all major databases.

    In the case of PostgreSQL, you can easily come across them in the official documentation.

    Finally, one more option, suggested by @nickshoe and explained in great detail in the article he cited, is to perform the sorting and paging in two phases. In the first phase, you create a query that references your child entities and applies the paging and sorting; this query identifies the ids of the main entities. In the second phase, you use those ids to obtain the main entities themselves.

    You can take advantage of the aforementioned custom DAO to accomplish this process.

  • 2021-01-16 01:43

    It may be an off-topic answer, but it may help you.

    You could try to tackle this problem (pagination of parent-child entities) by separating the query into two parts:

    • a query for retrieving the ids that match the given criteria
    • a query for retrieving the actual entities by the resulting ids of the previous query

    I came across this solution in this blog post: https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/
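
    The two steps can be outlined in plain Java. This is only a sketch under assumptions: the class TwoPhasePager and its parameter names are invented for illustration, and the two queries are passed in as functions so that the example stays independent of JPA. It also restores the order of the first query, since a lookup by a list of ids is not guaranteed to return the entities in that order.

    ```java
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiFunction;
    import java.util.function.Function;

    // Hypothetical helper that wires the two queries together.
    final class TwoPhasePager {

        static <ID, E> List<E> fetchPage(
                BiFunction<Integer, Integer, List<ID>> idPageQuery, // (offset, limit) -> sorted ids
                Function<List<ID>, List<E>> loadByIds,              // ids -> entities, in any order
                Function<E, ID> idOf,                               // extracts the id of an entity
                int offset,
                int limit) {

            // Phase 1: only the ids are filtered, sorted and paged.
            List<ID> ids = idPageQuery.apply(offset, limit);
            if (ids.isEmpty()) {
                return new ArrayList<>(); // avoids running the second query with an empty id list
            }

            // Phase 2: load the entities for exactly those ids.
            Map<ID, E> byId = new HashMap<>();
            for (E entity : loadByIds.apply(ids)) {
                byId.put(idOf.apply(entity), entity);
            }

            // Put the entities back into the order produced by phase 1.
            List<E> page = new ArrayList<>();
            for (ID id : ids) {
                E entity = byId.get(id);
                if (entity != null) {
                    page.add(entity);
                }
            }
            return page;
        }
    }
    ```

    In a real repository, the first function would typically wrap an id-only query that uses setFirstResult and setMaxResults, and the second one a query that selects the entities (with their fetch joins) where the id is in the given list.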
