I'm setting up a JPA Specification based repository implementation that utilizes JPA specifications (constructed based on RSQL filter strings) to filter the results, define
The problem you are experiencing has to do with the way you are using the HINT_PASS_DISTINCT_THROUGH hint.
This hint lets you tell Hibernate that the DISTINCT keyword should not be used in the SELECT statement issued against the database.
You are taking advantage of this fact to allow your queries to be sorted by a field that is not included in the DISTINCT column list.
But that is not how this hint should be used.
This hint must only be used when you are sure that applying the DISTINCT keyword to the SQL SELECT statement makes no difference, because the SELECT statement will already fetch distinct values by itself. The idea is to improve the performance of the query by avoiding an unnecessary DISTINCT keyword.
This is usually what happens when you use the query.distinct method in your criteria queries and you are join fetching child relationships. This great article by @VladMihalcea explains how the hint works in detail.
On the other hand, when you use paging, Hibernate will set OFFSET and LIMIT (or something similar, depending on the underlying database) in the SQL SELECT statement issued against the database, limiting your query to a maximum number of results.
As stated, if you use the HINT_PASS_DISTINCT_THROUGH hint, the SELECT statement will not contain the DISTINCT keyword and, because of your joins, it could potentially return duplicate records of your main entity. Because you are using query.distinct, Hibernate will process these records and remove the duplicates if needed. I think this is the reason why you may get fewer records than requested in your Pageable.
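To make the effect concrete, here is a minimal in-memory sketch of the two orderings. It is not Hibernate code: the class DistinctPagingDemo and its methods are hypothetical, and each number simply stands for the id of the main entity in one joined SQL row.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical in-memory model: each Long is the main entity id of one joined row.
final class DistinctPagingDemo {

    // DISTINCT not sent to the database: LIMIT counts raw joined rows,
    // and duplicates of the main entity are removed afterwards, in memory.
    static List<Long> limitThenDeduplicate(List<Long> joinedRows, int offset, int limit) {
        List<Long> page = slice(joinedRows, offset, limit);
        return new ArrayList<>(new LinkedHashSet<>(page));
    }

    // DISTINCT sent to the database: duplicates are removed before LIMIT applies.
    static List<Long> deduplicateThenLimit(List<Long> joinedRows, int offset, int limit) {
        List<Long> distinct = new ArrayList<>(new LinkedHashSet<>(joinedRows));
        return slice(distinct, offset, limit);
    }

    // Equivalent of OFFSET/LIMIT on a list, clamped to the list bounds.
    private static List<Long> slice(List<Long> rows, int offset, int limit) {
        int from = Math.min(offset, rows.size());
        int to = Math.min(from + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        // Parent 1 has three children, parent 2 has one, parent 3 has two.
        List<Long> joinedRows = List.of(1L, 1L, 1L, 2L, 3L, 3L);
        System.out.println(limitThenDeduplicate(joinedRows, 0, 3)); // [1]
        System.out.println(deduplicateThenLimit(joinedRows, 0, 3)); // [1, 2, 3]
    }
}
```

With a page size of 3, the first variant returns a single parent because all three rows in the page belong to the same main entity, which matches the "fewer records than requested" symptom.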
If you remove the hint, the DISTINCT keyword is passed in the SQL statement that is sent to the database. As long as you only project information of the main entity, the database removes the duplicates before applying LIMIT, and this is why it will always give you the requested number of records.
You can try to fetch join your child entities (instead of only joining with them). This eliminates the problem of not being able to sort by a field that is missing from the DISTINCT column list and, in addition, you will be able to apply the hint, now legitimately.
But if you do so, you will run into another problem: if you use join fetch and pagination to return the main entities and their collections, Hibernate will no longer apply pagination at database level. It will not include the OFFSET or LIMIT keywords in the SQL statement, and it will try to paginate the results in memory. This is the famous Hibernate HHH000104 warning:
HHH000104: firstResult/maxResults specified with collection fetch; applying in memory!
@VladMihalcea explains this in great detail in the last part of this article.
He also proposes one possible solution to your problem: window functions.
In your use case, instead of using Specifications, the idea is that you implement your own DAO. This DAO only needs access to the EntityManager, which is not a big deal, as you can inject it with @PersistenceContext:
@PersistenceContext
protected EntityManager em;
Once you have this EntityManager, you can create native queries and use window functions to build, based on the provided Pageable information, the right SQL statement to issue against the database. This gives you a lot more freedom regarding which fields to sort by, or whatever else you need.
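As a rough sketch of what such a statement could look like, the helper below builds a DENSE_RANK based query that paginates on parents while still returning their child rows. Everything named here is an assumption for illustration: the class WindowPagingSql, the parent and child tables, their columns, and the whitelist of sortable columns. It only builds the SQL text and the rank bounds from a page number and page size; it does not touch the EntityManager.

```java
import java.util.Set;

// Hypothetical helper: table names, column names and the whitelist are illustrative only.
final class WindowPagingSql {

    // Sort columns cannot be bound as query parameters, so they are checked
    // against a fixed whitelist before being concatenated into the SQL.
    private static final Set<String> SORTABLE = Set.of("name", "created_on");

    static String build(String sortColumn, boolean ascending) {
        if (!SORTABLE.contains(sortColumn)) {
            throw new IllegalArgumentException("Unsupported sort column: " + sortColumn);
        }
        String direction = ascending ? "ASC" : "DESC";
        // All rows of one parent share the same (sort value, id) pair, so they get
        // the same rank; filtering on the rank therefore pages by parent, not by row.
        return "SELECT * FROM ("
            + " SELECT p.id AS parent_id, p.name AS parent_name,"
            + " c.id AS child_id, c.label AS child_label,"
            + " DENSE_RANK() OVER (ORDER BY p." + sortColumn + " " + direction
            + ", p.id) AS parent_rank"
            + " FROM parent p LEFT JOIN child c ON c.parent_id = p.id"
            + ") ranked"
            + " WHERE ranked.parent_rank BETWEEN :firstRank AND :lastRank"
            + " ORDER BY ranked.parent_rank";
    }

    // Rank of the first parent on a zero-based page.
    static long firstRank(int pageNumber, int pageSize) {
        return (long) pageNumber * pageSize + 1;
    }

    // Rank of the last parent on a zero-based page.
    static long lastRank(int pageNumber, int pageSize) {
        return (long) (pageNumber + 1) * pageSize;
    }
}
```

In the DAO, the resulting string could be passed to em.createNativeQuery and the two bounds set with setParameter. Hibernate accepts named parameters in native queries, although the JPA specification only guarantees positional ones, so you may prefer ?1 and ?2 for portability.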
As the last cited article indicates, window functions are a feature supported by all major databases.
In the case of PostgreSQL, you can easily come across them in the official documentation.
Finally, one more option, suggested in fact by @nickshoe, and explained in great detail in the article he cited, is to perform the sorting and paging process in two phases: in the first phase, you need to create a query that will reference your child entities and in which you will apply paging and sorting. This query will allow you to identify the ids of the main entities that will be used, in the second phase of the process, to obtain the main entities themselves.
You can take advantage of the aforementioned custom DAO to accomplish this process.
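The two phases can be modelled in memory as follows. This is only a sketch of the logic, not JPA code: the class TwoPhasePagingDemo is hypothetical, child rows are plain {parentId, sortValue} pairs, and sorting each parent by the smallest value among its children is an assumption (your real query decides how a parent with several children is ordered).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the two queries.
final class TwoPhasePagingDemo {

    // Phase 1: roughly "SELECT parent_id FROM child GROUP BY parent_id
    // ORDER BY MIN(sort_value), parent_id" followed by OFFSET/LIMIT.
    static List<Long> findPageOfParentIds(List<long[]> childRows, int offset, int limit) {
        Map<Long, Long> minByParent = new LinkedHashMap<>();
        for (long[] row : childRows) {
            minByParent.merge(row[0], row[1], Long::min);
        }
        List<Map.Entry<Long, Long>> entries = new ArrayList<>(minByParent.entrySet());
        entries.sort((a, b) -> {
            int byValue = Long.compare(a.getValue(), b.getValue());
            return byValue != 0 ? byValue : Long.compare(a.getKey(), b.getKey());
        });
        List<Long> ids = new ArrayList<>();
        for (int i = offset; i < entries.size() && ids.size() < limit; i++) {
            ids.add(entries.get(i).getKey());
        }
        return ids;
    }

    // Phase 2: roughly "SELECT ... FROM parent WHERE id IN (:ids)".
    // An IN clause guarantees no ordering, so the phase 1 order is restored here.
    static List<String> loadParentsInOrder(Map<Long, String> parentTable, List<Long> pageIds) {
        List<String> result = new ArrayList<>();
        for (Long id : pageIds) {
            String parent = parentTable.get(id);
            if (parent != null) {
                result.add(parent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> childRows = List.of(
            new long[] {1, 50}, new long[] {1, 10}, new long[] {2, 30},
            new long[] {3, 20}, new long[] {3, 40});
        Map<Long, String> parents = Map.of(1L, "A", 2L, "B", 3L, "C");
        List<Long> ids = findPageOfParentIds(childRows, 0, 2);
        System.out.println(ids);                              // [1, 3]
        System.out.println(loadParentsInOrder(parents, ids)); // [A, C]
    }
}
```

Because paging is applied to parent ids only, the second query can fetch collections freely without triggering in-memory pagination.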
It may be an off-topic answer, but it may help you.
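You could try to tackle this problem (pagination of parent-child entities) by separating the query into two parts: a first query that applies paging and sorting and returns only the ids of the parent entities, and a second query that fetches those parents by id.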
I came across this solution in this blog post: https://vladmihalcea.com/fix-hibernate-hhh000104-entity-fetch-pagination-warning-message/