SQL LIMIT vs. JDBC Statement setMaxRows. Which one is better?

离开以前

I want to select the top 10 records for a given query. I can use one of the following options:

Using the JDBC Statement.setMaxRows() method
Using the SQL LIMIT clause

5 Answers

眼角桃花

2021-02-08 20:18
SQL-level LIMIT

To restrict the size of the SQL query result set, you can use the SQL:2008 syntax:
```
SELECT title
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY
```
which works on Oracle 12, SQL Server 2012, or PostgreSQL 8.4 or newer versions.

For MySQL, you can use the LIMIT and OFFSET clauses:
```
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
OFFSET 50
```
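To see how a page number maps onto these two syntaxes, here is a minimal sketch. `PaginationSql`, its `Dialect` enum, and `paginate` are hypothetical names used only for illustration (they are not part of JDBC or any library), and the base query is assumed to already contain its `ORDER BY` clause.

```java
// Hypothetical helper (not from JDBC or any library): renders the SQL:2008 and
// MySQL pagination forms for a 1-based page number.
class PaginationSql {
    enum Dialect { SQL_2008, MYSQL }

    // Page 1 starts at offset 0, page 2 at offset pageSize, and so on.
    static long offsetFor(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return (long) (pageNumber - 1) * pageSize;
    }

    // orderedQuery is assumed to end with its ORDER BY clause.
    static String paginate(String orderedQuery, Dialect dialect, int pageNumber, int pageSize) {
        long offset = offsetFor(pageNumber, pageSize);
        switch (dialect) {
            case SQL_2008: // Oracle 12c, SQL Server 2012, PostgreSQL 8.4 or newer
                return orderedQuery + " OFFSET " + offset + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
            case MYSQL:
                return orderedQuery + " LIMIT " + pageSize + " OFFSET " + offset;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }

    public static void main(String[] args) {
        String base = "SELECT title FROM post ORDER BY created_on DESC";
        System.out.println(paginate(base, Dialect.SQL_2008, 2, 50));
        // SELECT title FROM post ORDER BY created_on DESC OFFSET 50 ROWS FETCH NEXT 50 ROWS ONLY
        System.out.println(paginate(base, Dialect.MYSQL, 2, 50));
        // SELECT title FROM post ORDER BY created_on DESC LIMIT 50 OFFSET 50
    }
}
```

Only integers are concatenated here, so there is no injection risk; in application code you would more likely keep `?` placeholders for the offset and page size and bind them on a `PreparedStatement`.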
The advantage of SQL-level pagination is that the database optimizer can take the limit into account when building the execution plan.

So, if we have an index on the created_on column:
```
CREATE INDEX idx_post_created_on ON post (created_on DESC)
```
And we execute the following query that uses the LIMIT clause:
```
EXPLAIN ANALYZE
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
```
We can see that the database engine uses the index since the optimizer knows that only 50 records are to be fetched:
```
Execution plan:
Limit  (cost=0.28..25.35 rows=50 width=564)
       (actual time=0.038..0.051 rows=50 loops=1)
  ->  Index Scan using idx_post_created_on on post p  
      (cost=0.28..260.04 rows=518 width=564) 
      (actual time=0.037..0.049 rows=50 loops=1)
Planning time: 1.511 ms
Execution time: 0.148 ms
```
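The plan difference comes down to how many rows each access path has to touch. The toy model below is only an illustration under simplifying assumptions (a `TreeSet` stands in for the B-tree index, a `List` for the table, and `created_on` values are assumed unique); it counts the rows read to answer `ORDER BY created_on DESC LIMIT 50` over 5000 rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Toy model only: a real engine works on pages and tuples, not Java collections.
class TopNAccessPaths {
    static final class Result {
        final List<Long> rows;
        final int rowsRead;
        Result(List<Long> rows, int rowsRead) { this.rows = rows; this.rowsRead = rowsRead; }
    }

    // Index scan + Limit: the sorted index is walked from the newest entry,
    // and the scan stops as soon as n rows have been produced.
    static Result viaIndex(TreeSet<Long> index, int n) {
        List<Long> out = new ArrayList<>();
        for (Long createdOn : index.descendingSet()) {
            if (out.size() == n) break;
            out.add(createdOn);
        }
        return new Result(out, out.size());
    }

    // Seq scan + Sort: with no ordered access path, every row is read and
    // sorted before the first n can be returned.
    static Result viaSeqScanAndSort(List<Long> table, int n) {
        List<Long> sorted = new ArrayList<>(table);
        sorted.sort(Collections.reverseOrder());
        List<Long> out = new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
        return new Result(out, table.size());
    }

    public static void main(String[] args) {
        List<Long> table = new ArrayList<>();
        for (long t = 1; t <= 5000; t++) table.add(t); // 5000 posts, created_on = 1..5000
        TreeSet<Long> index = new TreeSet<>(table);

        Result byIndex = viaIndex(index, 50);
        Result bySort = viaSeqScanAndSort(table, 50);
        System.out.println(byIndex.rowsRead + " vs " + bySort.rowsRead); // 50 vs 5000
        System.out.println(byIndex.rows.equals(bySort.rows));             // true
    }
}
```

Both paths return the same 50 titles; only the amount of work differs, which is why knowing the limit up front matters to the optimizer.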
JDBC Statement maxRows

According to the setMaxRows Javadoc:

If the limit is exceeded, the excess rows are silently dropped.

That's not very reassuring!
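To make "silently dropped" concrete, the sketch below models the client-side behaviour with hypothetical helpers (`takeSilently` and `fillPage` are illustrative names, not JDBC methods). The caller gets a list back with no indication that rows were cut off; the second helper shows a common workaround, which is to ask for one row more than you need.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helpers modelling the client side of setMaxRows.
class MaxRowsTruncation {
    // Keeps at most maxRows rows (0 means "no limit", as in JDBC) and drops the
    // rest without any signal, so the caller cannot tell whether rows were lost.
    static <T> List<T> takeSilently(Iterator<T> rows, int maxRows) {
        List<T> kept = new ArrayList<>();
        while (rows.hasNext() && (maxRows == 0 || kept.size() < maxRows)) {
            kept.add(rows.next());
        }
        return kept;
    }

    // Workaround: request pageSize + 1 rows (pageSize >= 1) and keep pageSize of them.
    // Returns true if the extra row arrived, i.e. the result was truncated.
    static <T> boolean fillPage(Iterator<T> rows, int pageSize, List<T> page) {
        List<T> fetched = takeSilently(rows, pageSize + 1);
        page.addAll(fetched.subList(0, Math.min(pageSize, fetched.size())));
        return fetched.size() > pageSize;
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 1; i <= 10; i++) source.add(i);

        System.out.println(takeSilently(source.iterator(), 5)); // [1, 2, 3, 4, 5]

        List<Integer> page = new ArrayList<>();
        boolean truncated = fillPage(source.iterator(), 5, page);
        System.out.println(page + " truncated=" + truncated);  // [1, 2, 3, 4, 5] truncated=true
    }
}
```

With JDBC, the same trick would be `statement.setMaxRows(pageSize + 1)` or `LIMIT pageSize + 1` in the SQL.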

So, if we execute the following query on PostgreSQL:
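```java
try (PreparedStatement statement = connection
    .prepareStatement("""
        SELECT title
        FROM post
        ORDER BY created_on DESC
    """)
) {
    // No LIMIT in the SQL: the row cap is handed to the JDBC driver instead
    statement.setMaxRows(50);
    // try-with-resources closes the ResultSet as soon as we are done with it
    try (ResultSet resultSet = statement.executeQuery()) {
        int count = 0;
        while (resultSet.next()) {
            String title = resultSet.getString(1);
            count++;
        }
    }
}
```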
We get the following execution plan in the PostgreSQL log:
```
Execution plan:
  Sort  (cost=65.53..66.83 rows=518 width=564) 
        (actual time=4.339..5.473 rows=5000 loops=1)
  Sort Key: created_on DESC
  Sort Method: quicksort  Memory: 896kB
  ->  Seq Scan on post p  (cost=0.00..42.18 rows=518 width=564) 
                          (actual time=0.041..1.833 rows=5000 loops=1)
Planning time: 1.840 ms
Execution time: 6.611 ms 
```
Because the database optimizer has no idea that we need to fetch only 50 records, it plans for reading and sorting every row of the table (5000 rows in this test). When a query has to fetch a large number of records, a full-table scan is actually cheaper than an index scan, hence the execution plan does not use the index at all.
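The cost figures in the two plans make this trade-off visible. Assuming PostgreSQL estimates a Limit node by interpolating linearly between its child's startup and total cost (our reading of the planner, not something shown in the logs), the sketch below reproduces the 25.35 of the first plan from the index scan's 0.28..260.04 and its 518-row estimate. `LimitCostEstimate` is a hypothetical name.

```java
import java.util.Locale;

// Assumed formula: limitCost = startup + (total - startup) * limitRows / childRows
class LimitCostEstimate {
    static double limitTotalCost(double childStartup, double childTotal,
                                 double limitRows, double childRows) {
        double rows = Math.min(limitRows, childRows); // LIMIT cannot return more than the child
        return childStartup + (childTotal - childStartup) * rows / childRows;
    }

    public static void main(String[] args) {
        // Estimates copied from the two plans above (518 is the planner's row estimate)
        double indexStartup = 0.28, indexTotal = 260.04;
        double sortTotal = 66.83;
        double estimatedRows = 518;

        System.out.printf(Locale.ROOT, "Index Scan + LIMIT 50: %.2f%n",
                limitTotalCost(indexStartup, indexTotal, 50, estimatedRows)); // 25.35
        System.out.printf(Locale.ROOT, "Index Scan, no LIMIT:  %.2f%n", indexTotal); // 260.04
        System.out.printf(Locale.ROOT, "Seq Scan + Sort:       %.2f%n", sortTotal);  // 66.83
    }
}
```

With the limit known, the index path (25.35) is far cheaper than reading and sorting the whole table (66.83 in the second plan). Without it, the full index scan (260.04) loses to the sequential scan plus sort, which is what the maxRows run shows.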

I ran this test on Oracle, SQL Server, PostgreSQL, and MySQL, and it looks like the Oracle and PostgreSQL optimizers don't use the maxRows setting when generating the execution plan.

However, on SQL Server and MySQL, the maxRows JDBC setting is taken into consideration, and the execution plan is equivalent to that of an SQL query that uses TOP or LIMIT. You can run the tests for yourself, as they are available in my High-Performance Java Persistence GitHub repository.

Conclusion

Although setMaxRows looks like a portable way to limit the size of the ResultSet, SQL-level pagination is much more efficient whenever the database optimizer does not take the JDBC maxRows property into account.

小鲜肉

2021-02-08 20:18

Not sure if I am right, but I remember that in the past I was involved in a big project to change all queries that were expected to return one row into 'TOP 1' or numrows=1. The reason was that the DB would stop searching for 'next possible matches' when this 'hint' was used, and in high-volume environments this really made a difference. The remark that you can 'ignore' superfluous records in the client or in the ResultSet is not enough: you should avoid unnecessary reads as early as possible. But I have no idea whether the JDBC methods add those DB-specific hints to the query or not; I may need to test that before using it.

I am not a DB specialist and can imagine that I am wrong, but "Speedwise it seems like no difference" can be a wrong assumption. For example, if you are asked to search a box for red balls and you only need one, there is no value in continuing to search for all of them when one is enough. That is when it matters to specify 'TOP 1'.
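The red-ball analogy can be sketched in a few lines. This is only an illustration of early termination (the box stands in for a table, and `ballsExamined` is a hypothetical helper); it says nothing about what a particular database or JDBC driver does internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RedBalls {
    // Counts how many balls are looked at. With stopAtFirst the search ends at the
    // first match (like TOP 1 / LIMIT 1); otherwise the whole box is searched, as
    // when every match is fetched and the extras are ignored afterwards.
    static int ballsExamined(List<String> box, String colour, boolean stopAtFirst) {
        int examined = 0;
        for (String ball : box) {
            examined++;
            if (stopAtFirst && ball.equals(colour)) {
                break;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> box = new ArrayList<>(Collections.nCopies(1000, "blue"));
        box.set(3, "red");
        box.set(700, "red");

        System.out.println(ballsExamined(box, "red", true));  // 4
        System.out.println(ballsExamined(box, "red", false)); // 1000
    }
}
```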

死守一世寂寞

2021-02-08 20:20

In most cases you want to use the LIMIT clause, but at the end of the day both will achieve what you want. This answer is targeted at JDBC and PostgreSQL, but it applies to other languages and databases that use a similar model.

The JDBC documentation for Statement.setMaxRows says:

If the limit is exceeded, the excess rows are silently dropped.

That is, the database server may return more rows, but the client will just ignore them. The PostgreSQL JDBC driver applies the limit on both the client and the server side. For the client side, have a look at the usage of maxRows in AbstractJdbc2ResultSet. For the server side, have a look at maxRows in QueryExecutorImpl.

On the server side, the PostgreSQL LIMIT documentation says:

The query optimizer takes LIMIT into account when generating a query plan

So as long as the query is sensible, it will load only the data it needs to fulfill the query.
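For the server-side part, one plausible mechanism (an assumption here: the driver code the answer points to has not been traced) is the row-count field of the PostgreSQL extended-query protocol's Execute message. The sketch below encodes that message by hand; the `ExecuteMessage.encode` helper is hypothetical, while the layout (type byte `E`, an Int32 length that counts itself, the portal name as a null-terminated string, then an Int32 maximum row count where 0 means no limit) follows the protocol documentation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical encoder for the PostgreSQL "Execute" message.
class ExecuteMessage {
    static byte[] encode(String portalName, int maxRows) {
        byte[] portal = portalName.getBytes(StandardCharsets.UTF_8);
        int length = 4 + portal.length + 1 + 4;            // counts itself, not the type byte
        ByteBuffer buf = ByteBuffer.allocate(1 + length);  // big-endian by default
        buf.put((byte) 'E');                 // message type: Execute
        buf.putInt(length);
        buf.put(portal).put((byte) 0);       // portal name as a C string ("" = unnamed portal)
        buf.putInt(maxRows);                 // maximum rows to return, 0 = no limit
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] msg = encode("", 50);
        System.out.println(msg.length);                          // 10
        System.out.println(ByteBuffer.wrap(msg, 6, 4).getInt()); // 50
    }
}
```

When the count is reached, the server suspends the portal (it replies with PortalSuspended) instead of streaming the remaining rows. Because the plan is normally chosen before Execute arrives, such a limit caps how many rows are produced and sent back but is not a planner input, which would be consistent with the PostgreSQL plan shown in the first answer.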

天命终不由人

2021-02-08 20:34

setFetchSize: Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement.

setMaxRows: Sets the limit for the maximum number of rows that any ResultSet object generated by this Statement object can contain to the given number.

I guess you can try the two JDBC APIs above: use setFetchSize and check whether it works for 100K records. Otherwise, you can fetch in batches, build an ArrayList, and return it to your Jasper report.
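The difference between the two settings can be modelled as batches versus a cap. The toy `FetchModel` below is hypothetical and simplified: it treats `fetchSize` as the number of rows per round trip and `maxRows` as a hard cap, with 0 meaning "no limit" (JDBC's convention for `maxRows`; using 0 for "one batch" in `fetchSize` is our simplification). Real drivers differ; for example, the PostgreSQL driver only streams through a cursor when autocommit is off.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not a driver.
class FetchModel {
    static final class Fetched {
        final List<Integer> rows = new ArrayList<>();
        int roundTrips;
    }

    // fetchSize: rows per round trip (0 = everything in one go, a simplification).
    // maxRows:   total cap on rows returned (0 = no limit, as in JDBC).
    static Fetched fetch(List<Integer> serverRows, int fetchSize, int maxRows) {
        if (fetchSize < 0 || maxRows < 0) {
            throw new IllegalArgumentException("fetchSize and maxRows must be >= 0");
        }
        int cap = (maxRows == 0) ? serverRows.size() : Math.min(maxRows, serverRows.size());
        int batch = (fetchSize == 0) ? Math.max(cap, 1) : fetchSize;
        Fetched result = new Fetched();
        while (result.rows.size() < cap) {
            int from = result.rows.size();
            int to = Math.min(from + batch, cap);
            result.rows.addAll(serverRows.subList(from, to)); // one simulated round trip
            result.roundTrips++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> serverRows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) serverRows.add(i); // 100K rows, as in this answer

        Fetched batched = fetch(serverRows, 1000, 0);
        System.out.println(batched.rows.size() + " rows in " + batched.roundTrips + " round trips"); // 100000 rows in 100 round trips

        Fetched capped = fetch(serverRows, 1000, 2500);
        System.out.println(capped.rows.size() + " rows in " + capped.roundTrips + " round trips");   // 2500 rows in 3 round trips
    }
}
```

In JDBC terms, the two knobs correspond to `statement.setFetchSize(1000)` and `statement.setMaxRows(2500)` on the same `Statement`.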

挽巷

2021-02-08 20:42

The advantage of setMaxRows is that you can write universal statements that are valid in PostgreSQL, Oracle, MySQL, etc., whereas the SQL syntax differs: Oracle uses ROWNUM, PostgreSQL and MySQL use LIMIT, and MS SQL Server uses TOP.

Speedwise it seems like no difference.
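For comparison, this is the kind of per-database rewrite that setMaxRows lets you avoid. `TopNSql` is a hypothetical helper written for illustration; the SQL it emits is the classic top-N form for each database (a `ROWNUM` filter on an ordered subquery for Oracle, `LIMIT` for PostgreSQL and MySQL, `TOP` for SQL Server).

```java
// Hypothetical helper: builds "first n rows" SQL for each database.
class TopNSql {
    enum Db { ORACLE, POSTGRESQL, MYSQL, SQL_SERVER }

    // columns, e.g. "title"; rest, e.g. "FROM post ORDER BY created_on DESC"
    static String topN(Db db, String columns, String rest, int n) {
        switch (db) {
            case ORACLE:
                // ROWNUM is assigned before ORDER BY is applied, so the ordered query is wrapped
                return "SELECT * FROM (SELECT " + columns + " " + rest + ") WHERE ROWNUM <= " + n;
            case POSTGRESQL:
            case MYSQL:
                return "SELECT " + columns + " " + rest + " LIMIT " + n;
            case SQL_SERVER:
                return "SELECT TOP " + n + " " + columns + " " + rest;
            default:
                throw new IllegalArgumentException("Unsupported database: " + db);
        }
    }

    public static void main(String[] args) {
        for (Db db : Db.values()) {
            System.out.println(db + ": " + topN(db, "title", "FROM post ORDER BY created_on DESC", 10));
        }
    }
}
```

With setMaxRows(10), the single string `SELECT title FROM post ORDER BY created_on DESC` runs unchanged on all of them; the first answer discusses what that costs on databases whose optimizer ignores maxRows.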
