Oracle Pagination strategy

盖世英雄少女心 2021-01-24 10:01

I want to fetch millions of rows from a table between two timestamps and then process them. Firing a single query and retrieving all the records at once looks to be a ba

2 Answers
  •  一生所求
    2021-01-24 10:54

    The pagination pattern was invented for presenting data on websites (as opposed to scrolling navigation), and that is where it works best. A live user cannot realistically view thousands or millions of records at once, so the information is divided into short pages (50 to 200 records), and usually one query is sent to the database for each page. The user typically clicks through only a few pages rather than browsing all of them, and needs some time to read each page, so the queries reach the database at long intervals rather than back to back. Retrieving one chunk of data takes much less time than retrieving all the millions of records, so the user is happy because they do not have to wait long for subsequent pages, and the overall system load is lower.


    But the question suggests that your application is oriented toward batch processing rather than web presentation. The application must fetch all the records and perform some operations or transformations (calculations) on each of them. In this case completely different design patterns are used (stream/pipelined processing, sequences of steps, parallel steps/operations, etc.), and pagination will not work: if you go that way, you will kill your system's performance.


    Instead of fancy theory, let's look at a simple, practical example that shows what kind of speed difference we are talking about.

    Let's say there is a table PAGINATION with about 7 million records:

    create table pagination as
    select sysdate - 200 * dbms_random.value As my_date, t.*
    from (
        select o.* from all_objects o 
        cross join (select * from dual connect by level <= 100)
        fetch first 10000000 rows only
    ) t;
    
    select count(*) from pagination;
    
      COUNT(*)
    ----------
       7369600
    

    Let's say there is an index on the MY_DATE column, and the statistics are fresh:

    create index PAGINATION_IX on pagination( my_date );
    
    BEGIN dbms_stats.gather_table_stats( 'TEST', 'PAGINATION', method_opt => 'FOR ALL COLUMNS' ); END;
    /
    

    Let's say that we are going to process about 10% of the records in the table, those between the dates below:

    select count(*) from pagination
    where my_date between date '2017-10-01' and date '2017-10-21';
    
      COUNT(*)
    ----------
        736341
    

    Finally, let's say that, for simplicity, our "processing" consists of simply summing the lengths of one of the fields.
    This is a simple paging implementation:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    import oracle.jdbc.pool.OracleDataSource;

    public class Pagination {

        // Simple value object holding one row of the result.
        public static class RecordPojo {
            Date myDate;
            String objectName;

            public Date getMyDate() {
                return myDate;
            }
            public RecordPojo setMyDate(Date myDate) {
                this.myDate = myDate;
                return this;
            }
            public String getObjectName() {
                return objectName;
            }
            public RecordPojo setObjectName(String objectName) {
                this.objectName = objectName;
                return this;
            }
        }

        static class MyPaginator {

            private final Connection conn;
            private final int pageSize;
            private int currentPage = 0;

            public MyPaginator(Connection conn, int pageSize) {
                this.conn = conn;
                this.pageSize = pageSize;
            }

            static final String QUERY = ""
                    + "SELECT my_date, object_name FROM pagination "
                    + "WHERE my_date between date '2017-10-01' and date '2017-10-21' "
                    + "ORDER BY my_date "
                    + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

            // Runs one query per call and returns the next page;
            // an empty list means there are no more rows.
            List<RecordPojo> getNextPage() throws SQLException {
                List<RecordPojo> list = new ArrayList<>();
                try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
                    ps.setInt(1, pageSize * currentPage++);
                    ps.setInt(2, pageSize);
                    // try-with-resources closes the ResultSet even if reading fails
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            list.add(new RecordPojo().setMyDate(rs.getDate(1)).setObjectName(rs.getString(2)));
                        }
                    }
                }
                return list;
            }

            public int getCurrentPage() {
                return currentPage;
            }
        }


        public static void main(String... args) throws SQLException {
            OracleDataSource ds = new OracleDataSource();
            ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
            long startTime = System.currentTimeMillis();
            long value = 0;
            int pageSize = 1000;

            try (Connection conn = ds.getConnection()) {
                MyPaginator p = new MyPaginator(conn, pageSize);
                List<RecordPojo> list;
                // keep requesting pages until an empty one comes back
                while (!(list = p.getNextPage()).isEmpty()) {
                    // "processing": sum the lengths of object_name on this page
                    value += list.stream().mapToLong(r -> r.getObjectName().length()).sum();
                    System.out.println("Page: " + p.getCurrentPage());
                }
                System.out.format("==================\nValue = %d, Pages = %d,  time = %d seconds", value, p.getCurrentPage(), (System.currentTimeMillis() - startTime)/1000);
            }
        }
    }
    

    The result is:

    Value = 18312338, Pages = 738,  time = 2216 seconds
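
    Why is paging this slow? A rough cost model helps. It rests on an assumption of mine, not on anything measured above: to serve one page, the database has to produce the first OFFSET + page-size rows in MY_DATE order and discard the skipped ones. The sketch below (class and method names are hypothetical) adds up that work for the 736341 rows and the page size of 1000 used in this test:

```java
// Rough cost model for OFFSET/FETCH paging: a sketch, not a measurement.
class OffsetCost {

    // Rows the database walks through to serve one page, ASSUMING it must
    // produce the first (page + 1) * pageSize ordered rows and throw away
    // everything before the requested page.
    static long rowsVisitedForPage(long page, long pageSize, long totalRows) {
        return Math.min((page + 1) * pageSize, totalRows);
    }

    // Queries issued by the paging loop: one per non-empty page, plus the
    // final query that returns an empty page and ends the loop.
    static long queriesIssued(long pageSize, long totalRows) {
        long nonEmptyPages = (totalRows + pageSize - 1) / pageSize;
        return nonEmptyPages + 1;
    }

    // Total rows walked through over the whole paging loop.
    static long totalRowsVisited(long pageSize, long totalRows) {
        long total = 0;
        long queries = queriesIssued(pageSize, totalRows);
        for (long page = 0; page < queries; page++) {
            total += rowsVisitedForPage(page, pageSize, totalRows);
        }
        return total;
    }

    public static void main(String[] args) {
        long totalRows = 736_341; // row count reported for the date range
        long pageSize = 1_000;    // page size used in the paging test
        long visited = totalRowsVisited(pageSize, totalRows);
        System.out.println("queries issued         = " + queriesIssued(pageSize, totalRows));
        System.out.println("rows visited (paging)  = " + visited);
        System.out.println("rows visited (1 pass)  = " + totalRows);
        System.out.printf("ratio                  = %.1f%n", (double) visited / totalRows);
    }
}
```

    Under this model the paging loop issues 738 queries and walks through roughly 370 times as many rows as a single pass over the date range. The model ignores sorting, caching, and network round trips, so treat it as an order-of-magnitude indication only.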
    

    Now let's test a very simple stream-based solution: just take one record, process it, discard it (freeing up memory), and take the next one.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    import oracle.jdbc.pool.OracleDataSource;

    public class NoPagination {

        static final String QUERY = ""
                + "SELECT my_date, object_name FROM pagination "
                + "WHERE my_date between date '2017-10-01' and date '2017-10-21' "
                + "ORDER BY my_date ";

        public static void main(String[] args) throws SQLException {
            OracleDataSource ds = new OracleDataSource();
            ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
            long startTime = System.currentTimeMillis();
            long count = 0;

            // try-with-resources closes rs, ps and conn in reverse order, even on failure
            try (Connection conn = ds.getConnection();
                 PreparedStatement ps = conn.prepareStatement(QUERY);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // processing: only the current record is held in memory
                    // (RecordPojo is the class defined in Pagination above)
                    Pagination.RecordPojo r = new Pagination.RecordPojo()
                            .setMyDate(rs.getDate(1)).setObjectName(rs.getString(2));
                    count += r.getObjectName().length();
                }
                System.out.format("==================\nValue = %d, time = %d seconds", count, (System.currentTimeMillis() - startTime)/1000);
            }
        }
    }

    The result is:

    Value = 18312328, time = 11 seconds
    

    Yes: 2216 seconds / 11 seconds ≈ 201, so the streaming version is about 200 times faster!
    Unbelievable? You can test it yourself.
    This example shows how important it is to choose the right solution (the right design patterns) for the problem.
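
    One knob worth knowing about when streaming a large result set is the JDBC fetch size, which hints how many rows the driver should pull per network round trip. As far as I know the Oracle driver defaults to 10 rows, and the timing above was taken without changing it. Below is a minimal sketch, assuming the same PAGINATION table; the class name, the method names, and the value 500 are my own illustrative choices, and I have not measured the effect here:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Streaming read with an explicit fetch size: a sketch under the
// assumptions stated above, not a tuned implementation.
class StreamWithFetchSize {

    // The per-row "processing": add up the length of column 2 (object_name).
    static long sumNameLengths(ResultSet rs) throws SQLException {
        long sum = 0;
        while (rs.next()) {
            String name = rs.getString(2);
            if (name != null) {
                sum += name.length();
            }
        }
        return sum;
    }

    // Executes the query once and streams through the whole result.
    static long run(Connection conn, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            // Standard JDBC hint: rows to fetch per round trip to the database.
            ps.setFetchSize(fetchSize);
            try (ResultSet rs = ps.executeQuery()) {
                return sumNameLengths(rs);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.err.println("usage: java StreamWithFetchSize.java <jdbc-url>");
            return;
        }
        String sql = "SELECT my_date, object_name FROM pagination "
                + "WHERE my_date BETWEEN DATE '2017-10-01' AND DATE '2017-10-21' "
                + "ORDER BY my_date";
        try (Connection conn = DriverManager.getConnection(args[0])) {
            // 500 is an arbitrary illustrative value; measure before settling on one.
            System.out.println("Value = " + run(conn, sql, 500));
        }
    }
}
```

    The fetch size only changes how many rows travel per round trip; the query still runs once and the application still holds one record at a time, so this stays a streaming solution rather than a paging one.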
