Handling large SQL select queries / Read sql data in chunks

抹茶落季 2020-12-30 10:20

I'm using .NET 4.0 and SQL Server 2008 R2.

I'm running a large SQL SELECT query that returns millions of rows and takes a long time to finish.

Does

3 answers
  • 2020-12-30 11:01

    It depends in part on whether the query itself is streaming, or whether it does lots of work in temporary tables then (finally) starts returning data. You can't do much in the second scenario except re-write the query; however, in the first case an iterator block would usually help, i.e.

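    public IEnumerable<Foo> GetData() {
         // not shown: opening the connection and building cmd (the command)
         using (var reader = cmd.ExecuteReader()) {
             while (reader.Read()) {
                 // not shown: materializing a Foo from the current row;
                 // ReadFoo is a hypothetical helper standing in for that code
                 Foo foo = ReadFoo(reader);
                 yield return foo;
             }
         }
    }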
    

    This is now a streaming iterator - you can foreach over it and it will retrieve records live from the incoming TDS data without buffering all the data first.
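
    To see why this matters for the consumer, here is a small in-memory sketch. It does not touch a database: the GetData below is a stand-in that counts how many rows it has produced, and the names StreamingDemo, RowsRead, FirstMatchStreaming and FirstMatchBuffered are all made up for illustration. It compares a plain foreach with a caller that buffers via ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // How many rows the stand-in source has produced since the last reset.
    public static int RowsRead;

    // Stand-in for the reader loop: yields one row at a time, on demand.
    public static IEnumerable<int> GetData(int totalRows)
    {
        for (int i = 0; i < totalRows; i++)
        {
            RowsRead++;        // in the real method this would be reader.Read()
            yield return i;
        }
    }

    // Streams: stops pulling rows as soon as the predicate matches.
    public static int FirstMatchStreaming(int totalRows, Func<int, bool> predicate)
    {
        RowsRead = 0;
        foreach (int row in GetData(totalRows))
        {
            if (predicate(row)) return row;
        }
        return -1;
    }

    // Buffers: ToList() drains the whole source before any row is examined.
    public static int FirstMatchBuffered(int totalRows, Func<int, bool> predicate)
    {
        RowsRead = 0;
        List<int> all = GetData(totalRows).ToList();
        foreach (int row in all)
        {
            if (predicate(row)) return row;
        }
        return -1;
    }
}
```

    With a million rows and a match on the fifth one, the streaming version pulls 5 rows and the buffered version pulls all of them. The same applies to the real iterator: calling ToList() or ToArray() on it brings back the buffering you were trying to avoid.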

    If you (perhaps wisely) don't want to write your own materialization code, there are tools that will do this for you - for example, LINQ-to-SQL's ExecuteQuery<T>(tsql, args) will do the above pain-free.

  • 2020-12-30 11:04

    You'd need to use data paging.

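    SQL Server has the TOP clause (SELECT TOP 10 a, b, c FROM d), which you can combine with a BETWEEN filter on a key column (id below stands for whatever indexed key your table has):

    SELECT TOP 10000 a, b, c FROM d WHERE id BETWEEN @X AND @Y ORDER BY id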
    

    With this, you can retrieve N rows, do some partial processing, then load the next N rows, and so on.
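
    A rough sketch of that loop, assuming the table has an increasing key column. SQL Server 2008 R2 has no OFFSET/FETCH (that arrived in SQL Server 2012), so this uses TOP plus a "greater than the last key seen" filter. The Row class, the KeysetPager name, the fetchPage delegate and the id column are all assumptions for illustration; in real code fetchPage would run something like PageSql through a SqlCommand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape: a key plus whatever columns you need.
public class Row
{
    public long Id;
    public string Payload;
}

public static class KeysetPager
{
    // T-SQL a real fetchPage might execute (table d and column id are assumed).
    public const string PageSql =
        "SELECT TOP (@pageSize) id, a FROM d WHERE id > @lastId ORDER BY id";

    // Calls fetchPage(lastId, pageSize) repeatedly and yields each page.
    // Stops when a page comes back empty or shorter than pageSize.
    public static IEnumerable<IList<Row>> ReadPages(
        Func<long, int, IList<Row>> fetchPage, int pageSize)
    {
        long lastId = long.MinValue;   // start before the smallest possible key
        while (true)
        {
            IList<Row> page = fetchPage(lastId, pageSize);
            if (page.Count == 0) yield break;

            yield return page;

            // Remember the largest key seen so the next page starts after it.
            lastId = page[page.Count - 1].Id;
            if (page.Count < pageSize) yield break;
        }
    }
}
```

    Each page is its own short query, so the connection is not held open for the whole run. The trade-off is that rows inserted or deleted between pages can shift what you see, which a single streaming query would not do.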

    You can also make this multithreaded: one thread retrieves results while another waits for data and processes it as it arrives.
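
    One way to sketch that split on .NET 4.0 is a bounded BlockingCollection<T> between a producer task and the consuming thread. The Pipeline class and its Run method are hypothetical names; source would be whatever enumerates your rows (a reader loop, or pages flattened into rows), and process is your per-row work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Pipeline
{
    // Enumerates `source` on a background task and runs `process` on the
    // calling thread, with at most `capacity` rows buffered in between.
    // Returns the number of rows processed.
    public static int Run<T>(IEnumerable<T> source, Action<T> process, int capacity)
    {
        int processed = 0;
        using (var queue = new BlockingCollection<T>(capacity))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    // Add blocks while the buffer is full, which throttles
                    // the producer to the consumer's pace.
                    foreach (T row in source) queue.Add(row);
                }
                finally
                {
                    // Always release the consumer, even if reading failed.
                    queue.CompleteAdding();
                }
            });

            // Blocks until a row is available; ends after CompleteAdding.
            foreach (T row in queue.GetConsumingEnumerable())
            {
                process(row);
                processed++;
            }

            // Surfaces any producer exception as an AggregateException.
            producer.Wait();
        }
        return processed;
    }
}
```

    The bounded capacity is the point: it keeps memory flat when the database is faster than your processing. This sketch does not handle an exception thrown by process while the producer is still adding, so treat it as a starting point only.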

  • 2020-12-30 11:09

    If you really have to process millions of records, why not load 10,000 per round, process them, and then load the next 10,000? If you don't, consider letting the DBMS filter the data before you load it, since filtering in the database performs much better than filtering in your logic layer.

    Or follow a lazy-load approach: load only the IDs first, then load the actual data for each ID only when you need it.
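
    A minimal sketch of the lazy-load idea using Lazy<T>, which is available in .NET 4.0. LazyRecord, LazyLoading.Wrap and the loadDetail delegate are hypothetical names; loadDetail stands in for a "SELECT ... WHERE id = @id" style lookup that is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Holds an ID up front and fetches the detail only on first access.
public class LazyRecord<TDetail>
{
    private readonly Lazy<TDetail> _detail;

    public long Id { get; private set; }

    public LazyRecord(long id, Func<long, TDetail> loadDetail)
    {
        Id = id;
        // The loader runs at most once, the first time Detail is read.
        _detail = new Lazy<TDetail>(() => loadDetail(id));
    }

    public TDetail Detail { get { return _detail.Value; } }

    public bool IsLoaded { get { return _detail.IsValueCreated; } }
}

public static class LazyLoading
{
    // Wraps a cheap list of IDs; no detail is loaded at this point.
    public static List<LazyRecord<TDetail>> Wrap<TDetail>(
        IEnumerable<long> ids, Func<long, TDetail> loadDetail)
    {
        var records = new List<LazyRecord<TDetail>>();
        foreach (long id in ids)
        {
            records.Add(new LazyRecord<TDetail>(id, loadDetail));
        }
        return records;
    }
}
```

    This pays off when only a small share of the records is ever inspected. If you end up touching most of them, one lookup per ID will likely be slower than streaming or paging the full rows.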
