SQL Server and SqlDataReader - Trillion Records - Memory

栀梦 2021-01-05 17:14

I've never tried this - so I don't know if I'd run into memory issues.

But can a SqlDataReader read a trillion records? It's all streamed, correct? I'm a little unsure.

3 Answers
  别那么骄傲  2021-01-05 18:02

    There are a few details.

    • SqlDataReader will normally read an entire row into memory and cache it. This includes any BLOB fields, so you can end up caching several 2 GB fields in memory (XML, VARBINARY(MAX), VARCHAR(MAX), NVARCHAR(MAX)). If such fields are a concern, you must pass CommandBehavior.SequentialAccess to ExecuteReader and use the streaming capabilities of the SqlClient-specific types, like SqlBytes.Stream (a minimal sketch follows this list).

    • A connection is busy until the SqlDataReader completes. This creates transactional problems: you won't be able to do any processing in the database in the same transaction, because the connection is busy, and trying to open a different connection and enlist it in the same transaction will fail, since loop-back distributed transactions are prohibited. The solution is to use MARS, by setting MultipleActiveResultSets=True on the connection. This allows you to issue commands on the same connection while a data reader is still active (the typical fetch-process-fetch loop; see the second sketch after this list). Read Christian Kleinerman's article on MARS and transactions with great care, and make sure you understand the issues and restrictions involved; they're quite subtle and counter-intuitive.

    • Lengthy processing on the client will block the server. Your query will still be executing the whole time, and the server will have to suspend it whenever the communication pipe fills up. A query consumes a worker (or more than one if it has a parallel plan), and workers are a very scarce commodity on a server (they equate roughly to threads). You won't be able to afford many clients processing huge result sets at their own leisure.

    • Transaction size. Processing a trillion records in one transaction is never going to work. The log will have to grow to accommodate the entire transaction and won't truncate and reuse the VLFs, resulting in huge log growth.

    • Recovery time. If processing fails at the 999-billionth record, it will have to roll back all the work done, so it will take another '12' days just to roll back.
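
    Regarding the SequentialAccess point above, here is a minimal sketch of streaming a large column instead of letting the reader buffer the whole row. The dbo.Docs table, its Payload VARBINARY(MAX) column, and the connection string are illustrative assumptions, not anything from the question; it uses SqlDataReader.GetStream in place of the SqlBytes.Stream approach mentioned in the bullet, and the same pattern works with either System.Data.SqlClient or Microsoft.Data.SqlClient.

    ```csharp
    // Minimal sketch: stream a VARBINARY(MAX) column with SequentialAccess.
    // Table/column names and the connection string are hypothetical.
    using System.Data;
    using System.IO;
    using Microsoft.Data.SqlClient;

    class SequentialAccessSketch
    {
        static void Main()
        {
            const string connectionString =
                "Server=.;Database=MyDb;Integrated Security=SSPI";

            using var connection = new SqlConnection(connectionString);
            connection.Open();

            using var command = new SqlCommand(
                "SELECT Id, Payload FROM dbo.Docs", connection);

            // SequentialAccess stops SqlClient from caching the entire row;
            // columns must then be read strictly in order.
            using var reader = command.ExecuteReader(CommandBehavior.SequentialAccess);

            while (reader.Read())
            {
                int id = reader.GetInt32(0);

                // Stream the BLOB rather than materializing up to 2 GB in memory.
                using Stream blob = reader.GetStream(1);
                using FileStream file = File.Create($"doc_{id}.bin");
                blob.CopyTo(file);
            }
        }
    }
    ```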
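
    And for the MARS point, a minimal sketch of the fetch-process-fetch loop on a single connection. The dbo.WorkQueue table, its columns, and the UPDATE statement are made up for the example; the essential part is MultipleActiveResultSets=True in the connection string.

    ```csharp
    // Minimal sketch: issue commands on the same connection while a reader is open,
    // which requires MARS. Table/column names are hypothetical.
    using Microsoft.Data.SqlClient;

    class MarsSketch
    {
        static void Main()
        {
            var builder = new SqlConnectionStringBuilder
            {
                DataSource = ".",
                InitialCatalog = "MyDb",
                IntegratedSecurity = true,
                MultipleActiveResultSets = true   // MARS
            };

            using var connection = new SqlConnection(builder.ConnectionString);
            connection.Open();

            using var select = new SqlCommand("SELECT Id FROM dbo.WorkQueue", connection);
            using var reader = select.ExecuteReader();

            while (reader.Read())
            {
                int id = reader.GetInt32(0);

                // Without MARS, executing a second command on this connection while the
                // reader is still open would throw an InvalidOperationException about an
                // already-open DataReader.
                using var update = new SqlCommand(
                    "UPDATE dbo.WorkQueue SET Processed = 1 WHERE Id = @id", connection);
                update.Parameters.AddWithValue("@id", id);
                update.ExecuteNonQuery();
            }
        }
    }
    ```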
