TABLESAMPLE returns wrong number of rows?

无人共我 2020-12-17 20:18

I've just discovered the TABLESAMPLE clause, but surprisingly it doesn't return the number of rows I've specified.

The table that I've used has ~14M rows and I wa

4 Answers
  • 2020-12-17 20:58

    From the documentation.

    The actual number of rows that are returned can vary significantly. If you specify a small number, such as 5, you might not receive results in the sample.

    http://msdn.microsoft.com/en-us/library/ms189108(v=sql.90).aspx
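    As a rough illustration of why a request as small as 5 rows can come back empty, here is a back-of-the-envelope sketch in Python. It assumes each 8 KB page is kept or dropped independently with probability equal to the requested fraction of the table, which is how the SQL Server documentation describes SYSTEM sampling. The 14M-row table packed at 100 rows per page is a hypothetical layout, not a measurement.

```python
import math

# Hypothetical table: ~14M rows packed at ~100 rows per 8 KB page.
TOTAL_ROWS = 14_000_000
ROWS_PER_PAGE = 100
PAGES = TOTAL_ROWS // ROWS_PER_PAGE  # 140,000 pages


def empty_sample_probability(requested_rows: int) -> float:
    """Chance that no page at all is picked.

    Simplifying assumption: a ROWS request is treated as the equivalent
    fraction p = requested_rows / TOTAL_ROWS, and each page is kept
    independently with probability p.
    """
    p = requested_rows / TOTAL_ROWS
    # Every page must independently be excluded: (1 - p) ** PAGES.
    return (1.0 - p) ** PAGES


def expected_pages(requested_rows: int) -> float:
    """Average number of pages that make it into the sample."""
    return PAGES * requested_rows / TOTAL_ROWS


if __name__ == "__main__":
    for n in (5, 100, 50_000):
        print(f"{n:>6} rows requested: "
              f"expected pages = {expected_pages(n):8.2f}, "
              f"P(empty result) = {empty_sample_probability(n):.4f}")
```

    Under those assumed numbers, a 5-row request picks on average only 0.05 pages, so about 95% of runs (e^-0.05 ≈ 0.951) return nothing, and a run that does pick a page returns at least a whole page (about 100 rows) rather than 5.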

  • 2020-12-17 21:09

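    See the article here. You need to add a TOP clause and/or use the REPEATABLE option to get the number of rows you want: TOP caps the result at that many rows, and REPEATABLE with a fixed seed returns the same sample on every run, as long as the table does not change.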
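    A minimal Python model of that advice, assuming page-level sampling where each page is kept with a fixed probability: ask for more pages than you need (a hypothetical 2x oversampling factor), fix the random seed as a stand-in for REPEATABLE, then truncate to n rows the way TOP (n) would. In T-SQL the shape would be roughly SELECT TOP (n) ... FROM t TABLESAMPLE (x PERCENT) REPEATABLE (seed); the table layout and the helper names below are invented for illustration.

```python
import random
from typing import List

# Hypothetical table: 1,000 pages holding between 50 and 150 rows each.
_layout_rng = random.Random(0)
PAGES: List[List[int]] = []
_next_id = 0
for _ in range(1_000):
    size = _layout_rng.randint(50, 150)
    PAGES.append(list(range(_next_id, _next_id + size)))
    _next_id += size
TOTAL_ROWS = _next_id


def tablesample(fraction: float, seed: int) -> List[int]:
    """Keep each whole page with probability `fraction` (page-level model).

    A fixed `seed` plays the role of REPEATABLE(seed) in this sketch.
    """
    rng = random.Random(seed)
    rows: List[int] = []
    for page in PAGES:
        if rng.random() < fraction:
            rows.extend(page)  # all rows on a chosen page come back
    return rows


def top_n_sample(n: int, seed: int, oversample: float = 2.0) -> List[int]:
    """Request more than n rows' worth of pages, then truncate like TOP (n)."""
    fraction = min(1.0, oversample * n / TOTAL_ROWS)
    return tablesample(fraction, seed)[:n]


if __name__ == "__main__":
    a = top_n_sample(500, seed=42)
    b = top_n_sample(500, seed=42)
    print(len(a), a == b)  # same seed gives the same rows
```

    TOP only caps the count: if the sample happens to contain fewer than n rows, you still get fewer, which is why the sketch oversamples. Without ORDER BY, which n rows TOP keeps is not guaranteed in SQL Server; the model simply keeps the first n in page order.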

  • 2020-12-17 21:11

    This behavior has been documented before. There is a good writeup on it here.

    I believe you can fix it by passing REPEATABLE with the same seed each time. Here is a snippet from the writeup:

    ...you will notice that a different number of rows is returned every time. Without any data changing, re-running the identical query keeps giving different results. This is the non-deterministic factor of the TABLESAMPLE clause. If the table is static and rows are not changed, what could be the reason for returning a different number of rows on each execution? The reason is that 10 PERCENT is not a percentage of the table's rows or records; it is a percentage of the table's data pages. Once the sample pages are selected, all the rows from the selected pages are returned; the number of rows sampled from each page is not limited. The fill factor of the pages varies depending on the table's data, which makes the script return a different row count every time it is executed.

    The REPEATABLE option causes a selected sample to be returned again. When REPEATABLE is specified with the same repeat_seed value, SQL Server returns the same subset of rows, as long as no changes have been made to the table. When REPEATABLE is specified with a different repeat_seed value, SQL Server will typically return a different sample of the rows in the table.
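    The page-level behavior described in the quote can be reproduced with a small Python model. It is only a sketch: the page layout (2,000 pages holding 40 to 160 rows each) is invented, and SQL Server's real page-selection logic is internal. The model simply keeps each page with 10% probability, as the quote describes, and uses a fixed random seed as a stand-in for REPEATABLE.

```python
import random
from typing import List, Tuple

# Hypothetical table: 2,000 pages whose row counts differ (uneven fill).
_layout = random.Random(1)
PAGE_SIZES: List[int] = [_layout.randint(40, 160) for _ in range(2_000)]
TOTAL_ROWS = sum(PAGE_SIZES)


def sample_10_percent(seed: int) -> Tuple[int, List[int]]:
    """Model of TABLESAMPLE (10 PERCENT): each page is kept with
    probability 0.10 and a kept page contributes all of its rows.

    Returns the row count and the indexes of the chosen pages.
    """
    rng = random.Random(seed)  # same seed stands in for REPEATABLE(seed)
    chosen = [i for i in range(len(PAGE_SIZES)) if rng.random() < 0.10]
    return sum(PAGE_SIZES[i] for i in chosen), chosen


if __name__ == "__main__":
    counts = [sample_10_percent(seed)[0] for seed in range(20)]
    print("10% of rows would be", TOTAL_ROWS // 10)
    print("rows returned over 20 runs:", min(counts), "to", max(counts))
    print("same seed twice:", sample_10_percent(7) == sample_10_percent(7))
```

    Each run returns a sum of whole-page sizes, so the count moves from run to run even though the data never changes; reusing a seed returns the identical set of pages.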

  • 2020-12-17 21:11

    I've observed the same.

    The page explanation definitely makes sense and rings a bell: you should see much more predictable row counts when your row size is fixed. Try it on a table with no nullable or variable-length columns.

    In fact, I just used it to prove a theory about using it in an UPDATE (you were probably spurred by the same question I was), and choosing TABLESAMPLE (50000 ROWS) actually affected 49,849 rows.
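    To put a number on the fixed-row-size suggestion, the sketch below computes the mean and standard deviation of the returned row count under a simple model in which each page is kept independently with probability p and returns all of its rows. The two page layouts are hypothetical: one with 100 rows on every page, one alternating 20 and 180 rows per page, both totalling 100,000 rows.

```python
import math
from typing import Sequence, Tuple


def sample_row_stats(rows_per_page: Sequence[int], p: float) -> Tuple[float, float]:
    """Mean and standard deviation of the returned row count.

    Model: row count = sum(r_i * B_i) with B_i ~ Bernoulli(p) per page, so
    mean = p * sum(r_i) and variance = p * (1 - p) * sum(r_i ** 2).
    """
    mean = p * sum(rows_per_page)
    variance = p * (1 - p) * sum(r * r for r in rows_per_page)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    fixed = [100] * 1_000        # fixed-width rows: every page holds 100
    variable = [20, 180] * 500   # same 100,000 rows, uneven pages
    for name, layout in (("fixed", fixed), ("variable", variable)):
        mean, sd = sample_row_stats(layout, 0.10)
        print(f"{name:>8}: expected {mean:.0f} rows, std dev {sd:.1f}")
```

    With these made-up layouts, both return about 10,000 rows on average, but the standard deviation drops from about 1,215 rows to about 949 when every page holds the same number of rows. Fixed-width rows remove only the page-to-page differences; the number of pages picked is still random, so the count still varies, which fits a 50000 ROWS request returning 49,849 rather than exactly 50,000.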
