Query speed based on order of columns

半阙折子戏 2021-02-04 14:57

Does the order of the column types in your database have any effect on the query time?

For example, would a table with mixed ordering (INT, TEXT, VARCHAR, INT, TEXT) be

4 answers
  • 2021-02-04 15:38

    I would suggest that there is absolutely no [significant] difference no matter how you order the columns.

    PostgreSQL: http://social.msdn.microsoft.com/Forums/en-US/sqldatabaseengine/thread/a7ce8a90-22fc-456d-9f56-4956c42a78b0

    SQL Server: http://social.msdn.microsoft.com/Forums/en/sqldatabaseengine/thread/36713a82-315d-45ef-b74e-5f342e0f22fa

    I suspect the same for MySQL.

    All data is read in pages, so if your data fits into a single page it does not matter how you order the columns. If the disk block size is 2K or 4K, several blocks are read to satisfy the "8K page request". If the disk block size is 64K (for large DB systems), you would already be buffering other data.

    Not only that, if a record is requested, the engine will normally retrieve all pages for the record, including the overflow to pages 2 and 3 if the data spans multiple pages. The columns are then worked out from the data retrieved. SQL Server has a limit on in-page data, which is about 8060 bytes. Anything larger is stored off the main data page, similar to TOAST in PostgreSQL, and is not retrieved if the column is not used. It still does not matter where the column is in the order.

    In SQL Server for example, multiple bit fields are stored together in a bit patterned mask - this is irrespective of whether you put the columns next to each other. I would suspect MySQL and PostgreSQL to do much the same to optimize space.
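The bit-packing point can be sketched with a little arithmetic. This is only an illustration, assuming the commonly documented SQL Server behavior of storing up to 8 bit columns per byte; the function name `bit_column_bytes` is hypothetical and not part of any library.

```python
import math


def bit_column_bytes(n_bit_columns: int) -> int:
    """Bytes needed for n bit columns, assuming 8 bit columns share one byte.

    The count depends only on how many bit columns the table has,
    not on where they appear in the column list.
    """
    return math.ceil(n_bit_columns / 8)


for n in (1, 8, 9, 16, 17):
    print(n, "bit columns ->", bit_column_bytes(n), "byte(s)")
```

Under this assumption, interleaving bit columns with other types does not change the storage they need.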

    Note: [significant] - the only reason for this qualification is that, possibly, when extracting a particular column from a data page, having it in the beginning helps because the low-level assembly calls do not have to seek far in the memory block.

  • 2021-02-04 15:48

    In PostgreSQL, you will get an advantage if you put fixed-width columns first because that access path is specially optimized. So (INT, INT, VARCHAR, TEXT, TEXT) will be fastest (the relative order of VARCHAR and TEXT doesn't matter).

    Additionally, you can save space, which can translate to more throughput and performance, if you manage the alignment requirements of the types correctly. For example, (INT, BOOL, INT, BOOL) will require 13 bytes of space because the third column has to be aligned at a 4-byte boundary, and so there will be 3 bytes of space wasted between the second and the third column. Better here would be (INT, INT, BOOL, BOOL). (Whatever comes after this row will probably also require alignment of at least 4 bytes, so you will waste 2 bytes at the end.)
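The alignment arithmetic above can be checked with a small sketch. It assumes, as the answer does, that INT is 4 bytes with 4-byte alignment and BOOL is 1 byte with 1-byte alignment; the names `TYPES`, `align_up` and `row_data_size` are made up for this illustration, and row headers are ignored.

```python
# (size, alignment) in bytes; assumed values taken from the answer's example.
TYPES = {"INT": (4, 4), "BOOL": (1, 1)}


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def row_data_size(columns: list[str]) -> int:
    """Bytes used by the column data, including padding between columns."""
    offset = 0
    for type_name in columns:
        size, alignment = TYPES[type_name]
        offset = align_up(offset, alignment)  # pad so the column starts aligned
        offset += size
    return offset


mixed = row_data_size(["INT", "BOOL", "INT", "BOOL"])
grouped = row_data_size(["INT", "INT", "BOOL", "BOOL"])
print("mixed:", mixed)      # 13 bytes, 3 of them padding before the second INT
print("grouped:", grouped)  # 10 bytes, no padding between columns
print("grouped, padded to a 4-byte boundary:", align_up(grouped, 4))  # 12
```

The grouped layout needs 10 bytes of column data, or 12 once the end of the row is padded to a 4-byte boundary, which matches the 2 wasted bytes mentioned above.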

  • 2021-02-04 15:49

    The answer is yes, it does matter, and it can matter a great deal, but usually not much.

    All I/O is done at a page level (typically 2K or 4K depending on your OS). Column data for a row is stored contiguously, except when the page becomes full, in which case the data is written to another (usually the next) page.

    The more on-disk space the table definition places between the columns you select, the greater the chance that their data will (sometimes) be on different pages. Being on a different page may result in an extra I/O operation (if no other selected rows are on that page). In the worst case, each column you select could be on a different page.

    Here's an example:

    create table bad_layout (
        num1 int,
        large1 varchar(4000),
        num2 int,
        large2 varchar(4000),
        num3 int,
        large3 varchar(4000)
    );

    create table better_layout (
        num1 int,
        num2 int,
        num3 int,
        large1 varchar(4000),
        large2 varchar(4000),
        large3 varchar(4000)
    );
    

    Comparing:

    select num1, num2, num3 from bad_layout;
    select num1, num2, num3 from better_layout;

    Because in bad_layout each num column is basically going to be on a different page, each row will require 3 I/O operations. Conversely, in better_layout the num columns are usually going to be on the same page.

    The bad_layout query is likely to take about 3 times longer to execute.

    Good table layout can make a large difference to query performance. You should try to keep columns that are usually selected together as close as possible to each other in the table layout.
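The reasoning in this answer can be turned into a toy calculation. It is a sketch of the answer's simplified model only: every varchar is assumed to be filled to 4000 bytes, an int is 4 bytes, a row's bytes are laid out contiguously from the start of a page, and row headers are ignored. It does not describe how any particular engine stores rows, and `pages_touched` and the layout lists are hypothetical names.

```python
SIZES = {"int": 4, "varchar": 4000}  # assumed on-disk sizes in bytes

BAD_LAYOUT = [("num1", "int"), ("large1", "varchar"), ("num2", "int"),
              ("large2", "varchar"), ("num3", "int"), ("large3", "varchar")]
BETTER_LAYOUT = [("num1", "int"), ("num2", "int"), ("num3", "int"),
                 ("large1", "varchar"), ("large2", "varchar"), ("large3", "varchar")]


def pages_touched(layout, wanted, page_size):
    """Page numbers (relative to the row start) holding the wanted columns."""
    offset = 0
    pages = set()
    for name, kind in layout:
        if name in wanted:
            pages.add(offset // page_size)
        offset += SIZES[kind]
    return sorted(pages)


wanted = {"num1", "num2", "num3"}
for page_size in (2048, 4096):
    print(page_size, "bad:", pages_touched(BAD_LAYOUT, wanted, page_size),
          "better:", pages_touched(BETTER_LAYOUT, wanted, page_size))
```

In this model the three num columns of bad_layout fall on three different pages with 2K pages and on two with 4K pages, while better_layout always keeps them on one page.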

  • 2021-02-04 15:50

    The order is unlikely to matter much. The running time is dominated by things like disk access times, and the number and order of disk accesses is unlikely to change as a result of reordering the data within a row.

    The one exception is if you have a very big item in your row (much bigger than a disk block, usually 4K?). If you have one very big column in a table, you might want to put it as the last column so that if you aren't accessing it, it might not need to be fully paged in. But even then, you'd have to work pretty hard to generate a data set and access pattern where the difference would be noticeable.
