Simple Performance Test on Azure SQL Data Warehouse

Asked by 甜味超标 on 2021-01-03 10:10

We are working to port existing applications to Azure SQL Data Warehouse. In order to better understand the performance/workload management characteristics/capabilities of

1 Answer
  •  一生所求
    2021-01-03 10:32

    At DWU 1000 you get a maximum of 32 concurrent queries and 40 concurrency slots, so some of your queries are going to have to queue.

    What indexing and distribution choices have you made? This table is small, so it sounds like a better candidate for a clustered index than for a clustered columnstore index (the default). Also make sure you have created your statistics.
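
    As a sketch only (the table and column names below are placeholders, not taken from your schema), a small dimension table could be built with a clustered index instead of the default clustered columnstore, and given explicit statistics, like this:

    ```sql
    -- Hypothetical small dimension table; names are placeholders.
    CREATE TABLE dbo.calendar_dim_small
    (
        calendar_date DATE        NOT NULL,
        day_group     VARCHAR(20) NOT NULL
    )
    WITH
    (
        DISTRIBUTION = ROUND_ROBIN,
        CLUSTERED INDEX (calendar_date)  -- instead of the default CLUSTERED COLUMNSTORE INDEX
    );

    -- Azure SQL Data Warehouse does not auto-create statistics,
    -- so create them explicitly on columns used in joins and filters.
    CREATE STATISTICS st_calendar_date ON dbo.calendar_dim_small (calendar_date);
    CREATE STATISTICS st_day_group     ON dbo.calendar_dim_small (day_group);
    ```
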

    Where are you calling sqlcmd from? E.g. an Azure VM, so it's "closer" to the DW, or your laptop, in which case you may be waiting on network round trips.

    Review the concurrency DMV: sys.dm_pdw_exec_requests. Review the waits DMV: sys.dm_pdw_waits.
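
    For example (a sketch; adjust the status filters to taste), you can check for queueing like this:

    ```sql
    -- Requests that are still active: anything queued here is waiting for a concurrency slot.
    SELECT request_id, status, submit_time, start_time, total_elapsed_time, command
    FROM sys.dm_pdw_exec_requests
    WHERE status NOT IN ('Completed', 'Failed', 'Cancelled')
    ORDER BY submit_time;

    -- What those requests are waiting on (e.g. concurrency slots, object locks).
    SELECT w.request_id, w.type, w.object_name, w.state, r.command
    FROM sys.dm_pdw_waits AS w
    JOIN sys.dm_pdw_exec_requests AS r
      ON w.request_id = r.request_id
    WHERE w.state IN ('Queued', 'Granted');
    ```
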

    This recent answer looks useful too.

    I have annotated your sample EXPLAIN plan. Turn on line numbers in SSMS, or view it in something like Sublime Text, for best effect:

    • Line 3 is the query being analyzed.
    • Line 4 lists the total number of operations or steps in the plan as 8. Each operation is held in a dsql_operation element within the XML.
    • Line 5 starts operation 1, RND_ID, or RandomIdOperation. This operation simply creates a unique name for temporary objects used in the query plan. The identifier is TEMP_ID_21523.
    • Line 8 starts operation 2, ON, or OnOperation. This performs an action on a database or object. This particular step creates a temp table [TEMP_ID_21523] on all nodes as specified in line 9. The DDL to create the temp table on all nodes is on line 11. This temp table only has one column, called 'col' of datatype DATE.
    • Line 14 is operation 3, a Data Movement Service (DMS) Operation called SHUFFLE_MOVE, or ShuffleMoveOperation. SHUFFLE_MOVE redistributes a distributed table.
    • Line 16 gives the statement used in the SHUFFLE_MOVE. It's moving data from a calculated column from table [AR_CORE_DIM_TABLES].[calendar_dim] into the temp table [TEMP_ID_21523], which we know exists on all nodes.
    • Line 22 starts the next operation 4, another ON, or OnOperation. This operation creates another temp table on the control node, with one BIGINT column. The DDL for this table is provided on line 25.
    • Line 28 starts operation 5, a PARTITION_MOVE, or PartitionMoveOperation. This DMS operation moves data from a distributed table to a single table on the Control node. This operation is used for aggregation operations on the Control node. This particular step moves data from temp table [TEMP_ID_21523] which exists on all nodes to destination temp table [QTable_3ff2...] which is on the control node.
    • Lines 31 to 49 list the SQL used to do this.
    • Line 53 starts operation 6, another ON, or OnOperation. This step drops the temp table [TEMP_ID_21523] which exists on all nodes.
    • Line 59 starts operation 7 of 8, a RETURN or ReturnOperation. This operation which occurs on the control node, sends query results from the control node to the user who submitted the query. The SQL returned is shown in lines 61-67.
    • Line 69 starts the last operation 8 of 8, another ON, or OnOperation. This particular step drops the temp table [QTable_3ff2...] which exists on the control node.
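
    For reference, a plan like the one annotated above is produced by prefixing the query with EXPLAIN (the SELECT below is only a guess at your query's shape; the view name is taken from your post):

    ```sql
    -- Returns the distributed query plan as XML instead of executing the query.
    EXPLAIN
    SELECT COUNT(*)
    FROM [AR_WM_VM].[CALENDAR_DAY GROUP];
    ```
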

    For your query, the PARTITION_MOVE and SHUFFLE_MOVE steps are the most likely causes of performance issues, and improving performance would involve removing or optimizing them.
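
    One common way to eliminate a SHUFFLE_MOVE on a small dimension table is to replicate it to every compute node, so joins against it need no data movement. A sketch, assuming replicated tables are available on your instance and that calendar_date is a real column of the source table (I don't have your DDL):

    ```sql
    -- Copy the dimension table as a replicated table; a full copy is stored on each compute node.
    CREATE TABLE dbo.calendar_dim_replicated
    WITH
    (
        DISTRIBUTION = REPLICATE,
        CLUSTERED INDEX (calendar_date)  -- placeholder column name
    )
    AS
    SELECT *
    FROM [AR_CORE_DIM_TABLES].[calendar_dim];
    ```

    Whether this helps depends on the distribution of the tables it joins to, which is why the DDL matters.
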

    To go any further I would need the DDL for the table [AR_CORE_DIM_TABLES].[calendar_dim] and the view [AR_WM_VM].[CALENDAR_DAY GROUP], so I can work out the distribution and whether any computed columns are in use.

    This annotation is based on a similar one in the APS help file sections on EXPLAIN plans and Understanding Query Plans, from which some of the text is copied; I have adapted it for your plan.
