SQL Server XML Column Performance

Submitted by 柔情痞子 on 2020-06-28 06:03:50

Question


Converting nText columns which contained XML to the XML data type has resulted in worse performance in SQL Server.

I am currently working on a project where nText columns have been used to store valid XML. I have successfully migrated these columns to the XML data type. However, according to SQL Profiler, the performance of the XML data type is worse than using nText or nvarchar(max) to store the XML. Everything I have read implies that this should not be the case.

To verify this, I created two tables with the same indexes, etc.

Table Name Order1
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [ntext] NOT NULL,
[CustomProperties] [ntext] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...

Table Name Order2
[id] [int] IDENTITY(1,1) NOT NULL,
[uid] [varchar](36) NOT NULL,
[AffiliateId] [varchar](36) NOT NULL,
[Address] [xml] NOT NULL,
[CustomProperties] [xml] NOT NULL,
[OrderNumber] [nvarchar](50) NOT NULL,
...

I then copied the data using a SELECT/INSERT statement (sketched below) and rebuilt the indexes on both tables, and then created a script with the following SQL.
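A rough sketch of that migration step, going via nvarchar(max) (which works whether or not a direct ntext-to-xml cast is permitted):

INSERT INTO dbo.Order2 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
SELECT uid, AffiliateId,
       CAST(CAST(Address AS NVARCHAR(MAX)) AS XML),
       CAST(CAST(CustomProperties AS NVARCHAR(MAX)) AS XML),
       OrderNumber
       -- remaining columns elided, as in the table definitions above
FROM dbo.Order1;

ALTER INDEX ALL ON dbo.Order2 REBUILD;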

DBCC DROPCLEANBUFFERS
GO
--Part1
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'
Select id, uid, AffiliateId, Address, CustomProperties, OrderNumber from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'

DBCC DROPCLEANBUFFERS
GO
--Part2
Select id, uid, AffiliateId, Address, OrderNumber,  
CAST(CustomProperties AS xml).query('CustomProperty/Key[text()="AgreedToTerms"]/../Value/text()')  as "TermsAgreed" 
from Order1

DBCC DROPCLEANBUFFERS
GO
--Part3
Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F'

Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'A3B71348-EB68-4600-9550-EC2CF75698F4'

Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = 'CB114D91-F000-4553-8AFE-FC20CF6AD8C0'

Insert Into Order1 (uid, AffiliateId, Address, CustomProperties, OrderNumber)
Select NewId(), AffiliateId, Address, CustomProperties, OrderNumber + 'X' from [dbo].[Order1] where uid = '06274E4F-E233-4594-B505-D4BAA3770F0A'

DBCC DROPCLEANBUFFERS
GO
-- Part4 This updates a 0.5M row table.
Update [dbo].[Order1] Set CustomProperties = Cast(CustomProperties as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX)), Address = Cast(Address as NVARCHAR(MAX)) + CAST('' as NVARCHAR(MAX))

The average results from SQL Profiler are as follows:

NTEXT

+-------+-------------+-------------+-------------+-------------+
| Test  |     CPU     |    Reads    |   Writes    |  Duration   |
+-------+-------------+-------------+-------------+-------------+
| Part1 | 281.3333333 | 129.3333333 |           0 |         933 |
| Part2 | 78421.66667 |     5374306 | 10.66666667 | 47493.66667 |
| Part3 | 281.6666667 |         616 | 27.66666667 | 374.6666667 |
| Part4 | 40312.33333 | 15311252.67 |      320662 |       67010 |
| Total |             |             |             | 115811.3333 |
+-------+-------------+-------------+-------------+-------------+


XML

+-------+-------------+-------------+-------------+-------------+
| Test  |     CPU     |    Reads    |   Writes    |  Duration   |
+-------+-------------+-------------+-------------+-------------+
| Part1 |         282 | 58.33333333 |           0 | 949.3333333 |
| Part2 | 21129.66667 | 180143.3333 |           0 | 76048.66667 |
| Part3 |         297 | 370.3333333 | 14.66666667 |         378 |
| Part4 | 112578.3333 | 8908940.667 | 145703.6667 | 114684.3333 |
| Total |             |             |             | 192060.3333 |
+-------+-------------+-------------+-------------+-------------+

Is the test script flawed? Or is there some other optimisation that needs to be carried out for xml data type columns, beyond what is covered in https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2005/administrator/ms345115(v=sql.90)?

I would expect the XML column type to outperform ntext.


Answer 1:


So this might not be an answer, at least not a solution, but it will hopefully help you understand what's going on...

The most expensive part of working with XML is the initial parsing; in other words, the transformation between the textual representation and the internal storage format.

Important to know: native XML is not stored as the text you see, but as an internal hierarchy. This requires very heavy processing when you pass textual XML into SQL Server. Rendering this XML back for a human reader requires the opposite process. Storing the string in a string column (be aware that NTEXT has been deprecated for ages) is faster than storing it as native XML, but you lose many advantages.
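A minimal sketch of this round trip, table-free and with made-up element names:

DECLARE @txt NVARCHAR(MAX) = N'<root><row nmbr="1">SomeTestValue1</row></root>';
DECLARE @x XML = @txt;                            -- expensive: parse the text into the internal hierarchy
SELECT @x.value('(/root/row/@nmbr)[1]', 'int');   -- cheap: reads the already-parsed structure
SELECT CAST(@x AS NVARCHAR(MAX));                 -- the opposite process: serialize back to text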

So to your script:

I assume that you ran the same script and just changed Order1 to Order2. Is this correct?

Part 1 measures a simple SELECT.

In order to offer a readable representation, SQL-Server (or rather SSMS) will transform any value into some kind of text. If your tables include INTs, GUIDs or a DateTime, you would not see the actual bit pattern, would you? SSMS uses quite expensive actions to create something readable for you. The expensive part is the transformation. Strings do not need this, so NTEXT will be faster.

Part 2 measures the .query() method (also in terms of "how to present the result").

Did you use the CAST( AS XML) with Order2 too? However, for such a need XML should be faster, because NTEXT has to do the heavy parsing over and over, while XML is already stored in a queryable format... But your XQuery is rather sub-optimal (due to the backward navigation ../Value). Try this:

 .query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()')

This will look for a <CustomProperty> where there is a <Key> with the given content and will read the <Value> below <CustomProperty> without the need of ../
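Applied to the question's Order2 table (native XML, so no CAST is needed), Part2 would then look roughly like this:

SELECT id, uid, AffiliateId, Address, OrderNumber,
       CustomProperties.query('/CustomProperty[Key[text()="AgreedToTerms"]]/Value/text()') AS TermsAgreed
FROM dbo.Order2;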

I'd surely expect XML to outperform NTEXT with a CAST here... Though the very first call against completely new tables and indexes might return biased results...

Part 3 measures inserts

Here I would expect roughly the same performance... If you move a string value into another string column, that is simple copying. Moving native XML into another XML column is simple copying too.

Part 4 measures updates

This looks rather weird... What are you trying to achieve? The code has to transform your native XML values to strings and re-parse them to be stored as XML again. Doing the same with NTEXT needs none of these expensive steps...
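If the goal is to actually change the stored XML, the native type can do it in place with .modify(), avoiding the cast-and-reparse round trip entirely. A sketch (the attribute name DummyFlag is hypothetical, and it assumes every document has a top-level <CustomProperty> element):

UPDATE dbo.Order2
SET CustomProperties.modify('insert attribute DummyFlag {"1"} into (/CustomProperty)[1]');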

Some general thoughts

  1. If you get some XML from outside, or read it from a file, and you need to query it just once, string methods on string types can be faster. But: if you want to store XML permanently in order to use and manipulate its values more often, the native XML type will be much better.
  2. In many cases performance measures do not measure what you think they do...
  3. Try to create your tests in a way that the presentation of the results is not part of the test (e.g. do an INSERT against a temp table, stop the clock, and only then push the output from the temp table); see the sketch after this list.
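A minimal sketch of point 3, timing the question's Part1 lookup against Order2 with no SSMS rendering inside the measured window (the temp table name is arbitrary):

DECLARE @d DATETIME2 = SYSUTCDATETIME();

SELECT id, uid, AffiliateId, Address, CustomProperties, OrderNumber
INTO #result                -- spool into a temp table: no client-side rendering
FROM dbo.Order2
WHERE uid = 'F96045F8-A2BD-4C02-BECB-6EF22C9E473F';

PRINT DATEDIFF(millisecond, @d, SYSUTCDATETIME());  -- stop the clock here

SELECT * FROM #result;      -- presentation happens outside the timed window
DROP TABLE #result;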

UPDATE Another test for "Part 2"

Try this test script:

USE master;
GO
CREATE DATABASE testShnugo;
GO
USE testShnugo;
GO
CREATE TABLE dbo.WithString(ID INT,SomeXML NTEXT);
CREATE TABLE dbo.WithXML(ID INT,SomeXML XML);
GO
--insert 100,000 rows into both tables
WITH Tally(Nmbr) AS (SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2)
INSERT INTO dbo.WithXML(ID,SomeXML) 
SELECT Nmbr,(SELECT Nmbr AS [@nmbr],CONCAT('hallo',Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',Nmbr) As [SomeTest] FOR XML PATH('row'),ROOT('root'),TYPE)
FROM Tally
--copy everything to the second table
INSERT INTO dbo.WithString(ID,SomeXML) SELECT ID,CAST(SomeXML AS NVARCHAR(MAX)) FROM dbo.WithXML; 
GO
--check the actual content
SELECT * FROM dbo.WithString;
SELECT * FROM dbo.WithXML;
GO

DECLARE @d DATETIME2=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE SomeXML LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE ' 
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithString WHERE CAST(SomeXML AS xml).exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'CAST NTEXT to XML and .exist()' 
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE CAST(SomeXML AS nvarchar(MAX)) LIKE '%FindMe="hallo333"%'
PRINT 'String-Method LIKE after CAST XML to NVARCHAR(MAX)' 
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

SET @d=SYSUTCDATETIME();
SELECT * FROM dbo.WithXML WHERE SomeXML.exist('/root/row[SomeTest[@FindMe="hallo333"]]')=1
PRINT 'native XML with .exist()' 
PRINT DATEDIFF(millisecond,@d,SYSUTCDATETIME());

GO
USE master;
GO
DROP DATABASE testShnugo;

First I create the tables and fill them with 100,000 XML documents like this:

<root>
  <row nmbr="1">
    <SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
  </row>
</root>

My results

String-Method LIKE 
836

CAST NTEXT to XML and .exist()
1962

String-Method LIKE after CAST XML to NVARCHAR(MAX)
1079

native XML with .exist()
911

As expected, the fastest approach is a string method against a string type on very tiny strings. But - of course - this will not be as powerful as an elaborate XQuery and will not be able to deal with namespaces, multiple occurrences and so on.

The slowest is the cast of NTEXT to XML with .exist()

A string method against the native XML after a cast to string is actually not that bad, but this depends on the XML's size. This one was very tiny...

And 100,000 non-trivial XQuery calls against 100,000 different XML documents are almost as fast as the pure string approach.

UPDATE 2: larger XMLs

I repeated the test with larger XMLs, changing just one line of the code above:

    SELECT Nmbr,(SELECT TOP 100 Nmbr AS [@nmbr],CONCAT('hallo',x.Nmbr) AS [SomeTest/@FindMe],CONCAT('SomeTestValue',x.Nmbr) As [SomeTest] FROM Tally x FOR XML PATH('row'),ROOT('root'),TYPE)

Now each XML document consists of 100 <row> elements.

<root>
  <row nmbr="1">
    <SomeTest FindMe="hallo1">SomeTestValue1</SomeTest>
  </row>
  <row nmbr="2">
    <SomeTest FindMe="hallo2">SomeTestValue2</SomeTest>
  </row>
  <row nmbr="3">
    <SomeTest FindMe="hallo3">SomeTestValue3</SomeTest>
  </row>
  ...more of them

Searching for FindMe="hallo333" won't return anything here, but the time needed to find out that there is nothing to return is enough for us:

String-Method LIKE 
71959

CAST NTEXT to XML and .exist()
74773

String-Method LIKE after CAST XML to NVARCHAR(MAX)
104380

native XML with .exist()
16374

The fastest - by far! - is now the native XML. The string approaches fall behind due to the size of the strings.

Please let me know your results too.



Source: https://stackoverflow.com/questions/56485445/sql-server-xml-column-performance
