Importing and parsing a large XML file in SQL Server (when “normal” methods are rather slow)

Submitted by 无人久伴 on 2019-12-13 05:16:22

Question


I have a large XML file that I need to import and parse into a tabular structure ("flatten") in SQL Server. By "large" I mean a file of around 450 MB, with up to 6-7 levels of nesting and many elements (~300).

I tried parsing the file with both OPENXML and Xml.Nodes. Both methods are slow. A partial query that reads a parent element and its nested grandchildren takes several minutes, if not dozens, to run.

I also tried the SQLXML Bulk Load method. Unfortunately I couldn't use it, because the file isn't structured properly: there is an element that is logically a parent but isn't physically nested as one.

Do you think the only possible solution left is to use .NET or Java? Is there something I'm missing?

I would strongly prefer a solution that is dynamic, at least to some degree. I don't want the SQL Server developers to rely on procedural, compiled code that they have no control over or knowledge of, in case the XML structure changes.

Thank you very much.


Answer 1:


OK. I created an XML index on the XML data column (just a primary one for now).

A query that took ~4:30 minutes before takes ~9 seconds now! It seems that storing the XML in a table with a proper XML index and then parsing the data with the xml.nodes() function is a feasible solution.

Thank you all.




Answer 2:


Since you want a tabular structure, you could convert the XML to a CSV file (using this Java or this .NET tool, or even an XSLT transformation) and then perform a bulk insert.

Of course, all that depends on your XML being properly formed.
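As a rough illustration of the XML-to-CSV step, here is a minimal sketch in Python (not one of the tools linked above). The function name `xml_to_csv`, the `row_tag` and `columns` parameters, and the sample `order` elements are all made up for the example; it assumes each row is one repeating element whose direct children become the columns.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_source, csv_target, row_tag, columns):
    """Stream row_tag elements from xml_source and write one CSV row per element.

    All names here are hypothetical; adapt row_tag and columns to the real file.
    """
    writer = csv.writer(csv_target, lineterminator="\n")
    writer.writerow(columns)
    count = 0
    # iterparse yields each element once its closing tag has been read,
    # so the document is processed incrementally instead of loaded up front.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == row_tag:
            # A missing child element becomes an empty CSV field.
            writer.writerow([elem.findtext(col, default="") for col in columns])
            count += 1
            elem.clear()  # drop the finished row's children to free memory
    return count


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<orders>"
        b"<order><id>1</id><customer>Ann</customer></order>"
        b"<order><id>2</id><customer>Bob, Jr.</customer></order>"
        b"</orders>"
    )
    out = io.StringIO()
    xml_to_csv(sample, out, "order", ["id", "customer"])
    print(out.getvalue())
```

For a real file you would pass a file path (or an open binary file) as `xml_source` and an open text file as `csv_target`, then point the bulk insert at the resulting CSV.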




Answer 3:


Well, first of all, I don't really understand why you would use OpenXml to load the file. I am pretty sure that doing so will internally trigger a whole bunch of validity checks against the OpenXml ISO standard.

But Xml.Nodes() (I assume that means the DOM way of loading data) is by far the slowest way to load and parse XML data. Consider instead a SAX approach using XmlReader or similar. I do realize that the article is from 2004, but it still explains the stuff pretty well.
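To give an idea of what a SAX-style pass looks like, here is a minimal sketch using Python's standard `xml.sax` module rather than .NET's XmlReader. The element names (`header`, `id`, `item`, `name`) and the `FlattenHandler` / `flatten` names are assumptions invented for the example; they stand in for a layout like the one in the question, where the logical parent is a sibling of its children rather than their container. The handler simply remembers the most recent header id and attaches it to every following item.

```python
import xml.sax


class FlattenHandler(xml.sax.ContentHandler):
    """Collect (header_id, item_name) rows from a flat header/item sequence.

    Element names are hypothetical and must be adapted to the real file.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._path = []          # stack of currently open element names
        self._text = []          # character chunks of the current element
        self._header_id = None   # id of the most recent <header>

    def startElement(self, name, attrs):
        self._path.append(name)
        self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        parent = self._path[-2] if len(self._path) > 1 else None
        if name == "id" and parent == "header":
            # Remember the logical parent for the items that follow it.
            self._header_id = value
        elif name == "name" and parent == "item":
            self.rows.append((self._header_id, value))
        self._path.pop()
        self._text = []


def flatten(xml_bytes):
    """Parse xml_bytes in one streaming pass and return the flattened rows."""
    handler = FlattenHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.rows


if __name__ == "__main__":
    sample = (
        b"<batch>"
        b"<header><id>H1</id></header>"
        b"<item><name>a</name></item>"
        b"<item><name>b</name></item>"
        b"<header><id>H2</id></header>"
        b"<item><name>c</name></item>"
        b"</batch>"
    )
    for row in flatten(sample):
        print(row)
```

For a 450 MB file you would call `xml.sax.parse(path, handler)` on the file itself instead of `parseString`, and write each row out as it is produced rather than collecting them in a list.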



Source: https://stackoverflow.com/questions/23888494/importing-and-parsing-a-large-xml-file-in-sql-server-when-normal-methods-are
