What is the fastest way to load an XML file into MySQL using C#?

独厮守ぢ 2020-12-15 09:19

Question

What is the fastest way to dump a large (> 1GB) XML file into a MySQL database?

Data

The data in question is the StackOverflow Creative Commons data dump.

8 Answers
  • 2020-12-15 09:37

    Ok, I'm going to be an idiot here and answer your question with a question.

    Why put it in a database?

    What if ... just a what-if... you wrote the XML to files on the local drive and, if needed, wrote some indexing information to the database. This should perform significantly faster than trying to load a database and would be much more portable. All you would need on top of it is a way to search and a way to index relational references. There should be plenty of help with searching, and the relational aspect should be easy enough to build. You might even consider re-writing the information so that each file contains a single post with all the answers and comments right there.
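    A very rough sketch of that idea in C#, assuming a hypothetical post_index table and the MySql.Data connector (table name, columns, and paths are illustrative only):

    ```csharp
    // Illustrative sketch only: write each post's XML to its own file and keep a
    // small index row in MySQL pointing at it. Schema and paths are hypothetical.
    using System.IO;
    using MySql.Data.MySqlClient;

    class FilePlusIndexSketch
    {
        static void StorePost(MySqlConnection conn, int postId, string postXml)
        {
            // One file per post on the local drive.
            string path = Path.Combine(@"C:\so-dump\posts", postId + ".xml");
            File.WriteAllText(path, postXml);

            // Only the lookup information goes into the database.
            using var cmd = new MySqlCommand(
                "INSERT INTO post_index (post_id, file_path) VALUES (@id, @path)", conn);
            cmd.Parameters.AddWithValue("@id", postId);
            cmd.Parameters.AddWithValue("@path", path);
            cmd.ExecuteNonQuery();
        }
    }
    ```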

    Anyway, just my two-cents (and that is not worth a dime).

  • 2020-12-15 09:38

    In PostgreSQL, the absolute fastest way to get bulk data in is to drop all indexes and triggers, use the equivalent of MySQL's LOAD DATA and then recreate your indexes/triggers. I use this technique to pull 5 GB of forum data into a PostgreSQL database in roughly 10 minutes.

    Granted, this may not apply to MySQL, but it's worth a shot. Also, this SO question's answer suggests that this is in fact a viable strategy for MySQL.

    A quick Google search turned up some tips on increasing the performance of MySQL's LOAD DATA.
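    From C#, a hedged sketch of that approach might use MySqlBulkLoader, the LOAD DATA wrapper in the MySql.Data connector. It assumes the XML has already been flattened into a tab-delimited file and that the table and file names below exist:

    ```csharp
    // Sketch, not a drop-in solution: disable non-unique indexes (a MyISAM-era
    // trick; InnoDB ignores it), run LOAD DATA LOCAL INFILE via MySqlBulkLoader,
    // then rebuild the indexes. Table, file, and connection details are placeholders.
    using MySql.Data.MySqlClient;

    class LoadDataSketch
    {
        static void Main()
        {
            using var conn = new MySqlConnection(
                "server=localhost;uid=me;pwd=secret;database=so_dump");
            conn.Open();

            using (var disableCmd = new MySqlCommand("ALTER TABLE posts DISABLE KEYS", conn))
                disableCmd.ExecuteNonQuery();

            var loader = new MySqlBulkLoader(conn)
            {
                TableName = "posts",
                FileName = @"C:\so-dump\posts.tsv",
                FieldTerminator = "\t",
                LineTerminator = "\n",
                Local = true
            };
            int rows = loader.Load(); // issues LOAD DATA LOCAL INFILE under the hood

            using (var enableCmd = new MySqlCommand("ALTER TABLE posts ENABLE KEYS", conn))
                enableCmd.ExecuteNonQuery();
        }
    }
    ```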

  • 2020-12-15 09:41

    I have a few thoughts to help speed this up...

    1. The size of the query may need to be tweaked; there's often a point where a bigger statement costs more in parsing time and so becomes slower. A batch of 500 rows may be optimal, but perhaps it is not and you could tweak that a little (it could be more, it could be less; see the sketch after this list).

    2. Go multithreaded. Assuming your system isn't already flatlined on the processing, you could make some gains by breaking the data up into chunks and having worker threads process them. Again, it's an experimentation thing to find the optimal number of threads, but a lot of people are using multicore machines and have CPU cycles to spare.

    3. On the database front, make sure that the table is as bare as it can be. Turn off any indexes and load the data before indexing it.
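    A minimal sketch of point 1, assuming a hypothetical posts table and the MySql.Data connector; the batch size is the knob to experiment with:

    ```csharp
    // Sketch of batched multi-row INSERTs with a tunable batch size.
    // Table name, columns, and connection string are placeholders.
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MySql.Data.MySqlClient;

    class BatchInsertSketch
    {
        const int BatchSize = 500; // experiment: it could be more, it could be less

        static void InsertBatch(MySqlConnection conn, List<(int Id, string Title)> rows)
        {
            // Build one multi-row INSERT: VALUES (@id0,@title0),(@id1,@title1),...
            var sql = new StringBuilder("INSERT INTO posts (id, title) VALUES ");
            sql.Append(string.Join(",", rows.Select((_, i) => $"(@id{i}, @title{i})")));

            using var cmd = new MySqlCommand(sql.ToString(), conn);
            for (int i = 0; i < rows.Count; i++)
            {
                cmd.Parameters.AddWithValue($"@id{i}", rows[i].Id);
                cmd.Parameters.AddWithValue($"@title{i}", rows[i].Title);
            }
            cmd.ExecuteNonQuery();
        }
    }
    ```

    For point 2, each worker thread would get its own connection and its own chunk of parsed rows to feed through the same method.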

  • 2020-12-15 09:41

    Does this help at all? It's a stored procedure that loads an entire XML file into a column, then parses it using XPath and creates a table / inserts the data from there. Seems kind of crazy, but it might work.
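    As a very rough illustration of the idea (not the linked procedure itself): MySQL can pull values out of an XML string with its XPath function ExtractValue. The table, columns, and file path below are made up, and a >1 GB file would blow past max_allowed_packet, so this only shows the shape of the technique:

    ```csharp
    // Rough illustration only: load a (small) XML file into a user variable and
    // pick values out with MySQL's ExtractValue. All names here are hypothetical.
    using MySql.Data.MySqlClient;

    class XPathInMySqlSketch
    {
        static void Main()
        {
            // "Allow User Variables=true" so @xml is treated as a MySQL user
            // variable rather than a command parameter.
            using var conn = new MySqlConnection(
                "server=localhost;uid=me;pwd=secret;database=so_dump;Allow User Variables=true");
            conn.Open();

            const string sql = @"
                SET @xml = LOAD_FILE('/var/tmp/one_post.xml');
                INSERT INTO posts (id, title)
                VALUES (ExtractValue(@xml, '/posts/post[1]/id'),
                        ExtractValue(@xml, '/posts/post[1]/title'));";

            using var cmd = new MySqlCommand(sql, conn);
            cmd.ExecuteNonQuery();
        }
    }
    ```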

  • 2020-12-15 09:46

    SqlBulkCopy ROCKS. I used it to turn a 30-minute function into 4 seconds. However, this is applicable only to MS SQL Server.
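    For the SQL Server route, a minimal SqlBulkCopy sketch might look like this; it assumes a DataTable already filled from the XML (for example via XmlReader) and a destination table named Posts, both of which are placeholders:

    ```csharp
    // Minimal SqlBulkCopy sketch for the SQL Server route. The connection string,
    // destination table, and the DataTable's contents are assumptions.
    using System.Data;
    using System.Data.SqlClient;

    class BulkCopySketch
    {
        static void BulkLoad(string connectionString, DataTable rows)
        {
            using var bulk = new SqlBulkCopy(connectionString)
            {
                DestinationTableName = "dbo.Posts",
                BatchSize = 10000 // another knob worth experimenting with
            };
            bulk.WriteToServer(rows);
        }
    }
    ```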

    Might I suggest you look at the constraints on the table you've created? If you drop all keys, constraints, etc., the database will do less work on each insertion.

    Secondly, set up the tables with large initial sizes to prevent resizes if you are inserting into a blank database.

    Finally, see if there is a bulk-copy-style API for MySQL. SQL Server basically formats the data as it would be written to disk, links the stream straight up to the disk, and you pump the data in. It then performs one consistency check for all the data instead of one per insert, dramatically improving your performance. Good luck ;)

    Do you need MySQL? SQL Server makes your life easier if you are using Visual Studio and your database's performance and size requirements are modest.

  • 2020-12-15 09:52

    Not the answer you want, but the MySQL C API has the mysql_stmt_send_long_data function.
