How to keep row order with SqlBulkCopy?

Question


I'm exporting data programmatically from Excel to SQL Server 2005 using SqlBulkCopy. It works great; the only problem is that it doesn't preserve the row sequence I have in the Excel file. I don't have a column to order by, I just want the records to be inserted in the same order they appear in the Excel spreadsheet.

I can't modify the Excel file, and have to work with what I've got. Sorting by any of the existing columns will break the sequence.

Please help.

P.S. I ended up inserting an ID column into the spreadsheet; it looks like there's no way to keep the order during export/import.


Answer 1:


I don't think that row ordering is specified or guaranteed by SQL Server unless you use an ORDER BY clause.

From a post by Bill Vaughn (http://betav.com/blog/billva/2008/08/sql_server_indexing_tips_and_t.html):

Using Order By: Even when a table has a clustered index (which stores the data in physical order), SQL Server does not guarantee that rows will be returned in that (or any particular) order unless an ORDER BY clause is used.

Another link with info:

http://sqlblogcasts.com/blogs/simons/archive/2007/08/21/What-is-the-position-of-a-row--.aspx




Answer 2:


After lots of research, it seems evident that there's no way to retain row order with the BULK INSERT command as Microsoft ships it. You either have to add an ID column directly to the import file yourself, use a shell or other external script, or do without. It seems like a needed (and easy) feature for Microsoft to add, but after more than a decade of nothing from them, it's not going to happen.

Yet I needed to preserve the actual record order of the import file after importing, because records higher up would supersede lower ones if a given column had the same value.

So I went a different route. My constraints were:

  • I couldn't change the source file at all (and didn't want to set a bad precedent!).
  • I couldn't use an external script. Too complicated. It had to be a simple T-SQL-based solution with no CMD executions, contained in a single procedure so it could be automated.

I liked the logic of using PowerShell to create ordered INSERT statements for each row, then running them in SQL. Essentially it queues each record up for an individual insert rather than a BULK insert. Yes, it would work, but it would also be very slow. I often have files with 500K+ rows in them. I needed something FAST.

So I ran across XML. Bulk-upload the file directly into a single XML variable, which retains the order of the records as each is added to the XML. Then parse the XML variable and insert the results into a table, adding an identity column at the same time.

There is an assumption that the import file is a standard text file, with each record ending in a carriage return + line feed (Char(13)+Char(10)).

My approach has two steps:

  1. Execute the import SQL statement (using OPENROWSET), encapsulating each record in XML tags. Capture the result into an XML variable.

  2. Parse the variable on the XML tags into a table, adding an incrementing [ID] column.

    ---------------------------------
    Declare @X xml;
    ---------------------------------
    -- Wrap each record in <X>...</X> tags so the whole file becomes one XML document
    SELECT @X=Cast('<X>'+Replace([BulkColumn],Char(13)+Char(10),'</X><X>')+'</X>' as XML)
    FROM OPENROWSET (BULK N'\\FileServer\ImportFolder\ImportFile_20170120.csv',SINGLE_CLOB) T;
    ---------------------------------
    -- Shred the XML back into rows, numbering them in document order
    SELECT [Record].[X].query('.').value('.','varchar(max)') [Record]
    ,ROW_NUMBER() OVER (ORDER BY (SELECT 100)) [ID]
    --Into #TEMP
    FROM @X.nodes('X') [Record](X);
    ---------------------------------
    
    • The XML tags replace each CR+LF line ending.

    • If the file ends with a CR+LF, this will add a blank row at the end. Simply delete the last row.

I wrote this into my procedure using dynamic SQL so I could pass in the FileName and set the ID to begin at 1 or 0 (in case there's a header row).

I was able to run this against a file of 300K records in about 5 seconds.
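For reference, the split-and-number transform the T-SQL above performs can be sketched in plain Python. This is only an illustration of the logic, not the procedure itself (the function name is made up):

```python
# Sketch of the same transform in Python: split the raw file contents on
# CR+LF and number each record in its original order. Illustrative only --
# the T-SQL above does this server-side with XML shredding.
def number_records(raw: str, start_id: int = 1):
    records = raw.split("\r\n")
    # A trailing CR+LF leaves one empty record at the end; drop it,
    # mirroring the "delete the last row" note above.
    if records and records[-1] == "":
        records.pop()
    return [(start_id + i, rec) for i, rec in enumerate(records)]

raw = "first record\r\nsecond record\r\nthird record\r\n"
print(number_records(raw))
# [(1, 'first record'), (2, 'second record'), (3, 'third record')]
```

Passing `start_id=0` corresponds to a file with a header row, matching the dynamic-SQL option above.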




Answer 3:


You might also be able to define an identity column in your table that auto-increments during data load. That way, you can sort on it later when you want the records in the same order again.
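A rough illustration of that idea, using Python's sqlite3 as a stand-in (the table and column names are invented; with SqlBulkCopy on SQL Server you'd want a single, non-parallel load so the identity values follow stream order):

```python
import sqlite3

# An auto-incrementing ID column assigned at load time lets you
# recover the insertion order later. Table/column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staging (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned as rows arrive
        col1 TEXT,
        col2 TEXT
    )
""")

# Rows arrive in spreadsheet order; the id captures that order.
rows = [("alpha", "1"), ("bravo", "2"), ("charlie", "3")]
conn.executemany("INSERT INTO staging (col1, col2) VALUES (?, ?)", rows)

# Sorting on id reproduces the original sequence.
ordered = conn.execute("SELECT col1 FROM staging ORDER BY id").fetchall()
print([r[0] for r in ordered])  # ['alpha', 'bravo', 'charlie']
```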




Answer 4:


If you can save the Excel spreadsheet as a CSV, it is very easy to generate a list of INSERT statements with any scripting language, and they will be executed in exactly the same order as the spreadsheet. Here's a quick example in Groovy, but any scripting language will do it just as easily if not more easily:

def file1 = new File('c:\\temp\\yourSpreadsheet.csv')
def file2 = new File('c:\\temp\\yourInsertScript.sql')

def reader = new FileReader(file1)
def writer = new FileWriter(file2)

// transformLine writes one transformed line per input line,
// so the INSERTs come out in spreadsheet order
reader.transformLine(writer) { line ->
    def fields = line.split(',')

    // note: values containing single quotes or embedded commas would need escaping
    """INSERT INTO table1 (col1, col2, col3) VALUES ('${fields[0]}', '${fields[1]}', '${fields[2]}');"""
}

You can then execute "yourInsertScript.sql" against your database, and the rows will be inserted in the same order as your spreadsheet.
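The same generator is easy to write in Python; this sketch also doubles embedded single quotes, which the Groovy version above does not handle (the table and column names are placeholders):

```python
import csv
import io

# Generate one INSERT per CSV row, preserving row order.
# Table/column names are placeholders; embedded single quotes are
# doubled so they don't break the generated SQL.
def csv_to_inserts(csv_text: str, table: str = "table1"):
    statements = []
    for fields in csv.reader(io.StringIO(csv_text)):
        values = ", ".join("'" + f.replace("'", "''") + "'" for f in fields)
        statements.append(
            f"INSERT INTO {table} (col1, col2, col3) VALUES ({values});"
        )
    return statements

sample = "a,b,c\nd,e'f,g\n"
for stmt in csv_to_inserts(sample):
    print(stmt)
# INSERT INTO table1 (col1, col2, col3) VALUES ('a', 'b', 'c');
# INSERT INTO table1 (col1, col2, col3) VALUES ('d', 'e''f', 'g');
```

Using the csv module (rather than a bare split on commas) also handles quoted fields that contain commas.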



来源:https://stackoverflow.com/questions/189694/how-to-keep-row-order-with-sqlbulkcopy
