BULK INSERT with inconsistent number of columns

名媛妹妹 2020-12-10 07:21

I am trying to load a large amount of data into SQL Server from a flat file using BULK INSERT. However, my file has a varying number of columns, for instance the first row contains

5 answers
  • 2020-12-10 08:03

    Another workaround is to preprocess the file. It may be easier to write a small standalone program to add terminators to each line so it can be BULK loaded properly than to parse the lines using T-SQL.

    Here's one example in VB6/VBA. It's certainly not as fast as the SQL Server bulk insert, but it just preprocessed 91000 rows in 10 seconds.

    Sub ColumnDelimiterPad(FileName As String, OutputFileName As String, ColumnCount As Long, ColumnDelimiter As String, RowDelimiter As String)
       Dim FileNum As Long
       Dim FileData As String
    
       FileNum = FreeFile()
       Open FileName For Binary Access Read Shared As #FileNum
       FileData = Space$(LOF(FileNum))
       Debug.Print "Reading File " & FileName & "..."
       Get #FileNum, , FileData
       Close #FileNum
    
       Dim Patt As VBScript_RegExp_55.RegExp
       Dim Matches As VBScript_RegExp_55.MatchCollection
    
       Set Patt = New VBScript_RegExp_55.RegExp
       Patt.IgnoreCase = True
       Patt.Global = True
       Patt.MultiLine = True
       Patt.Pattern = "[^" & RowDelimiter & "]+"
       Debug.Print "Parsing..."
       Set Matches = Patt.Execute(FileData)
    
       Dim FileLines() As String
       Dim Pos As Long
       Dim MissingDelimiters As Long
    
       ReDim FileLines(Matches.Count - 1)
       For Pos = 0 To Matches.Count - 1
          ' Report progress every 10,000 rows
          If (Pos + 1) Mod 10000 = 0 Then Debug.Print Pos + 1
          FileLines(Pos) = Matches(Pos).Value
          ' Delimiters present = line length minus length with delimiters removed
          ' (assumes a single-character ColumnDelimiter)
          MissingDelimiters = ColumnCount - 1 - Len(FileLines(Pos)) + Len(Replace(FileLines(Pos), ColumnDelimiter, ""))
          If MissingDelimiters > 0 Then FileLines(Pos) = FileLines(Pos) & String(MissingDelimiters, ColumnDelimiter)
       Next
       ' After the loop Pos equals Matches.Count; print the final total if not already shown
       If Pos Mod 10000 <> 0 Then Debug.Print Pos
    
       If Dir(OutputFileName) <> "" Then Kill OutputFileName
       Open OutputFileName For Binary Access Write Lock Read Write As #FileNum
       Debug.Print "Writing " & OutputFileName & "..."
       Put #FileNum, , Join(FileLines, RowDelimiter)
       Close #FileNum
       Debug.Print "Done."
    End Sub
    
  • 2020-12-10 08:05

    The varying number of columns means it can't be parsed by the bulk insert code. How does it know the correct number of columns? What if you supply too many?

    You'll have to load it into a table with 4 columns and split out the rest later (or load it into one big column), or pre-process the file so that every row has the same number of columns.
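
A minimal sketch of the pre-processing option, written in Python rather than the VB6 or T-SQL used elsewhere in this thread. The function name `pad_rows` and its parameters are hypothetical, and the sketch assumes the delimiter never appears inside a quoted field.

```python
def pad_rows(lines, column_count, delimiter=","):
    """Yield each line padded with trailing delimiters so it has column_count fields.

    Assumption: the delimiter never occurs inside a field value.
    Rows that already have enough delimiters are passed through unchanged.
    """
    for line in lines:
        row = line.rstrip("\r\n")
        # A row with N fields contains N - 1 delimiters.
        missing = (column_count - 1) - row.count(delimiter)
        if missing > 0:
            row += delimiter * missing
        yield row


if __name__ == "__main__":
    sample = ["alpha,beta,gamma\n", "one,two,three,four\n"]
    for padded in pad_rows(sample, column_count=4):
        print(padded)
```

Because the function works on any iterable of lines, it can be fed an open file object and the results written out line by line, so a large file does not need to fit in memory.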

  • 2020-12-10 08:10

    BULK INSERT isn't particularly flexible. One work-around is to load each row of data into an interim table that contains a single big varchar column. Once loaded, you then parse each row using your own routines.

  • 2020-12-10 08:13

    Try specifying a ROW terminator along with your field terminator.

    BULK INSERT #t 
    FROM '<path to file>' 
    WITH  
    ( 
      DATAFILETYPE = 'char', 
      KEEPNULLS, 
      FIELDTERMINATOR = '#',
      ROWTERMINATOR = '\n' --Or whatever signifies the end of a row in your flatfile.
    ) 
    

    More info on this can be found here:

    http://msdn.microsoft.com/en-us/library/ms191485.aspx

  • 2020-12-10 08:20

    My workaround (tested in T-SQL):

    1. Create a table whose column count equals the minimum column count in your import file
    2. Run the bulk insert (it will succeed now)

    In the last table column you will find all the remaining items (including your item separator).

    If necessary, create another table with the full set of columns, copy all columns from the first table, and parse only the last column.

    Example file

    alpha , beta , gamma
    one   , two  , three , four
    

    will look like this in your table:

    c1      | c2     | c3
    "alpha" | "beta" | "gamma"
    "one"   | "two"  | "three , four"
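
The behaviour shown above, and the follow-up parsing of the last column, can be modelled in a few lines of Python. This is only an illustration of the logic, not T-SQL and not how SQL Server is implemented; the function names `split_like_bulk_insert` and `expand_last_column` are hypothetical, and stripping whitespace is a convenience of this sketch.

```python
def split_like_bulk_insert(line, table_columns, delimiter=","):
    """Split on the first table_columns - 1 delimiters only.

    Anything after that, further delimiters included, stays in the last field,
    which models the overflow landing in the last table column.
    """
    parts = line.rstrip("\r\n").split(delimiter, table_columns - 1)
    return [part.strip() for part in parts]


def expand_last_column(row, full_columns, delimiter=","):
    """Split the overflow held in the last field and pad with None up to full_columns."""
    head, overflow = row[:-1], row[-1]
    fields = head + [part.strip() for part in overflow.split(delimiter)]
    return fields + [None] * (full_columns - len(fields))


if __name__ == "__main__":
    for line in ["alpha , beta , gamma\n", "one   , two  , three , four\n"]:
        staged = split_like_bulk_insert(line, table_columns=3)
        print(staged, "->", expand_last_column(staged, full_columns=4))
```

Rows shorter than the full width end up with `None` in the trailing positions, which corresponds to NULLs in the full-columned table.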
    