I am trying to load a large amount of data into SQL Server from a flat file using BULK INSERT. However, my file has a varying number of columns: for instance, the first row contains three fields while the next contains four.
Another workaround is to preprocess the file. It may be easier to write a small standalone program to add terminators to each line so it can be BULK loaded properly than to parse the lines using T-SQL.
Here's one example in VB6/VBA. It's certainly not as fast as the SQL Server bulk insert itself, but in a quick test it preprocessed 91,000 rows in 10 seconds.
' Pads every line of a delimited file with trailing column delimiters so
' all rows end up with the same number of fields, e.g.:
'   ColumnDelimiterPad "C:\in.txt", "C:\out.txt", 4, ",", vbCrLf
' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5".
Sub ColumnDelimiterPad(FileName As String, OutputFileName As String, ColumnCount As Long, ColumnDelimiter As String, RowDelimiter As String)
    Dim FileNum As Long
    Dim FileData As String

    ' Read the whole input file into a single string
    FileNum = FreeFile()
    Open FileName For Binary Access Read Shared As #FileNum
    FileData = Space$(LOF(FileNum))
    Debug.Print "Reading file " & FileName & "..."
    Get #FileNum, , FileData
    Close #FileNum

    ' Split into lines: each match is a run of non-row-delimiter characters
    Dim Patt As VBScript_RegExp_55.RegExp
    Dim Matches As VBScript_RegExp_55.MatchCollection
    Set Patt = New VBScript_RegExp_55.RegExp
    Patt.IgnoreCase = True
    Patt.Global = True
    Patt.MultiLine = True
    Patt.Pattern = "[^" & RowDelimiter & "]+"
    Debug.Print "Parsing..."
    Set Matches = Patt.Execute(FileData)

    Dim FileLines() As String
    Dim Pos As Long
    Dim MissingDelimiters As Long
    ReDim FileLines(Matches.Count - 1)
    For Pos = 0 To Matches.Count - 1
        If (Pos + 1) Mod 10000 = 0 Then Debug.Print Pos + 1
        FileLines(Pos) = Matches(Pos).Value
        ' Count the delimiters already on the line and append whatever is missing
        MissingDelimiters = ColumnCount - 1 - Len(FileLines(Pos)) + Len(Replace(FileLines(Pos), ColumnDelimiter, ""))
        If MissingDelimiters > 0 Then FileLines(Pos) = FileLines(Pos) & String(MissingDelimiters, ColumnDelimiter)
    Next
    ' Print the final row count if the loop didn't just print it
    If Pos Mod 10000 <> 0 Then Debug.Print Pos

    ' Write the padded lines back out, replacing any existing output file
    If Dir(OutputFileName) <> "" Then Kill OutputFileName
    Open OutputFileName For Binary Access Write Lock Read Write As #FileNum
    Debug.Print "Writing " & OutputFileName & "..."
    Put #FileNum, , Join(FileLines, RowDelimiter)
    Close #FileNum
    Debug.Print "Done."
End Sub
The varying number of columns means it can't be parsed by the bulk insert code. How does it know the correct number of columns? What if you supply too many?
You'll have to load it into a table with four columns (or one big column) and split out the rest later, or pre-process it so every row has the same number of columns.
BULK INSERT isn't particularly flexible. One workaround is to load each row of data into an interim table that contains a single big varchar column, then parse each row using your own routines.
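A minimal sketch of that interim-table approach (the staging table name, file path, and terminators here are placeholders, and it assumes each line fits in varchar(8000)):

CREATE TABLE #staging (line VARCHAR(8000) NULL)

BULK INSERT #staging
FROM 'C:\data\input.txt'
WITH
(
    FIELDTERMINATOR = '\0', -- a character that never occurs in the data,
                            -- so each whole line lands in the single column
    ROWTERMINATOR = '\n'
)

-- Each #staging row now holds one raw line; split it with your own
-- CHARINDEX/SUBSTRING routine into the real destination table.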
Try specifying a ROW terminator along with your field terminator.
BULK INSERT #t
FROM '<path to file>'
WITH
(
    DATAFILETYPE = 'char',
    KEEPNULLS,
    FIELDTERMINATOR = '#',
    ROWTERMINATOR = '\n' -- or whatever signifies the end of a row in your flat file
)
More info on this can be found here:
http://msdn.microsoft.com/en-us/library/ms191485.aspx
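Note that #t has to exist before the BULK INSERT runs, with a column for every field in the file. A hypothetical definition for a four-field, '#'-delimited file might be:

CREATE TABLE #t
(
    c1 VARCHAR(255) NULL, -- column names and sizes are assumptions,
    c2 VARCHAR(255) NULL, -- not from the original post
    c3 VARCHAR(255) NULL,
    c4 VARCHAR(255) NULL
)

The columns are declared NULLable because KEEPNULLS in the options above makes empty fields load as NULL rather than taking any column default.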
My workaround (tested in T-SQL): create the destination table with only as many columns as the shortest row has. BULK INSERT then reads the last field up to the row terminator, so in the last table column you will find all the remaining items (including your item separator).
If necessary, create another table with the full set of columns, copy the first table's columns into it, and do some parsing only over the last column.
Example file:
alpha , beta , gamma
one , two , three , four
will look like this in your table:
c1 | c2 | c3
"alpha" | "beta" | "gamma"
"one" | "two" | "three , four"