I am working with SSIS and I need to load multiple files with the following (Yellos) format to SQL using SSIS
The problem as you can see is that the files has an horrible format only process / consume records if the column A is populated (e.g: ignoring rows# 14 - X ) and I need to insert the value in D1 into the Date column.
any suggestion?
Regards!
Lets divide this problem into 3 Sub problems:
- Get the date value from
D1
- Start Reading from Row number 4
- Ignore all Rows where Column1 is NULL
Solution
1. Get the date value from D1
- Create 2 SSIS variables,
@[User::FilePath]
(of type string) that contains the excel file path,@[User::FileDate]
(of type string) that we will use it to store the date value - Add a script Task, choose the script language as Visual Basic
- Select
@[User::FilePath]
as a ReadOnly variable and@[User::FileDate]
as a ReadWrite variable - Open the Script Editor and use the following code to retrieve the Date Value and store it into
@[User::FileDate]
This will search for the sheet named Refunds
and extract the date value from it and store this value into @[User::FileDate]
m_strExcelPath = Dts.Variables.Item("FilePath").Value.ToString
Dim strSheetname As String = String.Empty
Dim strDate as String = String.Empty
m_strExcelConnectionString = Me.BuildConnectionString()
Try
Using OleDBCon As New OleDbConnection(m_strExcelConnectionString)
If OleDBCon.State <> ConnectionState.Open Then
OleDBCon.Open()
End If
'Get all WorkSheets
m_dtschemaTable = OleDBCon.GetOleDbSchemaTable(OleDbSchemaGuid.Tables,
New Object() {Nothing, Nothing, Nothing, "TABLE"})
'Loop over work sheet to get the first one (the excel may contains temporary sheets or deleted ones
For Each schRow As DataRow In m_dtschemaTable.Rows
strSheetname = schRow("TABLE_NAME").ToString
If Not strSheetname.EndsWith("_") AndAlso strSheetname.EndsWith("$") Then
If Not strSheetname.Tolower.Contains("refunds") Then Continue For
Using cmd As New OleDbCommand("SELECT * FROM [" & strSheetname & "A1:D1]", OleDBCon)
Dim dtTable As New DataTable("Table1")
cmd.CommandType = CommandType.Text
Using daGetDataFromSheet As New OleDbDataAdapter(cmd)
daGetDataFromSheet.Fill(dtTable)
'Get Value from column 4 (3 because it is a zero-based index
strDate = dtTable.Rows(0).Item(3).ToString
End Using
End Using
'when the first correct sheet is found there is no need to check others
Exit For
End If
Next
OleDBCon.Close()
End Using
Catch ex As Exception
Throw New Exception(ex.Message, ex)
End Tr
Dts.Variables.Item("FileDate").Value = strDate
Dts.TaskResult = ScriptResults.Success
End Sub
In the DataFlow Task add a
Derived Column Transformation
, add a derived column with the following expression@[User::FileDate]
2. Start Reading from Row Number 4
As we assumed that the Excel File Path is stored in @[User::FilePath]
- First open the
Excel Connection Manager
and uncheck the boxFirst row has column names
- In the DataFlow Task, double click on the excel source
- Set the source to
SQL Command
- Use the following command:
SELECT * FROM [Refunds$A4:D]
, so it will start reading from the row number 4 - Columns names will be as the following F1 ... F4 , in the excel source you can go to the Columns Tab and give alias to the columns names, so in the data flow task they will be showed with their aliases
3. Ignore all Rows Where Column1 is NULL
- Add a conditional split after the Excel Source
Split the Flow based on the following expression
ISNULL([F1]) == False
If you didn't give an alias to F1
otherwise use the alias
Finally, remember that you must add a derived column (as we said in the first sub-problem) that contains the date value
来源:https://stackoverflow.com/questions/47517548/load-multiple-data-from-excel-to-sql-ssis