问题
I am reading in a 17-column CSV file into a database. once in a while the file has a "less then 17-column" row. I am trying to ignore the row, but even when all columns are set to ignore, I can't ignore that row and the package fails.
How to ignore those rows?
回答1:
Solution Overview
you can do this by adding one Flat File Connection Manager
add only one column with Data type DT_WSTR
and a length of 4000
(assuming it's name is Column0
) - So all column are considered as one big column
- In the
Dataflow task
add aScript Component
after theFlat File Source
- In mark
Column0
as Input Column and Add 17 Output Columns - In the
Input0_ProcessInputRow
method splitColumn0
by delimiter, Then check if the length of array is = 17 then assign values to output columns, Else ignore the row.
Detailed Solution
- Add a Flat file connection manager, Select the text file
- Go to the Advanced Tab, Delete all Columns except one Column
- Change the datatype of the remianing Column to
DT_WSTR
and length =4000
- Add a DataFlow Task
- Inside the Data Flow Task add a Flat File Source, Script Component and OLEDB Destination
- In the Script Component Select
Column0
as Input Column
- Add 17 Output Columns (the optimal output columns)
- Change the
OutputBuffer
SynchronousInput
property toNone
- Select the Script Language to
Visual Basic
In the Script Editor write the following Script
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) If Not Row.Column0_IsNull AndAlso Not String.IsNullOrEmpty(Row.Column0.Trim) Then Dim strColumns As String() = Row.Column0.Split(CChar(";")) If strColumns.Length <> 17 Then Exit Sub Output0Buffer.AddRow() Output0Buffer.Column = strColumns(0) Output0Buffer.Column1 = strColumns(1) Output0Buffer.Column2 = strColumns(2) Output0Buffer.Column3 = strColumns(3) Output0Buffer.Column4 = strColumns(4) Output0Buffer.Column5 = strColumns(5) Output0Buffer.Column6 = strColumns(6) Output0Buffer.Column7 = strColumns(7) Output0Buffer.Column8 = strColumns(8) Output0Buffer.Column9 = strColumns(9) Output0Buffer.Column10 = strColumns(10) Output0Buffer.Column11 = strColumns(11) Output0Buffer.Column12 = strColumns(12) Output0Buffer.Column13 = strColumns(13) Output0Buffer.Column14 = strColumns(14) Output0Buffer.Column15 = strColumns(15) Output0Buffer.Column16 = strColumns(16) End If End Sub
Map the Output Columns to the Destination Columns
回答2:
C# Solution for Loading CSV and skip rows that don't have 17 columns:
Use a Script Component: On input/output screen add all of your outputs with data types.
string fName = @"C:\test.csv" // Full file path: it should reference via variable
string[] lines = System.IO.File.ReadAllLines(fName);
//add a counter
int ctr = 1;
foreach(string line in lines)
{
string[] cols = line.Split(',');
if(ctr!=1) //Assumes Header row. elim if 1st row has data
{
if(cols.Length == 17)
{
//Write out to Output
Output0Buffer.AddRow();
Output0Buffer.Col1 = cols[0].ToString(); //You need to cast to data type
Output0Buffer.Col2 = int.Parse(cols[1]) // example to cast to int
Output0Buffer.Col3 = DateTime.Parse(cols[2]) // example of datetime
... //rest of Columns
}
//optional else to handle skipped lines
//else
// write out line somewhere
}
ctr++; //increment counter
}
来源:https://stackoverflow.com/questions/48056021/how-to-skip-a-bad-row-in-ssis-flat-file-source