问题
I have built an SSIS package that loads in several delimited text files into a SQL database. One of the files often contains line spaces in it, which breaks the standard data flow task of setting a flat file source and mapping to an ado.net destination since it thinks it is on a new line when it reaches a line break. The vendor sending over the files does not want to sent the file without any edits and can't do XML at this time. Is there any way to fix this? I was thinking of writing a small vb.net program that would correct the files so they would work in the SSIS package, but not sure how to write that logic. The file has 5 columns, the first 2 are big integer and always contain some long integer ID, then there is a small text column that just contains one short word, then a date, and then a long comments field that is causing the problem. The comments field is sometimes blank (which is ok), the problem are the rows that have line breaks. I never know how many line breaks are in the comments, some have none, some can have several, even multiple line breaks in a row, so was wondering if this is even possible.
5787626|6547599|Approved|1/10/2017|Applicant request for fee waiver approved 5443221|7742812|Active|11/5/2013| 3430962|7643957|Re-Scheduled|5/25/2016|REVISED TERMS AND CONDITIONS REJECTED Applicant has 30 DAYS To submit paperwork for extension. 34433624|7673715|Denied|1/24/2017| 34113575|7653748|Active|1/8/2014|New terms have been granted.
Sample File Format.
回答1:
As long as there is logic that you can program/predict, it will be possible.
I would do it using a Script Component as a source, which means you don't need to rewrite the file before processing it. It also provides a lot of flexibility, e.g., you can store values in variables while iterating over multiple lines in the file, etc.
I posted another answer recently that gives a lot of detail on how to go about this: SSIS import a Flat File to SQL with the first row as header and last row as a total.
An example of holding the values in variables until the row is ready to be written:-
For this example I am writing three columns, ID1, ID2 and Comments. The file looks like this:
1|2|Comment1
Comment2
4|5|Comment3
Comment4
Comment5
6|7|Comment6
The Script Component contains the following method.
public override void CreateNewOutputRows()
{
System.IO.StreamReader reader = null;
try
{
bool readFirstLine = false;
int id1 = 0;
int id2 = 0;
string comments = null;
reader = new System.IO.StreamReader(Variables.FilePath); // this refers to a package variable that contains the file path
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (line.Contains("|"))
{
if (readFirstLine)
{
Output0Buffer.AddRow();
Output0Buffer.ID1 = id1;
Output0Buffer.ID2 = id2;
Output0Buffer.Comments = comments;
}
else
{
readFirstLine = true;
}
string[] fields = line.Split('|');
id1 = Convert.ToInt32(fields[0]);
id2 = Convert.ToInt32(fields[1]);
comments = fields[2];
}
else
{
comments += " " + line;
}
if (reader.EndOfStream)
{
Output0Buffer.AddRow();
Output0Buffer.ID1 = id1;
Output0Buffer.ID2 = id2;
Output0Buffer.Comments = comments;
}
}
}
catch
{
if (reader != null)
{
reader.Close();
reader.Dispose();
}
throw;
}
}
The result set is:
ID1 ID2 Comments
=== === ========
1 2 Comment1 Comment2
4 5 Comment3 Comment4 Comment5
6 7 Comment6
来源:https://stackoverflow.com/questions/46779816/how-can-i-load-in-a-pipe-delimited-text-file-that-has-columns-that-sometimes