Question
Does anyone know how to split this file?
1 TESTAAA      SERNUM    A DESCRIPTION
2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
3 TESTXXX      BLAHBL
Each column has a fixed width. I'm planning to split it with a regex, but I don't know exactly how to write it.
Given these columns and widths:
{id}  {firsttext}  {serialhere}  {description}
4     22           6             30+
Someone recommended a pattern like (.{4})(.{22})(.{6})(.+)? followed by split(' '), but the user stated that this won't work when a column has no value, and even then didn't give an example.
I have also heard about TextFieldParser, but apparently it has some performance issues.
Can anyone tell me how to split by fixed width?
Thanks.
Answer 1:
Without seeing any reason not to, I would probably just use Substring.
Having said that, the Regex should work too.
The following example works on the input shown (rather than the widths you've given). It assumes the serial number is a required field that may not fill its entire width, and that the description is optional. Adjust along the same lines if these assumptions are incorrect.
// Requires: using System; using System.Linq;
string input = @"1 TESTAAA      SERNUM    A DESCRIPTION
2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION
3 TESTXXX      BLAHBL";
var split = input.Split('\n').Select(s => new {
    Id = s.Substring(0, 2),
    FirstText = s.Substring(2, 13),
    // Serial is required but may be shorter than its 10-character column.
    Serial = s.Substring(15, Math.Min(s.Length - 15, 10)),
    // Description is optional.
    Description = s.Length > 25 ? s.Substring(25) : String.Empty
});
Or, as a more explanatory version with more obvious naming and slightly clearer handling of the serial length:
int idStart = 0;
int idLength = 2;
int firstTextStart = idStart + idLength;
int firstTextLength = 13;
int serialStart = firstTextStart + firstTextLength;
int serialLength = 10;
int descriptionStart = serialStart + serialLength;
var verboseSplit = input.Split('\n').Select(s => new {
    Id = s.Substring(idStart, idLength),
    FirstText = s.Substring(firstTextStart, firstTextLength),
    Serial = s.Length > descriptionStart
        ? s.Substring(serialStart, serialLength)
        : s.Substring(serialStart),
    Description = s.Length > descriptionStart
        ? s.Substring(descriptionStart)
        : String.Empty
});
The output from either:
Id FirstText     Serial     Description
1  TESTAAA       SERNUM     A DESCRIPTION
2  TESTBBB       ANOTHR     ANOTHER DESCRIPTION
3  TESTXXX       BLAHBL
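The regex route mentioned above could look roughly like the sketch below. It assumes the same widths as the Substring version (2, 13, up to 10, then the remainder); the class name FixedWidthRegex, the method Parse and the group names are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedWidthRegex
{
    // Assumed layout: 2-char id, 13-char first text, serial of 1 to 10 chars,
    // then an optional description taking whatever is left.
    private static readonly Regex Row = new Regex(
        @"^(?<id>.{2})(?<first>.{13})(?<serial>.{1,10})(?<desc>.*)$");

    // Returns the four fields, trimmed. Throws if the line is too short
    // to contain the id, the first text and at least one serial character.
    public static string[] Parse(string line)
    {
        Match m = Row.Match(line.TrimEnd('\r', '\n'));
        if (!m.Success)
            throw new FormatException("Line is too short: " + line);

        return new[]
        {
            m.Groups["id"].Value.Trim(),
            m.Groups["first"].Value.Trim(),
            m.Groups["serial"].Value.Trim(),
            m.Groups["desc"].Value.Trim()
        };
    }
}
```

Because the serial group is greedy and capped at 10 characters, a line that ends inside the serial column still matches and simply yields an empty description.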
Answer 2:
Based on your sample, try this. It assumes there is a single whitespace character between each item:
{id}  {firsttext}  {serialhere}  {description}
4     22           6             30+
string target = "1 TESTAAA SERNUM A DESCRIPTION";
List<string> result = new List<string>(Regex.Split(target, @"(.{4})(.{1})(.{22})(.{1})(.{6})(.{1})(.+)?", RegexOptions.Singleline));
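One thing to keep in mind: when the pattern contains capture groups, Regex.Split includes the captured text in its result, and a match at the very start or end of the input adds an empty string at that end of the array. A possible alternative is to read the groups from Regex.Match directly, as in the sketch below. It assumes the same 4/22/6 widths with one separator character between columns; the SeparatedColumns class and its Fields method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatedColumns
{
    // Same pattern as above: columns of 4, 22 and 6 characters, each followed
    // by one separator character, then an optional description.
    private const string Pattern = @"(.{4})(.{1})(.{22})(.{1})(.{6})(.{1})(.+)?";

    // Returns id, first text, serial and description (trimmed),
    // or an empty array when the line is too short to match.
    public static string[] Fields(string line)
    {
        Match m = Regex.Match(line, Pattern, RegexOptions.Singleline);
        if (!m.Success)
            return new string[0];

        // Groups 2, 4 and 6 are the separators; group 7 is empty when
        // there is no description.
        return new[] { 1, 3, 5, 7 }
            .Select(i => m.Groups[i].Value.Trim())
            .ToArray();
    }
}
```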
Answer 3:
How about this functional approach?
Start with these arrays:
var lines = new []
{
    "1 TESTAAA      SERNUM    A DESCRIPTION",
    "2 TESTBBB      ANOTHR    ANOTHER DESCRIPTION",
    "3 TESTXXX      BLAHBL",
};
var splits = new [] { 2, 13, 10, };
The splits I've used are different from your question because the lengths of the fields in each sample line don't match your splits.
Now define a recursive function to do the splitting of each line:
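// Requires: using System; using System.Collections.Generic; using System.Linq;
Func<string, IEnumerable<int>, IEnumerable<string>> f = null;
f =
    (t, ns) =>
    {
        if (ns.Any())
        {
            // Take the next width, clamped to what is left of the line.
            var n = ns.First();
            var i = System.Math.Min(n, t.Length);
            var t0 = t.Substring(0, i);
            var t1 = t.Substring(i);
            // Emit this field, then recurse on the rest with the remaining widths.
            return new [] { t0.Trim(), }.Concat(f(t1, ns.Skip(1)));
        }
        else
            // No widths left: the remainder is the last field.
            return new [] { t.Trim(), };
    };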
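For comparison, the same splitting can be written as a plain loop, which avoids re-enumerating the widths through Skip on every recursive call. This is only a sketch of an equivalent; FixedWidth.Split is a made-up name, not an existing API.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Cuts `line` into one field per width plus a final field holding the
    // remainder. Columns missing from a short line come back as empty strings.
    public static string[] Split(string line, params int[] widths)
    {
        var fields = new List<string>(widths.Length + 1);
        int pos = 0;
        foreach (int width in widths)
        {
            // Clamp to the characters that are actually left.
            int take = Math.Min(width, line.Length - pos);
            fields.Add(line.Substring(pos, take).Trim());
            pos += take;
        }
        // Whatever follows the last fixed column is the final field.
        fields.Add(line.Substring(pos).Trim());
        return fields.ToArray();
    }
}
```

Like the recursive version, it always returns one more field than there are widths, so fields[3] is safe to read with three widths even when the description is absent.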
Finally, we can write a fairly trivial LINQ query to pull it all together:
var query =
    from line in lines
    let fields = f(line, splits).ToArray()
    select new
    {
        id = fields[0],
        firsttext = fields[1],
        serialhere = fields[2],
        description = fields[3],
    };
The result I get is:
Source: https://stackoverflow.com/questions/19649617/how-to-split-a-text-lines-by-fixed-width-c-sharp