LINQ conditional aggregation based on next elements' values

Posted by 三世轮回 on 2019-12-10 15:37:55

Question


What's a good LINQ equivalent of this pseudo-code: "given a list of strings, for each string that doesn't contain a tab character, concatenate it (with a pipe delimiter) to the end of the previous string, and return the resulting sequence"?

More Info:

I have a List<string> representing lines in a tab-delimited text file. The last field in each line is always a multiline text field, and the file was generated by a buggy system that mishandles fields with embedded newlines. So I end up with a list like this:

1235 \t This is Record 1
7897 \t This is Record 2
8977 \t This is Record 3
continued on the next line
and still continued more
8375 \t This is Record 4

I'd like to coalesce this list by concatenating all the orphan lines (lines with no tab characters) to the end of the previous line. Like this:

1235 \t This is Record 1
7897 \t This is Record 2
8977 \t This is Record 3|continued on the next line|and still continued more
8375 \t This is Record 4

Solving this with a for() loop would be easy, but I'm trying to improve my LINQ skills and I was wondering if there is a reasonably efficient LINQ solution to this problem. Is there?


Answer 1:


This is not a problem that should be solved with LINQ. LINQ is designed for enumeration, whereas this is best solved by iteration.

LINQ operators work best when each item can be processed without knowledge of its neighbors, which is not the case here. Use a for loop so you can cleanly go through the strings one by one, in order.
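
For reference, the kind of loop this answer recommends could be sketched roughly as follows. The class and method names (OrphanLineLoop, Coalesce) are hypothetical, and keeping a leading orphan line as its own entry is an assumption, since the question does not say what should happen in that case.

```csharp
using System;
using System.Collections.Generic;

public static class OrphanLineLoop
{
    // Hypothetical helper, not part of the original answer: appends every
    // line without a tab to the previous entry, separated by a pipe.
    public static List<string> Coalesce(IEnumerable<string> lines)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            if (line.Contains("\t") || result.Count == 0)
            {
                // A tab starts a new record. Assumption: an orphan with no
                // record before it is kept as its own entry.
                result.Add(line);
            }
            else
            {
                // Orphan line: glue it onto the most recent record.
                result[result.Count - 1] += "|" + line;
            }
        }
        return result;
    }
}
```

Calling OrphanLineLoop.Coalesce(lines) on the sample list from the question should yield four entries, with the two continuation lines appended to record 3.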




Answer 2:


You could do something like this:

string result = records.Aggregate("", (current, s) => current + (s.Contains("\t") ? "\n" + s : "|" + s));

I cheated and got ReSharper to generate this for me. This is close, although it leaves a blank line at the top.

However, as you can see, this is not very readable. I realize you're looking for a learning exercise but I'd take a nice readable foreach loop over this any day.
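
One possible way to avoid the leading blank line is the seedless overload of Aggregate, which starts from the first element instead of an empty string. The sketch below is a variant of the one-liner, not the answerer's code; the AggregateJoin wrapper is a hypothetical name, and it assumes the first line is a proper record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateJoin
{
    // Hypothetical wrapper around the Aggregate one-liner. Without a seed,
    // the first element becomes the initial accumulator, so no "\n" is
    // emitted before the first record.
    public static string Coalesce(IEnumerable<string> records)
    {
        return records
            .DefaultIfEmpty("") // seedless Aggregate throws on an empty sequence
            .Aggregate((current, s) =>
                current + (s.Contains("\t") ? "\n" + s : "|" + s));
    }
}
```

The result is still a single newline-separated string, so it would need a Split('\n') to get back to a list of records.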




Answer 3:


I tried this out of curiosity.

var originalList = new List<string>
{
    "1235 \t This is Record 1",
    "7897 \t This is Record 2",
    "8977 \t This is Record 3",
    "continued on the next line",
    "and still continued more",
    "8375 \t This is Record 4"
};

var resultList = new List<string>();

resultList.Add(originalList.Aggregate((workingSentence, next) 
    => { 
            if (next.Contains("\t"))
            {
                resultList.Add(workingSentence);    
                return next;
            }
            else
            {
                workingSentence += "|" + next;
                return workingSentence;
            }
    }));

The resultList should contain what you want.

Please note that this is not an optimal solution. The line workingSentence += "|" + next; allocates a new temporary string for every orphan line, which may add up depending on your data pattern.

An optimal solution may involve keeping multiple index variables to look ahead in the list and concatenating a whole run of strings once the next string contains a tab character, instead of concatenating one by one as shown above. However, it will be more complex than the one above because of boundary checking and keeping multiple index variables :).

Update: The following solution will not create temporary string objects for concatenation.

var resultList = new List<string>();
var tempList = new List<string>();

tempList.Add(originalList.Aggregate((cur, next)
    => {
            tempList.Add(cur);
            if (next.Contains("\t"))
            {
                resultList.Add(string.Join("|", tempList));
                tempList.Clear();       
            }
            return next;
    }));

resultList.Add(string.Join("|", tempList));

The following is a solution using a for loop.

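var resultList = new List<string>();
var temp = new List<string>();

// A single-line list never enters the loop below, so copy it through here.
if (originalList.Count == 1)
{
    resultList.Add(originalList[0]);
}

// i is the current line, j looks one line ahead.
for (int i = 0, j = 1; j < originalList.Count; i++, j++)
{
    temp.Add(originalList[i]);
    if (j != originalList.Count - 1)
    {
        // The next line starts a new record: flush what has been collected.
        if (originalList[j].Contains("\t"))
        {
            resultList.Add(string.Join("|", temp));
            temp.Clear();
        }
    }
    else // when originalList[j] is the last item
    {
        if (originalList[j].Contains("\t"))
        {
            resultList.Add(string.Join("|", temp));
            resultList.Add(originalList[j]);
        }
        else
        {
            temp.Add(originalList[j]);
            resultList.Add(string.Join("|", temp));
        }
    }
}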



Answer 4:


After trying a for() solution, I tried a LINQ solution and came up with the one below. For my reasonably small (10K lines) file it was fast enough that I didn't care about the efficiency, and I found it much more readable than the equivalent for() solution.

var lines = new List<string>      
{      
    "1235 \t This is Record 1",      
    "7897 \t This is Record 2",      
    "8977 \t This is Record 3",      
    "continued on the next line",      
    "and still continued more",      
    "8375 \t This is Record 4"      
};  
var fixedLines = lines
        .Select((s, i) => new 
            { 
                Line = s, 
                Orphans = lines.Skip(i + 1).TakeWhile(s2 => !s2.Contains('\t')) 
            })
        .Where(s => s.Line.Contains('\t'))
        .Select(s => string.Join("|", (new string[] { s.Line }).Concat(s.Orphans).ToArray()));
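
For larger files, note that Skip(i + 1) may walk the list from the beginning for every record, depending on the runtime. A single-pass alternative that still composes with other LINQ operators could be sketched as an iterator method. CoalesceOrphans is a hypothetical name, not a built-in operator, and unlike the query above this sketch keeps orphan lines that appear before the first record rather than dropping them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class OrphanLineExtensions
{
    // Hypothetical extension method: streams the input once and lazily
    // yields each record with its orphan lines appended.
    public static IEnumerable<string> CoalesceOrphans(
        this IEnumerable<string> lines, string delimiter = "|")
    {
        StringBuilder current = null; // record currently being assembled

        foreach (string line in lines)
        {
            if (current != null && !line.Contains("\t"))
            {
                // Orphan line: append to the record in progress.
                current.Append(delimiter).Append(line);
                continue;
            }

            // A new record begins: emit the finished one, if any.
            if (current != null)
            {
                yield return current.ToString();
            }
            current = new StringBuilder(line);
        }

        if (current != null)
        {
            yield return current.ToString();
        }
    }
}
```

With this in scope, the query above could be written as var fixedLines = lines.CoalesceOrphans().ToList();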


Source: https://stackoverflow.com/questions/9935150/linq-conditional-aggregation-based-on-next-elements-values
