Summing duplicate values while reading in data for different types of outputs

Submitted by 怎甘沉沦 on 2019-12-11 08:30:52

Question


I am reading 5000 rows of data from a stream, top to bottom, and storing them in a new CSV file:

ProductCode |Name   | Type  | Size | Price
ABC | Shoe  | Trainers  | 3 | 3.99
ABC | Shoe  | Trainers  | 3 | 4.99
ABC | Shoe  | Trainers  | 4 | 5.99 
ABC | Shoe  | Heels | 4 | 3.99
ABC | Shoe  | Heels | 5 | 4.99
ABC | Shoe  | Heels | 3 | 5.99
...

Instead of writing duplicate entries, I want the CSV to contain one row per group, with the Price summed. For example, if I want a CSV with only ProductCode, Name and Type (ignoring Size), it should look like this:

ProductCode |Name   | Type  | Price
ABC | Shoe  | Trainers  | 14.97
ABC | Shoe  | Heels | 14.97

Show only ProductCode and Name:

ProductCode |Name   | Price
ABC | Shoe  | 29.94

Show ProductCode, Name and Size, ignoring Type:

ProductCode |Name   | Size | Price
ABC | Shoe  | 3 | 14.97
ABC | Shoe  | 4 | 9.98
ABC | Shoe  | 5 | 4.99

I store each row with all fields as a Product and keep a list of all Products:

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public string Size { get; set; }
    public decimal Price { get; set; } // decimal, so prices can be summed
}

And then I output the needed fields into the CSV, depending on the csvOutputType, using ConvertToOutputFormat, which is different for each Parser:

public class CodeNameParser : Parser
{
    public override string ConvertToOutputFormat(Product p)
    {
        return string.Format("{0},{1},{2}", p.ProductCode, p.Name, p.Price);
    }
}
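For the Size-based output shown earlier, the corresponding parser would look something like this (a sketch; `CodeNameSizeParser` is a hypothetical name, and it assumes `Product` also carries a `Size` field):

```csharp
// Hypothetical parser for the ProductCode/Name/Size output; Type is ignored.
public class CodeNameSizeParser : Parser
{
    public override string ConvertToOutputFormat(Product p)
    {
        return string.Format("{0},{1},{2},{3}", p.ProductCode, p.Name, p.Size, p.Price);
    }
}
```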

My code is then:

string fileName = Path.Combine(directory, string.Format("{0}.csv", name));
switch (csvOutputType)
{
    case (int)CodeName:
        _parser = new CodeNameParser();
        break;
    case (int)CodeType:
        _parser = new CodeTypeParser();
        break;
    case (int)CodeNameType:
        _parser = new CodeNameTypeParser();
        break;
}

var results = Parse(stream).ToList(); // Parse returns IEnumerable<Product>
if (results.Any())
{
    using (var streamWriter = File.CreateText(fileName))
    {
        // Write the header line.
        streamWriter.WriteLine("{0},{1}", header, name);

        results.ForEach(p => streamWriter.WriteLine(_parser.ConvertToOutputFormat(p)));
        // No explicit Flush/Close needed: disposing the writer at the end
        // of the using block flushes and closes it.
    }

    Optional<string> newFileName = Optional.Of(SharpZipWrapper.ZipFile(fileName, RepositoryDirectory));
    // Cleanup
    File.Delete(fileName);
    return newFileName;
}

I don't want to iterate over the 5000 rows a second time to remove the duplicates; instead I would like to check whether an entry already exists before adding it to the CSV file. I know I can GroupBy the required fields, but since I have 3 different outputs, I would have to write the same grouping code 3 times with a different key each time:

results = results
    .GroupBy(p => new { p.ProductCode, p.Name, p.Type })
    .Select(g => new Product {
        ProductCode = g.Key.ProductCode,
        Name = g.Key.Name,
        Type = g.Key.Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();

Is there any other way to do this?
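One way to write the grouping only once is to let each parser declare its own group key, alongside its output format. This is a sketch, not code from the question: the `GetGroupKey` member and the merging snippet below are hypothetical additions, and it assumes `Price` is a numeric type such as `decimal`.

```csharp
// Sketch: each parser declares which fields identify a duplicate row,
// so the GroupBy/merge logic is written exactly once.
public abstract class Parser
{
    public abstract string ConvertToOutputFormat(Product p);

    // Hypothetical addition: returns an anonymous object whose fields
    // identify a duplicate. Anonymous types compare by value, so equal
    // keys fall into the same group.
    public abstract object GetGroupKey(Product p);
}

public class CodeNameParser : Parser
{
    public override string ConvertToOutputFormat(Product p)
    {
        return string.Format("{0},{1},{2}", p.ProductCode, p.Name, p.Price);
    }

    public override object GetGroupKey(Product p)
    {
        return new { p.ProductCode, p.Name };
    }
}

// The de-duplication is then shared by all output types:
var results = Parse(stream)
    .GroupBy(p => _parser.GetGroupKey(p))
    .Select(g =>
    {
        var merged = g.First();
        merged.Price = g.Sum(p => p.Price); // assumes Price is numeric
        return merged;
    })
    .ToList();
```

Each new output type then only has to supply its own key and format string; the grouping code never changes.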

Source: https://stackoverflow.com/questions/30129141/summing-duplicate-values-while-reading-in-data-for-different-types-of-outputs
