Question
I am reading 5000 rows of data from a stream, top to bottom, in the format shown below, and storing them in a new CSV file.
ProductCode |Name | Type | Size | Price
ABC | Shoe | Trainers | 3 | 3.99
ABC | Shoe | Trainers | 3 | 4.99
ABC | Shoe | Trainers | 4 | 5.99
ABC | Shoe | Heels | 4 | 3.99
ABC | Shoe | Heels | 5 | 4.99
ABC | Shoe | Heels | 3 | 5.99
...
Instead of having duplicate entries, I want the CSV to have one row per combination of the selected fields, with the Price summed. E.g. if I want a CSV file with only ProductCode, Name and Type, ignoring Size, I want it to look like this:
ProductCode |Name | Type | Price
ABC | Shoe | Trainers | 14.97
ABC | Shoe | Heels | 14.97
Show only ProductCode, Name:
ProductCode |Name | Price
ABC | Shoe | 29.94
Show ProductCode, Name, Size, ignoring Type:
ProductCode |Name | Size | Price
ABC | Shoe | 3 | 14.97
ABC | Shoe | 4 | 9.98
ABC | Shoe | 5 | 4.99
I store each row with all fields as a Product and keep a list of all Products:
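public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public string Size { get; set; }   // present in the input rows
    public decimal Price { get; set; } // numeric so that Sum() can be applied
}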
I then write the needed fields to the CSV depending on the csvOutputType, using ConvertToOutputFormat, which is different for each Parser.
public class CodeNameParser : Parser {
    public override string ConvertToOutputFormat(Product p) {
        // Product exposes Name, not ProductName
        return string.Format("{0},{1},{2}", p.ProductCode, p.Name, p.Price);
    }
}
My code is then:
string fileName = Path.Combine(directory, string.Format("{0}.csv", name));
switch (csvOutputType)
{
    case (int)CodeName:
        _parser = new CodeNameParser();
        break;
    case (int)CodeType:
        _parser = new CodeTypeParser();
        break;
    case (int)CodeNameType:
        _parser = new CodeNameTypeParser();
        break;
}
var results = Parse(stream).ToList(); // Parse returns IEnumerable<Product>
if (results.Any())
{
    using (var streamWriter = File.CreateText(fileName))
    {
        // writes the header line out
        streamWriter.WriteLine("{0},{1}", header, name);
        results.ForEach(p => { streamWriter.WriteLine(_parser.ConvertToOutputFormat(p)); });
        streamWriter.Flush();
        streamWriter.Close();
    }
    Optional<string> newFileName = Optional.Of(SharpZipWrapper.ZipFile(fileName, RepositoryDirectory));
    // cleanup
    File.Delete(fileName);
    return newFileName;
}
I don't want to go through the 5000 rows a second time to remove the duplicates; I would rather check whether an entry already exists before adding it to the CSV file. I know that I can use GroupBy on the required fields, but since I have three different outputs, I would have to write the same code three times, once for each set of keys I need to group by:
results = results
    .GroupBy(p => new { p.ProductCode, p.Name, p.Type })
    .Select(g => new Product {
        ProductCode = g.Key.ProductCode,
        Name = g.Key.Name,
        Type = g.Key.Type,
        Price = g.Sum(p => p.Price)
    })
    .ToList();
Is there any other way to do this?
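One possible direction, given only as a rough sketch and not as a definitive implementation: let each parser declare which fields form its grouping key, so the summing logic is written once. The names KeyFields, PriceAggregator and Summarize are hypothetical, the two-argument ConvertToOutputFormat differs from the signature shown above, and the sketch assumes Price is a decimal, Size is a string, and the field values never contain the separator character used to build the lookup key.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Product
{
    public string ProductCode { get; set; }
    public string Name { get; set; }
    public string Type { get; set; }
    public string Size { get; set; }   // assumption: kept as text
    public decimal Price { get; set; } // assumption: numeric so it can be summed
}

// Hypothetical base class: each parser only names the fields it outputs.
public abstract class Parser
{
    // The fields that identify one output row; rows sharing them are merged.
    public abstract string[] KeyFields(Product p);

    // Assumed signature: formats the key fields followed by the summed price.
    public string ConvertToOutputFormat(string[] key, decimal total)
    {
        return string.Join(",", key) + "," + total.ToString(CultureInfo.InvariantCulture);
    }
}

public class CodeNameParser : Parser
{
    public override string[] KeyFields(Product p)
    {
        return new[] { p.ProductCode, p.Name };
    }
}

public class CodeNameTypeParser : Parser
{
    public override string[] KeyFields(Product p)
    {
        return new[] { p.ProductCode, p.Name, p.Type };
    }
}

public static class PriceAggregator
{
    // Unit separator; assumed never to occur inside a field value.
    private const string Separator = "\u001F";

    // Single pass over the rows: the dictionary lookup plays the role of the
    // "does this entry already exist?" check. First-seen order is preserved.
    public static List<string> Summarize(IEnumerable<Product> rows, Parser parser)
    {
        var totals = new Dictionary<string, decimal>();
        var keysInOrder = new List<string[]>();

        foreach (var p in rows)
        {
            string[] key = parser.KeyFields(p);
            string id = string.Join(Separator, key);
            if (!totals.ContainsKey(id))
            {
                totals[id] = 0m;
                keysInOrder.Add(key);
            }
            totals[id] += p.Price;
        }

        return keysInOrder
            .Select(k => parser.ConvertToOutputFormat(k, totals[string.Join(Separator, k)]))
            .ToList();
    }
}
```

With the six sample rows above, PriceAggregator.Summarize(results, new CodeNameTypeParser()) would yield the lines ABC,Shoe,Trainers,14.97 and ABC,Shoe,Heels,14.97, while CodeNameParser would yield the single line ABC,Shoe,29.94. Adding another output then only requires a new KeyFields override.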
Source: https://stackoverflow.com/questions/30129141/summing-duplicate-values-while-reading-in-data-for-different-types-of-outputs