Comparing 2 CSV files in C# advice?

六月ゝ 毕业季﹏ submitted on 2020-01-07 02:02:32

Question


I need to develop an application that compares two CSV files. The first file contains a list of email addresses. The second file also contains email addresses, but includes name and address information as well. The email addresses in the first file need to be removed from the second. I have the Fast CSV Reader from the CodeProject site, which works pretty well. The application will not have access to a database server. A new file will be generated containing the data that is considered verified, meaning it will not contain any of the entries from the first file.


Answer 1:


If you read both lists into collections, you can use LINQ to select the rows whose email address is not in the removal list.

Here is a quick example class I whipped up for you.

using System;
using System.Linq;
using System.Collections.Generic;

public class RemoveExample
{
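    // Returns the items from sourceList whose Email is not in emailAddressesToRemove.
    // Note: List<string>.Contains is a linear scan and compares case-sensitively.
    public List<Item> RemoveAddresses(List<Item> sourceList, List<string> emailAddressesToRemove)
    {
        List<Item> newList = (from s in sourceList
                              where !emailAddressesToRemove.Contains(s.Email)
                              select s).ToList();
        return newList;
    }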

    public class Item
    {
        public string Email { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }
}

To use it, read the second CSV file into a List<Item> and the first file into a List<string> of addresses to remove, then pass both lists to the method.
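As a rough end-to-end sketch of that usage: the class below reads both files, applies the same filter, and writes the result. The names CsvRemoveSketch, ParseItems, ParseEmails and Run are made up for illustration, and the parsing is deliberately naive. It assumes each row of the second file is "email,name,address" with no header row and no quoted commas in the first two fields. With the Fast CSV Reader you would replace the parsing helpers with that library's reader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Same shape as the Item class above, repeated so the sketch compiles on its own.
public class Item
{
    public string Email { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

public static class CsvRemoveSketch
{
    // Assumption: rows look like "email,name,address"; anything after the
    // second comma is treated as part of the address.
    public static List<Item> ParseItems(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(new[] { ',' }, 3))
            .Where(parts => parts.Length == 3)
            .Select(parts => new Item
            {
                Email = parts[0].Trim(),
                Name = parts[1].Trim(),
                Address = parts[2].Trim()
            })
            .ToList();
    }

    // Assumption: the first file holds one email address per line.
    public static List<string> ParseEmails(IEnumerable<string> lines)
    {
        return lines.Select(l => l.Trim()).Where(l => l.Length > 0).ToList();
    }

    // Same filter as RemoveAddresses above, in method syntax.
    public static List<Item> RemoveAddresses(List<Item> sourceList, List<string> emailAddressesToRemove)
    {
        return sourceList.Where(s => !emailAddressesToRemove.Contains(s.Email)).ToList();
    }

    // Reads both files, filters, and writes the verified rows to outputPath.
    public static void Run(string removePath, string sourcePath, string outputPath)
    {
        var toRemove = ParseEmails(File.ReadLines(removePath));
        var source = ParseItems(File.ReadLines(sourcePath));
        var kept = RemoveAddresses(source, toRemove);
        File.WriteAllLines(outputPath,
            kept.Select(i => string.Join(",", i.Email, i.Name, i.Address)));
    }
}
```

Calling CsvRemoveSketch.Run("remove.csv", "contacts.csv", "verified.csv") would then produce the new file; the three file names are placeholders.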




Answer 2:


Not sure what kind of advice you need; it sounds straightforward.

Here's a quick algorithm sketch:

  • loop through email from first csv
    • put each email in a HashSet<>
  • run your delete
  • put each output email in the same HashSet<>
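    • if HashSet<>.Add returns false (it does not throw on duplicates), the email was already in the set, so you missed one in the delete
    • if emailList2.Count - emailList1.Count != outputList.Count, you deleted too many (this assumes every email in the first file appears exactly once in the second)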
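The duplicate check in the sketch above could look roughly like this. DeleteCheckSketch and FindMissedDeletes are hypothetical names, and the case-insensitive comparer is an assumption about how the addresses should be matched. The check relies on HashSet<T>.Add returning false for a value that is already in the set.

```csharp
using System;
using System.Collections.Generic;

public static class DeleteCheckSketch
{
    // Returns the output emails that were already seen, i.e. addresses that
    // should have been removed but survived the delete. A duplicate inside
    // the output itself is reported the same way.
    public static List<string> FindMissedDeletes(
        IEnumerable<string> emailsToRemove,
        IEnumerable<string> outputEmails)
    {
        // Seed the set with the addresses from the first CSV file.
        var seen = new HashSet<string>(emailsToRemove, StringComparer.OrdinalIgnoreCase);
        var missed = new List<string>();

        foreach (var email in outputEmails)
        {
            // Add returns false when the value is already present.
            if (!seen.Add(email))
                missed.Add(email);
        }
        return missed;
    }
}
```

An empty result means no removed address leaked into the output.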



Answer 3:


This is relatively simple, assuming the lists aren't terribly large and memory usage isn't a major concern: read both sets of email addresses into two separate HashSet<string> instances. Then you can use HashSet<T>.ExceptWith to remove from one set everything that appears in the other. For instance:

HashSet<string> setA = ...;
HashSet<string> setB = ...;

setA.ExceptWith(setB); // Remove all strings in setB from setA

// Print all strings that were in setA, but not setB
foreach (var s in setA)
    System.Console.WriteLine(s);

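BTW, the above should be roughly O(n) on average, since HashSet<T> lookups and removals are constant time in the expected case, versus the LINQ answer, which is O(n*m) because List<string>.Contains scans the whole removal list for every row.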
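One thing to watch: ExceptWith works on the email strings alone, so the name and address columns from the second file would be lost. A possible variation, sketched below, keeps the hash-based lookup but filters whole rows. SetFilterSketch and KeepUnlisted are hypothetical names; the sketch assumes the email is the first comma-separated field of each row and that addresses should match case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetFilterSketch
{
    // Keeps every row whose email (first comma-separated field) is not
    // in emailsToRemove. Rows are returned unchanged.
    public static List<string> KeepUnlisted(
        IEnumerable<string> records,
        IEnumerable<string> emailsToRemove)
    {
        // Building the set once makes each later lookup constant time on average.
        var remove = new HashSet<string>(
            emailsToRemove.Select(e => e.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return records
            .Where(r => !remove.Contains(r.Split(',')[0].Trim()))
            .ToList();
    }
}
```

The returned rows can be written straight to the new file, for example with File.WriteAllLines.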



Source: https://stackoverflow.com/questions/3357195/comparing-2-csv-files-in-c-sharp-advice
