How to read a csv file one line at a time and replace/edit certain lines as you go?

前端 未结 3 1632
盖世英雄少女心
盖世英雄少女心 2021-02-02 01:41

I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the files data, but I don\'t want to regenerate the data in that file because it

相关标签:
3条回答
  • 2021-02-02 01:55

    memory-mapped files is a new feature in .NET Framework 4 which can be used to edit large files. read here http://msdn.microsoft.com/en-us/library/dd997372.aspx or google Memory-mapped files

    0 讨论(0)
  • 2021-02-02 02:13

    The process would be something like this:

    1. Open a StreamWriter to a temporary file.
    2. Open a StreamReader to the target file.
    3. For each line:
      1. Split the text into columns based on a delimiter.
      2. Check the columns for the values you want to replace, and replace them.
      3. Join the column values back together using your delimiter.
      4. Write the line to the temporary file.
    4. When you are finished, delete the target file, and move the temporary file to the target file path.

    Note regarding Steps 2 and 3.1: If you are confident in the structure of your file and it is simple enough, you can do all this out of the box as described (I'll include a sample in a moment). However, there are factors in a CSV file that may need attention (such as recognizing when a delimiter is being used literally in a column value). You can drudge through this yourself, or try an existing solution.


    Basic example just using StreamReader and StreamWriter:

    var sourcePath = @"C:\data.csv";
    var delimiter = ",";
    var firstLineContainsHeaders = true;
    var tempPath = Path.GetTempFileName();
    var lineNumber = 0;
    
    var splitExpression = new Regex(@"(" + delimiter + @")(?=(?:[^""]|""[^""]*"")*$)");
    
    using (var writer = new StreamWriter(tempPath))
    using (var reader = new StreamReader(sourcePath))
    {
        string line = null;
        string[] headers = null;
        if (firstLineContainsHeaders)
        {
            line = reader.ReadLine();
            lineNumber++;
    
            if (string.IsNullOrEmpty(line)) return; // file is empty;
    
            headers = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
    
            writer.WriteLine(line); // write the original header to the temp file.
        }
    
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
    
            var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
    
            // if there are no headers, do a simple sanity check to make sure you always have the same number of columns in a line
            if (headers == null) headers = new string[columns.Length];
    
            if (columns.Length != headers.Length) throw new InvalidOperationException(string.Format("Line {0} is missing one or more columns.", lineNumber));
    
            // TODO: search and replace in columns
            // example: replace 'v' in the first column with '\/': if (columns[0].Contains("v")) columns[0] = columns[0].Replace("v", @"\/");
    
            writer.WriteLine(string.Join(delimiter, columns));
        }
    
    }
    
    File.Delete(sourcePath);
    File.Move(tempPath, sourcePath);
    
    0 讨论(0)
  • 2021-02-02 02:22

    Just read the file, line by line, with streamreader, and then use REGEX! The most amazing tool in the world.

    using (var sr = new StreamReader(new FileStream(@"C:\temp\file.csv", FileMode.Open)))
            {
                var line = sr.ReadLine();
                while (!sr.EndOfStream)
                {
                    // do stuff
    
                    line = sr.ReadLine();
                }
    
            }
    
    0 讨论(0)
提交回复
热议问题