Quickly replace first line of large file

孤独总比滥情好 2021-02-19 18:02

I have many large CSV files (1-10 GB each) which I'm importing into databases. For each file, I need to replace the first line so that the headers are formatted to match the column names.

2 Answers
  • 2021-02-19 18:49

    If you can guarantee that fixedLine is no longer than line (measured in bytes in the file's encoding, not in characters), you can update the files in place instead of copying them.
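
    As a rough illustration of the in-place update, here is a minimal Python sketch (the answer itself refers to .NET streams; the same seek-and-write sequence applies there). The function name `replace_first_line_in_place` is hypothetical, and the sketch assumes an ASCII-compatible encoding such as UTF-8 with no byte-order mark, so that one space is exactly one byte.

```python
def replace_first_line_in_place(path, new_header, encoding="utf-8"):
    """Overwrite the first line of `path` without rewriting the rest.

    Returns True on success. Returns False, leaving the file untouched,
    when the new header needs more bytes than the old first line has.
    Assumes an ASCII-compatible encoding and no byte-order mark.
    """
    new_bytes = new_header.encode(encoding)
    with open(path, "r+b") as f:
        old_line = f.readline()            # includes the line terminator, if any
        body = old_line.rstrip(b"\r\n")    # old header text without terminator
        terminator = old_line[len(body):]  # preserved exactly as found
        if len(new_bytes) > len(body):
            return False                   # does not fit; caller must copy instead
        # Pad with spaces so the first line keeps its original byte length.
        padded = new_bytes + b" " * (len(body) - len(new_bytes))
        f.seek(0)
        f.write(padded + terminator)
    return True
```

    Note that the padding ends up as trailing spaces on the last column name, so the importer has to trim header names.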

    If not, you may get a small performance improvement by accessing the .BaseStream of your StreamReader and StreamWriter and copying in large blocks (using, say, a 32 KB buffer). That at least eliminates the time that reader.ReadLine() currently spends checking every character to see whether it is an end-of-line character.
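
    The block-copy idea can be sketched as follows, again in Python rather than with the .NET stream classes named above. The function name `replace_first_line_by_copy` and the `.tmp` suffix are assumptions made for the example; only the old first line is parsed as text, and everything after it is copied as raw bytes in fixed-size chunks.

```python
import os
import shutil

def replace_first_line_by_copy(path, new_header, encoding="utf-8",
                               buffer_size=32 * 1024):
    """Write the new header to a temp file, then bulk-copy the remainder."""
    tmp_path = path + ".tmp"  # hypothetical temp-file name
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        old_line = src.readline()  # the only line that is scanned for a newline
        body = old_line.rstrip(b"\r\n")
        # Reuse the file's own line terminator; fall back to "\n" if none.
        terminator = old_line[len(body):] or b"\n"
        dst.write(new_header.encode(encoding) + terminator)
        # Copy the rest in raw chunks of buffer_size bytes, no per-line work.
        shutil.copyfileobj(src, dst, buffer_size)
    os.replace(tmp_path, path)  # swap the new file into place
```

    The whole file is still read and written once; the saving comes only from skipping the per-character line scanning.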

  • 2021-02-19 18:57

    The only thing that can significantly speed this up is truly replacing the first line in place. If the new first line is no longer than the old one, carefully overwrite the first line (padding with spaces if needed).

    Otherwise, you have to create a new file and copy everything after the first line into it. You may be able to optimize the copy a bit by adjusting buffer sizes, copying explicitly as binary, or pre-allocating the output file's size, but that does not change the fact that you need to copy the whole file.

    One more cheat, if you plan to load the CSV data into a DB anyway: if row order does not matter, you can read a few lines from the beginning, replace them with the new header, and append the removed lines to the end of the file.
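
    A possible shape for this cheat, as a hedged Python sketch: the function name `swap_header_with_leading_rows` is made up for the example, and it assumes an ASCII-compatible encoding, no byte-order mark, and no quoted fields containing embedded newlines. It frees just enough whole rows at the start to fit the new header, pads the header to fill that space exactly, and appends the displaced rows at the end.

```python
import os

def swap_header_with_leading_rows(path, new_header, encoding="utf-8"):
    """Fit a longer header by moving leading data rows to the end of the file.

    Returns the number of data rows that were moved. Row order changes.
    """
    header = new_header.encode(encoding)
    with open(path, "r+b") as f:
        old_header = f.readline()
        terminator = old_header[len(old_header.rstrip(b"\r\n")):] or b"\n"
        needed = len(header) + len(terminator)
        region = len(old_header)   # bytes available at the start of the file
        displaced = []             # data rows pushed out to make room
        while region < needed:
            row = f.readline()
            if not row:            # reached end of file before enough room
                break
            displaced.append(row)
            region += len(row)
        if region < needed:
            # Tiny file: simply rewrite it, there is nothing to save.
            f.seek(0)
            f.truncate()
            f.write(header + terminator + b"".join(displaced))
            return len(displaced)
        # Overwrite the freed region with the header, padded to the same size.
        f.seek(0)
        f.write(header + b" " * (region - needed) + terminator)
        if displaced:
            # Make sure the current last row is terminated before appending.
            f.seek(-1, os.SEEK_END)
            last_byte = f.read(1)
            f.seek(0, os.SEEK_END)
            if last_byte not in (b"\n", b"\r"):
                f.write(terminator)
            f.write(b"".join(displaced))
    return len(displaced)
```

    Only the first few rows and the tail of the file are touched, so the cost does not grow with file size, but the padded header carries trailing spaces and the moved rows end up out of order.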

    Side note: if this is a one-time operation, I'd simply copy the files and be done with it. Debugging code that inserts data into the middle of a text file with a potentially different encoding may not be worth the effort.
