Efficient way to read a specific line number of a file. (BONUS: Python Manual Misprint)

醉梦人生 2021-02-09 17:02

I have a 100 GB text file, which is a BCP dump from a database. When I try to import it with BULK INSERT, I get a cryptic error on line number 219506324. Before sol

5 Answers
  •  猫巷女王i
    2021-02-09 17:27

    Here's my elegant version in C#:

    Console.Write(File.ReadLines(@"s:\source\transactions.dat").ElementAt(219506323));
    

    or more general:

    Console.Write(File.ReadLines(filename).ElementAt(linenumber - 1));
    

    Of course, you may want to show some context before and after the given line:

    Console.Write(string.Join("\n",
                  File.ReadLines(filename).Skip(linenumber - 5).Take(10)));
    

    or more fluently:

    // Requires the System.Reactive (Rx.NET) package for ToObservable/Subscribe.
    File
    .ReadLines(filename)
    .Skip(linenumber - 5)
    .Take(10)
    .ToObservable()
    .Subscribe(Console.WriteLine);
    

    BTW, Python's linecache module does not do anything clever with large files. It just reads the whole file in and keeps it all in memory. The only exceptions it catches are I/O-related (can't access file, file not found, etc.). Here's the important part of its source:

        fp = open(fullname, 'rU')
        lines = fp.readlines()
        fp.close()
    

    In other words, it's trying to fit the whole 100 GB file into 6 GB of RAM! What the manual should probably say is "This function will never throw an exception if it can't access the file."
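
    For comparison, here is a minimal Python sketch of the streaming approach, which is roughly what the C# File.ReadLines examples above do. It is not part of linecache; the function names read_line and read_context, the default encoding, and the before/after defaults (chosen to mirror Skip(linenumber - 5).Take(10)) are all assumptions made for illustration.

```python
from itertools import islice


def read_line(path, line_number, encoding="utf-8"):
    """Return line `line_number` (1-based) of `path`, or None if the file is shorter.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if line_number < 1:
        raise ValueError("line_number is 1-based")
    # Iterating a file object yields one line at a time, so memory use is
    # bounded by the longest line rather than by the size of the file.
    with open(path, "r", encoding=encoding, errors="replace") as fp:
        return next(islice(fp, line_number - 1, line_number), None)


def read_context(path, line_number, before=4, after=5, encoding="utf-8"):
    """Return (number, text) pairs for the lines around `line_number`."""
    first = max(1, line_number - before)  # clamp at the start of the file
    last = line_number + after
    with open(path, "r", encoding=encoding, errors="replace") as fp:
        window = islice(fp, first - 1, last)
        return [(n, text.rstrip("\n")) for n, text in enumerate(window, start=first)]
```

    This still has to scan every byte before the target line, so on a 100 GB file it takes a while, but it never holds more than one line in memory at once. The errors="replace" argument is a guess at what is convenient for a dump of unknown encoding; adjust it if the exact bytes matter.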
