PHP: Using fgetcsv on a huge CSV file

失恋的感觉 2021-01-06 07:58

Using fgetcsv, can I somehow do a destructive read, where rows I've read and processed are discarded, so that if I don't make it through the wh…

3 Answers
  • 2021-01-06 08:38

    From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
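    To make that cost concrete, here is a minimal sketch of what a "destructive read" would have to do on a plain file: copy every byte after the processed part into a new file and swap it in. The helper name discardProcessedPrefix() and the .tmp suffix are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: "discard" everything before $offset by copying the
// remainder of the file into a temporary file and renaming it over the original.
// Every call rewrites ALL remaining bytes, which is why this is so expensive.
function discardProcessedPrefix(string $path, int $offset): int
{
    $tmpPath = $path . '.tmp';
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $out = fopen($tmpPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $tmpPath");
    }
    fseek($in, $offset);
    // Copies every byte from $offset to EOF: O(remaining size) work per call.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    if ($copied === false) {
        throw new RuntimeException('Copy failed');
    }
    rename($tmpPath, $path);
    return $copied;
}
```

    Each call copies the whole unprocessed remainder, so discarding rows in small batches from a 2 GB file means rewriting gigabytes of data again and again.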

    Assuming you save how many rows you have already processed, you can skip rows like this:

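    $alreadyProcessed = 42; // for example, loaded from wherever you saved it
    
    $i = 0;
    while (($row = fgetcsv($fileHandle)) !== false) {
        if ($i++ < $alreadyProcessed) {
            continue; // skip rows handled in a previous run
        }
    
        // ... process $row ...
    }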
    

    However, this means every run re-reads and re-parses the 2 GB file from the beginning, which by itself takes a while, so each restart spends more of its time budget skipping rows and can process fewer and fewer new ones.

    The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:

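    // Resume from the saved byte offset, or from the start on the first run
    $lastPosition = is_file('last_position.txt')
        ? (int) file_get_contents('last_position.txt')
        : 0;
    $fh = fopen('my.csv', 'r');
    fseek($fh, $lastPosition);
    
    while (($row = fgetcsv($fh)) !== false) {
        // ... process $row ...
    
        // Remember where the next unprocessed row starts
        file_put_contents('last_position.txt', ftell($fh));
    }
    fclose($fh);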
    

    This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
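    A somewhat more complete version of the ftell/fseek idea, as a sketch only: it stops itself before a time budget runs out (so you can pick something comfortably below the host's limit), writes the offset only every $checkpointEvery rows instead of after every row, and writes the checkpoint to a temporary file followed by rename(), which on POSIX filesystems replaces the file atomically, so an interruption mid-write should not leave a half-written offset. The function names, the checkpoint file, and the $processRow callback are assumptions, not part of the answer above.

```php
<?php
// Sketch only: hypothetical checkpoint file holding a byte offset into the CSV.
function readCheckpoint(string $checkpointPath): int
{
    if (!is_file($checkpointPath)) {
        return 0; // first run: start at the beginning
    }
    return (int) trim((string) file_get_contents($checkpointPath));
}

function writeCheckpoint(string $checkpointPath, int $offset): void
{
    // Write to a temp file, then rename it over the real checkpoint.
    $tmp = $checkpointPath . '.tmp';
    file_put_contents($tmp, (string) $offset);
    rename($tmp, $checkpointPath);
}

// Processes rows until $budgetSeconds elapse or EOF is reached.
// Returns true when the whole file has been processed.
function processWithBudget(
    string $csvPath,
    string $checkpointPath,
    callable $processRow,
    float $budgetSeconds,
    int $checkpointEvery = 1000
): bool {
    $fh = fopen($csvPath, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    fseek($fh, readCheckpoint($checkpointPath));
    $deadline = microtime(true) + $budgetSeconds;
    $sinceCheckpoint = 0;

    while (microtime(true) < $deadline) {
        $row = fgetcsv($fh, 0, ',', '"', '\\');
        if ($row === false) {
            writeCheckpoint($checkpointPath, ftell($fh));
            fclose($fh);
            return true; // reached end of file
        }
        $processRow($row);
        if (++$sinceCheckpoint >= $checkpointEvery) {
            writeCheckpoint($checkpointPath, ftell($fh));
            $sinceCheckpoint = 0;
        }
    }
    // Out of time: save progress so the next run resumes here.
    writeCheckpoint($checkpointPath, ftell($fh));
    fclose($fh);
    return false;
}
```

    Rows processed after the last checkpoint but before a crash will be processed again on the next run, so $processRow should be safe to repeat (for example an upsert rather than a plain insert).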

  • 2021-01-06 08:48

    You can avoid memory errors, and to some extent timeouts, by reading the file as a stream: read it line by line and insert each row into a database (or process it however you need). That way only a single row is held in memory on each iteration. Don't try to load a huge CSV file into an array; that really would consume a lot of memory.

    if(($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
    {
        // Get the first row (Header)
        $header = fgetcsv($handle);
    
        // loop through the file line-by-line
        while(($data = fgetcsv($handle)) !== false)
        {
            // Process Your Data
            unset($data);
        }
        fclose($handle);
    }
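    The same streaming loop can be wrapped in a generator, so the rest of your code can foreach over rows keyed by the header while still holding only one row in memory at a time. This is a sketch: readCsvRows() is a hypothetical name, and it assumes every data row has as many fields as the header.

```php
<?php
// Sketch: yields one associative row at a time; the file handle is closed
// when iteration finishes or the generator is destroyed early.
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // First row is the header
        $header = fgetcsv($handle, 0, ',', '"', '\\');
        if ($header === false) {
            return; // empty file: nothing to yield
        }
        while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($data === [null]) {
                continue; // fgetcsv returns [null] for a blank line
            }
            // Key the row by the header; assumes matching field counts.
            yield array_combine($header, $data);
        }
    } finally {
        fclose($handle);
    }
}
```

    Usage: foreach (readCsvRows('yourHugeCSV.csv') as $row) { /* insert $row into the database */ }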
    
  • 2021-01-06 08:52

    I think a better solution would be to track the file position of each record you read (using ftell) and store it with the data you've read; then, if you have to resume, just fseek to the last position. Continuously rewinding and rewriting an open file stream would be phenomenally inefficient.
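    One way to read "store it with the data", sketched under assumptions: a plain array $table stands in for the database table, and each stored record carries the byte offset just past it (ftell right after fgetcsv). Resuming then only needs the offset of the last stored record, so the data and the resume position come from the same place. importChunk() and the end_offset key are hypothetical names.

```php
<?php
// Sketch: $table is a stand-in for a database table (hypothetical).
// Each entry holds a parsed row plus the byte offset where the next row starts.
function importChunk(string $csvPath, array &$table, int $maxRows): bool
{
    // Resume from the end offset of the last stored record, if any.
    $resumeAt = $table === [] ? 0 : $table[count($table) - 1]['end_offset'];

    $fh = fopen($csvPath, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    fseek($fh, $resumeAt);

    for ($i = 0; $i < $maxRows; $i++) {
        $row = fgetcsv($fh, 0, ',', '"', '\\');
        if ($row === false) {
            fclose($fh);
            return true; // whole file imported
        }
        // Store the row and its position together.
        $table[] = ['end_offset' => ftell($fh), 'row' => $row];
    }
    fclose($fh);
    return false; // stopped early (e.g. time limit); call again to resume
}
```

    With a real database you would insert the row and its end_offset in the same statement or transaction, and resume from the largest stored end_offset.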

    You could also try loading the file directly with MySQL's LOAD DATA INFILE statement, which will likely be a lot faster, although I've had problems with it in the past and ended up writing my own PHP code.

    You wrote: "I have a hard timeout on the server set to 180 seconds by the hosting provider, and a max memory utilization limit of 128 MB for any single script. These limits cannot be changed by me."

    What have you tried?

    The memory limit can be enforced by means other than php.ini, but I can't see how anyone could actually stop you from using a different execution time: even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php.

    Unless you are trying to fit the entire data file into memory, a memory limit of 128 MB should not be an issue.
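    To check which limits actually apply to a given run, a small sketch like the following can help. iniToBytes() is a hypothetical helper that converts PHP's shorthand byte notation (K, M, G suffixes) to bytes; ini_get(), memory_get_peak_usage() and PHP_SAPI are standard. When PHP runs from the command line, max_execution_time defaults to 0 (no limit), and comparing peak usage against memory_limit while streaming shows how much headroom 128 MB leaves.

```php
<?php
// Hypothetical helper: convert ini shorthand like "128M" to bytes.
// "-1" means no memory limit and is returned unchanged.
function iniToBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $number = (int) $value; // leading digits only, e.g. "128M" -> 128
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 ** 3;
        case 'M':
            return $number * 1024 ** 2;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

// Report the limits in effect for this process.
printf(
    "SAPI: %s, max_execution_time: %s s, memory_limit: %d bytes, peak usage: %d bytes\n",
    PHP_SAPI,
    ini_get('max_execution_time'),
    iniToBytes((string) ini_get('memory_limit')),
    memory_get_peak_usage(true)
);
```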
