Best practices for importing large CSV files

攒了一身酷 2020-12-14 16:39

My company gets a set of CSV files full of bank account info each month that I need to import into a database. Some of these files can be pretty big. For example, one is abo

10 Answers
  • 2020-12-14 16:54

    I don't like some of the other answers :)

    I used to do this at a job.

    You write a program to create a big SQL script full of INSERT statements, one per line. Then you run the script. You can save the script for future reference (a cheap log). Gzip it and it will shrink by about 90%.

    You don't need any fancy tools and it really doesn't matter what database you are using.

    You can do a few hundred INSERTs per transaction or all of them in one transaction; it's up to you.

    Python is a good language for this, but I'm sure PHP is fine too.

    If you have performance problems, some databases such as Oracle have a special bulk-loading program (e.g. SQL*Loader) which is faster than INSERT statements.

    You shouldn't run out of memory, because you should only be parsing one line at a time. There is no need to hold the whole file in memory, so don't do that!
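
    A minimal PHP sketch of that approach (the answer suggests Python, but PHP works the same way). The table name, column list and file names below are made up for illustration, and the quoting is deliberately naive:

    <?php
    // Stream accounts.csv and write a gzipped SQL script with one INSERT per row.
    // Table, columns and file names are placeholders - adapt them to your schema.
    $in  = fopen('accounts.csv', 'r');
    $out = gzopen('accounts-import.sql.gz', 'w9');     // the gzipped script doubles as a cheap log

    gzwrite($out, "BEGIN;\n");
    fgetcsv($in);                                      // skip the header row

    while (($row = fgetcsv($in)) !== false) {
        $values = implode(', ', array_map(function ($field) {
            return "'" . addslashes($field) . "'";     // naive quoting, good enough for a sketch
        }, $row));
        gzwrite($out, "INSERT INTO accounts (number, owner, balance) VALUES ($values);\n");
    }

    gzwrite($out, "COMMIT;\n");
    fclose($in);
    gzclose($out);

    Replaying it is then just a matter of piping the decompressed script into your database client (for MySQL, something like zcat accounts-import.sql.gz | mysql yourdb), and the compressed file can be kept around as the cheap log mentioned above.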

  • 2020-12-14 16:57

    If you are using SQL Server and have access to .NET, you can write a quick application that uses the SqlBulkCopy class. I've used this in previous projects to get a lot of data into SQL Server very quickly. The SqlBulkCopy class makes use of SQL Server's BCP mechanism, so if you're using something other than .NET it may be worth checking whether that option is open to you too. I'm not sure whether you're using a database other than SQL Server.

  • 2020-12-14 17:03

    FWIW the following steps caused a huge speedup of my LOAD DATA INFILE:

    SET FOREIGN_KEY_CHECKS = 0;   # skip foreign-key validation during the load
    SET UNIQUE_CHECKS = 0;        # skip unique-index checks during the load
    SET SESSION tx_isolation='READ-UNCOMMITTED';
    SET sql_log_bin = 0;          # keep the bulk load out of the binary log
    #LOAD DATA LOCAL INFILE....
    SET sql_log_bin = 1;
    SET UNIQUE_CHECKS = 1;
    SET FOREIGN_KEY_CHECKS = 1;
    SET SESSION tx_isolation='REPEATABLE-READ';
    

    See article here
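
    If you drive this from PHP, the same sequence over PDO might look roughly like the sketch below. The DSN, credentials, table and file path are placeholders; LOAD DATA LOCAL INFILE also needs local_infile enabled on the server, and changing sql_log_bin needs elevated privileges:

    <?php
    // Wrap LOAD DATA LOCAL INFILE in the speed-up settings shown above.
    // Connection details, table and file path are placeholders.
    $pdo = new PDO(
        'mysql:host=localhost;dbname=yourdb;charset=utf8mb4',
        'user',
        'password',
        [PDO::MYSQL_ATTR_LOCAL_INFILE => true]   // allow LOCAL INFILE on the client side
    );

    $pdo->exec("SET FOREIGN_KEY_CHECKS = 0");
    $pdo->exec("SET UNIQUE_CHECKS = 0");
    $pdo->exec("SET SESSION tx_isolation = 'READ-UNCOMMITTED'");   // transaction_isolation on MySQL 8+
    $pdo->exec("SET sql_log_bin = 0");

    $pdo->exec("
        LOAD DATA LOCAL INFILE '/path/to/accounts.csv'
        INTO TABLE accounts
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 ROWS
    ");

    $pdo->exec("SET sql_log_bin = 1");
    $pdo->exec("SET UNIQUE_CHECKS = 1");
    $pdo->exec("SET FOREIGN_KEY_CHECKS = 1");
    $pdo->exec("SET SESSION tx_isolation = 'REPEATABLE-READ'");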

  • 2020-12-14 17:03

    You can use a generator for memory-efficient file reading. The small snippet below might help you.

    #Method
    public function getFileRecords($params)
    {
        $fp = fopen('../' . $params['file'] . '.csv', 'r');
        //$header = fgetcsv($fp, 1000, ','); // uncomment to skip the header row

        while (($line = fgetcsv($fp, 1000, ',')) !== false) {
            // Turn MySQL-style \N null markers into empty strings.
            $line = array_map(function ($str) {
                return str_replace('\N', '', $str);
            }, $line);

            yield $line;   // hand back one row at a time instead of loading the whole file
        }

        fclose($fp);
    }

    #Implementation
    foreach ($yourModel->getFileRecords($params) as $row) {
        // $row is a numerically indexed array of the fields in one CSV line
        $yourModel->save($row);
    }
    
  • 2020-12-14 17:06

    You can use MySQL's LOAD DATA INFILE statement; it lets you read data from a text file and import the file's data into a database table very quickly.

    LOAD DATA INFILE '/opt/lampp/htdocs/sample.csv'
    INTO TABLE discounts
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 ROWS
    (title, @expired_date, discount)
    SET expired_date = STR_TO_DATE(@expired_date, '%m/%d/%Y');

    For more info: http://dev.mysql.com/doc/refman/5.5/en/load-data.html and http://www.mysqltutorial.org/import-csv-file-mysql-table/

  • 2020-12-14 17:06

    I am reading a CSV file which has close to 1M records and 65 columns. For every 1000 records processed in PHP, one big multi-row INSERT statement goes into the database. The writing takes no time at all; it's the parsing that does. The memory used to process this uncompressed 600 MB file is about 12 MB.
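
    For reference, that batching pattern might look roughly like this in PHP with PDO; the connection details, table name and batch size are placeholders:

    <?php
    // Accumulate rows and flush one multi-row INSERT per 1000 records.
    // Connection details and table name are placeholders.
    $pdo = new PDO('mysql:host=localhost;dbname=yourdb;charset=utf8mb4', 'user', 'password');
    $fp  = fopen('big.csv', 'r');
    fgetcsv($fp);                                      // skip the header row

    $batch = [];
    while (($row = fgetcsv($fp)) !== false) {
        $quoted  = array_map([$pdo, 'quote'], $row);   // let PDO quote every field
        $batch[] = '(' . implode(', ', $quoted) . ')';

        if (count($batch) === 1000) {
            $pdo->exec('INSERT INTO accounts VALUES ' . implode(', ', $batch));
            $batch = [];
        }
    }

    if ($batch) {                                      // flush the remaining rows
        $pdo->exec('INSERT INTO accounts VALUES ' . implode(', ', $batch));
    }
    fclose($fp);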
