Working with file pointers on a CSV

Submitted by 大兔子大兔子 on 2019-11-28 09:45:20

Question


I was wondering how to change the code below so that it reads x number of lines, processes the SQL insert statement for just those lines, then continues reading the file x lines at a time and processing each batch until the end of the file. I am new to the idea of file pointers, but I understand it should be possible using fgets().

I'm hoping to change the code below into a function where I can pass in the filename and the number of lines to read and process at a time.

I currently have: (from here)

$handle = fopen(dirname(__FILE__)."/files/workorderstest.csv" , "r");

$batch++;

if ($handle) {
    $counter = 0;

    //instead of executing query one by one,
    //let us prepare 1 SQL query that will insert all values from the batch

    $sql ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";

    while (($line = fgets($handle)) !== false) {
       $sql .= "($line),";
       $counter++;
    }

    $sql = substr($sql, 0, strlen($sql) - 1);

    var_dump($sql);

    if ($conn->query($sql) === TRUE) {

    } else {

    }

    fclose($handle);
}

I want to keep the memory footprint to a minimum. I'm thinking this should just be a matter of keeping track of the pointer -> repeat until the line count is reached -> process the SQL -> resume at the pointer -> repeat until EOF.

  1. Is fgets() the best to use for this?
  2. Do I need to incorporate a callback or some such to defer the SQL processing until all lines are read?
  3. I'm a bit lost on where to start as I am still learning PHP.
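
As background on file pointers: the current position can be read with ftell() and moved with fseek(), but fgets() advances it automatically after each read, so batch reading never needs to rewind. A minimal sketch:

$handle = fopen(dirname(__FILE__)."/files/workorderstest.csv", "r");
echo ftell($handle);    // 0: the pointer sits at the start of the file
$line = fgets($handle); // reads one line and advances the pointer past it
echo ftell($handle);    // byte offset where the next fgets() will resume
fseek($handle, 0);      // rewind to the beginning if a re-read is needed
fclose($handle);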

**** Update: the answered script is below, in case it helps someone else ...

date_default_timezone_set('Australia/Brisbane');
$date = date('m/d/Y h:i:s a', time());
$timezone = date_default_timezone_get();
$time_start = microtime(true);

$batch_size = 500; // Lines to be read per batch
$batch = 0;
$counter = 0;
$lines = 0;

$conn = new mysqli($servername, $username, $password, $dbname);

if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Remove Existing Data from table
$sql = "TRUNCATE TABLE  `workorderstest`";
$conn->query($sql);

$handle = fopen(dirname(__FILE__)."/files/workorders.csv" , "r");
if ($handle === false) {
    die("Could not open CSV file");
}

//instead of executing query one by one,
//let us prepare 1 SQL query that will insert all values from the batch

$sql_prefix ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
$values = "";

while (($line = fgets($handle)) !== false) {
    $values .= "($line),";
    $counter++;
    $lines++;
    if ($counter == $batch_size) {
        $values = substr($values, 0, strlen($values) - 1);
        $conn->query($sql_prefix . $values) or die($conn->error);
        $counter = 0;
        $values ="";
        $batch++;
    }
}
if ($counter > 0) { // Execute the last batch
    $values = substr($values, 0, strlen($values) - 1);
    $conn->query($sql_prefix . $values) or die($conn->error);
}

fclose($handle);

// Output results
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "Importing Script running at: $date <br/>";
echo "Timezone: $timezone <br/>";
echo "<br/>";
echo "Script Summary:";
echo "Time running script: " . round($time,3) . " seconds <br/>";
echo "Memory: ".memory_get_usage() . " bytes <br/>";
echo "Records Updated: $lines <br/>";
echo "Batches run: $batch <br/>";

?>
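
The same logic can be wrapped into the reusable function the question asked for. A minimal sketch, assuming $conn is an already-connected mysqli object and the CSV columns match the table (importCsvInBatches is a hypothetical name):

function importCsvInBatches(mysqli $conn, $filename, $batch_size)
{
    $sql_prefix = "INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
    $values = "";
    $counter = 0;

    $handle = fopen($filename, "r");
    if ($handle === false) {
        return false;
    }

    while (($line = fgets($handle)) !== false) {
        $values .= "(" . trim($line) . "),";
        if (++$counter == $batch_size) {
            $conn->query($sql_prefix . rtrim($values, ",")) or die($conn->error);
            $counter = 0;
            $values = "";
        }
    }
    if ($counter > 0) { // execute the last, partial batch
        $conn->query($sql_prefix . rtrim($values, ",")) or die($conn->error);
    }
    fclose($handle);
    return true;
}

// Usage:
importCsvInBatches($conn, dirname(__FILE__)."/files/workorders.csv", 500);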

Answer 1:


  1. Is fgets() the best to use for this? It's a fine way to do it. Another option is to read the entire file into an array with file(), then loop over the array with foreach() (see the sketch after the code below).

  2. Do I need to incorporate a callback? No. Just perform the query after reading each batch of lines from the file.

  3. Where to start? When the counter reaches the batch size, perform the query. Then set the counter back to 0 and set the query string back to the initial value. Finally, at the end of the loop you'll need to perform the query with the remaining values (unless the file size was an exact multiple of the batch size, in which case there won't be anything remaining).

$batch_size = 100;
$counter = 0;

//instead of executing query one by one,
//let us prepare 1 SQL query that will insert all values from the batch

$sql_prefix ="INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
$values = "";

while (($line = fgets($handle)) !== false) {
    $values .= "($line),";
    $counter++;
    if ($counter == $batch_size) {
        $values = substr($values, 0, strlen($values) - 1);
        $conn->query($sql_prefix . $values) or die($conn->error);
        $counter = 0;
        $values ="";
    }
}
if ($counter > 0) { // Execute the last batch
    $values = substr($values, 0, strlen($values) - 1);
    $conn->query($sql_prefix . $values) or die($conn->error);
}
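
For comparison, the file()-based alternative mentioned in point 1 might look like the sketch below. It loads the whole file into memory at once, so it trades the low memory footprint for simpler code:

$sql_prefix = "INSERT INTO workorderstest(id,parentid,f1,f2,f3,f4,f5,f6,f7,f8,f9,f10) VALUES ";
$lines = file(dirname(__FILE__)."/files/workorderstest.csv",
              FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach (array_chunk($lines, $batch_size) as $chunk) {
    $values = "(" . implode("),(", $chunk) . ")";
    $conn->query($sql_prefix . $values) or die($conn->error);
}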



Answer 2:


This is a bit of a reinvention of the wheel. MySQL has a very fast and efficient mechanism for loading CSV data into tables: LOAD DATA INFILE. If you have the right permissions on your user account, you can invoke it from right inside PHP, and LOAD DATA has built-in support for skipping N lines.

// Assumes $dbh is a connected PDO object. Placeholders inside LOAD DATA
// rely on PDO's emulated prepares (on by default), and the line count
// must be bound as an integer so that it is not quoted.
$path = dirname(__FILE__)."/files/workorderstest.csv";
$n = 1; // number of leading lines to skip
$q = "LOAD DATA INFILE ? INTO TABLE workorderstest IGNORE ? LINES";
$stmt = $dbh->prepare($q);
$stmt->bindValue(1, $path);
$stmt->bindValue(2, $n, PDO::PARAM_INT);
$stmt->execute();

That's precious few lines of code, isn't it?

Note that this code uses the IGNORE ... LINES clause to skip lines at the start of the CSV. You can also use the bare IGNORE keyword to skip rows that would duplicate an existing unique key, e.g.

LOAD DATA INFILE ? IGNORE INTO TABLE ....
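
If the MySQL account lacks the server-side FILE privilege, LOAD DATA LOCAL INFILE reads the file from the client instead. A sketch, assuming the connection credentials from the script above and that local_infile is enabled on the server:

$dbh = new PDO("mysql:host=$servername;dbname=$dbname", $username, $password,
               array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
$path = $dbh->quote(dirname(__FILE__)."/files/workorderstest.csv");
$dbh->exec("LOAD DATA LOCAL INFILE $path INTO TABLE workorderstest
            FIELDS TERMINATED BY ',' IGNORE 1 LINES");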


Source: https://stackoverflow.com/questions/41294380/working-with-file-pointers-on-a-csv
