memory leak while processing large CSV

巧了我就是萌 提交于 2019-12-25 04:55:10

问题


I have a script that downloads a large product CSV file, processes the information therein (downloading images and resizing and preparing other data for database insertion), then creates another txt file of all the processed items. The problem is that it seems to be hemmoraging memory somewhere. I get an error 500 returned, but the log shows too much memory usage. I've unset as much as I can, and I'm using SPL iterators which are supposed to be less memory intensive, but i still can get to script to complete execution and enter all of the information. Can anyone point out something in the script that would help prevent the memory leakage?

<?php

define('IN_PHPBB', true);
define('IN_SHOP', true);
$phpbb_root_path = './../forum/';
$root_path = './../';
$phpEx = substr(strrchr(__FILE__, '.'), 1);
include($phpbb_root_path.'common.'.$phpEx);

// Start session management
$user->session_begin();
$auth->acl($user->data);
$user->setup();

set_time_limit(172800);

define('THUMBNAIL_IMAGE_MAX_WIDTH', 150);
define('THUMBNAIL_IMAGE_MAX_HEIGHT', 150);

function generate_thumb($source_image_path, $thumbnail_image_path)
{
    list($source_image_width, $source_image_height, $source_image_type) = getimagesize($source_image_path);
    switch ($source_image_type) {
        case IMAGETYPE_GIF:
            $source_gd_image = imagecreatefromgif($source_image_path);
            break;
        case IMAGETYPE_JPEG:
            $source_gd_image = imagecreatefromjpeg($source_image_path);
            break;
        case IMAGETYPE_PNG:
            $source_gd_image = imagecreatefrompng($source_image_path);
            break;
    }
    if ($source_gd_image === false) {
        return false;
    }
    $source_aspect_ratio = $source_image_width / $source_image_height;
    $thumbnail_aspect_ratio = THUMBNAIL_IMAGE_MAX_WIDTH / THUMBNAIL_IMAGE_MAX_HEIGHT;
    if ($source_image_width <= THUMBNAIL_IMAGE_MAX_WIDTH && $source_image_height <= THUMBNAIL_IMAGE_MAX_HEIGHT) {
        $thumbnail_image_width = $source_image_width;
        $thumbnail_image_height = $source_image_height;
    } elseif ($thumbnail_aspect_ratio > $source_aspect_ratio) {
        $thumbnail_image_width = (int) (THUMBNAIL_IMAGE_MAX_HEIGHT * $source_aspect_ratio);
        $thumbnail_image_height = THUMBNAIL_IMAGE_MAX_HEIGHT;
    } else {
        $thumbnail_image_width = THUMBNAIL_IMAGE_MAX_WIDTH;
        $thumbnail_image_height = (int) (THUMBNAIL_IMAGE_MAX_WIDTH / $source_aspect_ratio);
    }
    $thumbnail_gd_image = imagecreatetruecolor($thumbnail_image_width, $thumbnail_image_height);
    imagecopyresampled($thumbnail_gd_image, $source_gd_image, 0, 0, 0, 0, $thumbnail_image_width, $thumbnail_image_height, $source_image_width, $source_image_height);
    imagejpeg($thumbnail_gd_image, $thumbnail_image_path, 90);
    imagedestroy($source_gd_image);
    imagedestroy($thumbnail_gd_image);
    unset($source_image_width, $source_image_height, $source_image_type, $source_gd_image, $source_aspect_ratio, $thumbnail_aspect_ratio, $thumbnail_image_width, $thumbnail_image_height, $thumbnail_gd_image);
    return true;
}


$regex = <<<'END'
/
  (
    (?: [\x00-\x7F]               # single-byte sequences   0xxxxxxx
    |   [\xC0-\xDF][\x80-\xBF]    # double-byte sequences   110xxxxx 10xxxxxx
    |   [\xE0-\xEF][\x80-\xBF]{2} # triple-byte sequences   1110xxxx 10xxxxxx * 2
    |   [\xF0-\xF7][\x80-\xBF]{3} # quadruple-byte sequence 11110xxx 10xxxxxx * 3 
    )+                            # ...one or more times
  )
| ( [\x80-\xBF] )                 # invalid byte in range 10000000 - 10111111
| ( [\xC0-\xFF] )                 # invalid byte in range 11000000 - 11111111
/x
END;
function utf8replacer($captures) {
  if ($captures[1] != "") {
    // Valid byte sequence. Return unmodified.
    return $captures[1];
  }
  elseif ($captures[2] != "") {
    // Invalid byte of the form 10xxxxxx.
    // Encode as 11000010 10xxxxxx.
    return "\xC2".$captures[2];
  }
  else {
    // Invalid byte of the form 11xxxxxx.
    // Encode as 11000011 10xxxxxx.
    return "\xC3".chr(ord($captures[3])-64);
  }
}

/* download file from source */
function getDataCSV(){
    $thefile = 'http://feeds.cnv.com/xxxxxxxxxxxxxx/Bronze/ELD-B01.csv';
    $file = 'ELD-B01.csv';

    $fh = fopen($file, "w");
    $rows = file($thefile);
    foreach($rows as $num => $row){
        if($num != 0){
            fwrite($fh, $row);
        }
    }
    fclose($fh);

    include("DataSource.php");

    $csv = new File_CSV_DataSource;
    if ($csv->load($file)) {
        $items = $csv->getHeaders();
        $csv->getColumn($items[2]);
        if ($csv->isSymmetric()) {
            $items = $csv->connect();
        } else {
            $items = $csv->getAsymmetricRows();
        }
        $items = $csv->getrawArray();
    }
    unset($csv);
    return $items;
}

$iter = new ArrayIterator(getDataCSV());

$google_list = array();
$google_list[] = array('id', 'title', 'description', 'google_product_category', 'product_type', 'link', 'image_link', 'condition', 'availability', 'price', 'brand', 'mpn');

$sql = "TRUNCATE TABLE ".SHOP_ITEMS;
$db->sql_query($sql);

foreach($iter as $item){
    if($item[12] != ""){
        $catName = str_replace(" ", "-", str_replace("and ", "", str_replace(",", "", str_replace("&amp;", "and", str_replace("-", "", $item[12])))));
    }else{
        $catName = str_replace(" ", "-", str_replace("and ", "", str_replace(",", "", str_replace("&amp;", "and", str_replace("-", "", $item[11])))));
    }

    $sql = 'SELECT cat_id FROM '.SHOP_CATS.' WHERE cat_clean = "'.$catName.'"';
    $result = $db->sql_query($sql);
    $row = $db->sql_fetchrow($result);
    $db->sql_freeresult($result);
    $catId = $row['cat_id'];

    $img = $item[9];
    $ext = substr($img, strrpos($img, '.') + 1);
    $image = 'images/products/'.$item[0].'.'.$ext;
    file_put_contents($root_path.$image, file_get_contents($img));
    $thumb = "images/products/thumbs/".$item[0]."_thumb.".$ext;

    generate_thumb($root_path.$image, $thumb);

    $itmRow = array(
        'item_name' => str_replace("/", "", preg_replace_callback($regex, "utf8replacer", html_entity_decode(html_entity_decode($item[1], ENT_QUOTES)))),
        'item_price' => $item[2],
        'item_description' => preg_replace_callback($regex, "utf8replacer", html_entity_decode(html_entity_decode($item[4], ENT_QUOTES))),
        'item_model' => $item[0],
        'item_manufacturer' => ($item[6] == '') ? 'No Info' : $item[6],
        'item_image' => $image,
        'item_cat' => ($catId) ? $catId : 0,
        'item_number' => $item[0],
        'item_vendor_code' => "ELD",
        'item_stock' => (strtolower($item[5]) == 'in stock') ? 1 : 0,
        'item_added' => strtotime($item[8]),
        'item_upc' => ($item[13] == '') ? 'No Info' : $item[13],
        'item_url' => '',
        'item_weight' => ($item[14] == '') ? 'No Info' : $item[14],
    );

    $sql = 'INSERT INTO '.SHOP_ITEMS.' '.$db->sql_build_array('INSERT', $itmRow);
    $db->sql_query($sql);
    $itmId = $db->sql_nextid();

    if(strstr($itmRow['item_name'], "-") == FALSE){
        $seo = urlencode(str_replace(" ", "-", $itmRow['item_name'])).".html";
    }else{
        $seo = urlencode(str_replace(" ", "_", $itmRow['item_name'])).".html";
    }
    if($item[5] == "oos"){
        $stock = "Out of Stock";
    }else{
        $stock = "In Stock";
    }
    $u_product = "https://therealmsofwickedry.com/product/".$seo;

    $google_list[] = array($itmId, $itmRow['item_name'], $itmRow['item_description'], 'Mature > Erotic', $catName, $u_product, "https://therealmsofwickedry.com/".$itmRow['item_image'], "new", $stock, $itmRow['item_price'], $itmRow['item_manufacturer'], $itmRow['item_model']);
    unset($catName, $catId, $img, $ext, $image, $thumb, $itmRow, $itmId, $seo, $stock, $u_product);
}

$line = '';

foreach($google_list as $list){
    $line .= implode("\t", $list);
    $line .= "\n";
}

$google = 'google_products.txt';
$h = fopen($google, "w");
fwrite($h, $line);
fclose($h);

?>

回答1:


Tanzeel is correct in assuming the file is being read in it's entirety into memory.

Here is how you can read a file line by line.

$file_handle = fopen($file, 'r');
// You can ignore the file header line if you know the format.
$first_line  = fgetcsv($fh);

while ($single_line = fgetcsv($file_handle)) {
   print_r($single_line);
}
fclose($single_line);



回答2:


I am not sure it's memory leakage, it must be an "out of memory" exception. My guess is that your script must be dying when reading the large file. When reading through your code I found the following:

$rows = file($thefile);

This code line will read the entire "large file" into an array in memory. The first step should be ensuring that your script isn't dying due to this. You can try using fopen and fread functions in PHP to read byte chunks and write into the destination file. This should ideally take care of hogging memory resources when reading.

To diagnose if getDataCSV() is the actual culprit modify the following line:

$iter = new ArrayIterator(getDataCSV());

in your code to this:

$iter = new ArrayIterator(getDataCSV());
die('I died after getDataCSV. There is another culprit somewhere else causing the script to fail!');

If you get the die message on your browser then you should start looking at other places in your code which can kill the script.

I haven't gone thoroughly through your code but you should also ensure you follow the same process of reading chunks of the file when processing it locally. For e.g. once your file is downloaded you will be processing it to generate some data. You may use arrays and loops to achieve it but since the data to process is large you should still be processing partial chunks of the file instead of dumping it all into memory.




回答3:


Turns out that the utf8replacer() function was the cause of the issue. Thanks for the input, though :)



来源:https://stackoverflow.com/questions/14017417/memory-leak-while-processing-large-csv

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!