Question
I'm trying to use PHP Simple HTML Dom Parser to parse some information from some sites; it does not matter what or where. But it seems there is a HUGE memory problem with it. I managed to cut the HTML down to only 6kB, but a script that finds some elements and saves them to a database takes up to 700MB of RAM and over 1GB of virtual memory! I read somewhere that I should use ->clear() to free up some memory, but that does not seem to help.
I call str_get_html() once and ->find() five times, assigning each result to a variable.
$main_html = str_get_html($main_site);
$x = $main_html->find(...);
$y = $main_html->find(...);
etc.
I tried calling, for example, $y->clear() after using $y, but I get the error "PHP Fatal error: Call to a member function clear() on a non-object", even though $y does exist and if($y) is true. Even foreach($y) echo $y->plaintext does return the plaintext of $y.
From htop:
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
8839 username 20 0 1068M 638M 268 R 23.0 8.0 0:08.41 php myscript.php
What is wrong?
Simple test:
echo "(MEM:".memory_get_usage()."->";
$product = $p->find('a',0)->href;
echo memory_get_usage()."->";
unset($product);
$p->clear();
unset($p);
echo memory_get_usage().")";
The result is:
(MEM:11865648->11866192->11865936)
More readable form:
11865648->
11866192-> (+544 in total)
11865936 (+288 in total)
Of course I can't use $product->clear(), as it fails with: PHP Fatal error: Call to a member function clear() on a non-object
Answer 1:
There seem to be memory problems when str_get_html() or a similar function that creates a simple_html_dom object is called several times without clearing and destroying the previous object, especially when using ->find(), which creates an array of simple_html_dom_node objects. Even the FAQ on the author's site says to clear and destroy the previous simple_html_dom object before creating a new one, but sometimes that can't be done without additional code and memory.
That's why I created this function, to remove all PHP Simple HTML Dom Parser traces from memory:
function clean_all(&$items, $leave = ''){
    foreach($items as $id => $item){
        if($leave && ((!is_array($leave) && $id == $leave) || (is_array($leave) && in_array($id,$leave)))) continue;
        if($id != 'GLOBALS'){
            if(is_object($item) && ((get_class($item) == 'simple_html_dom') || (get_class($item) == 'simple_html_dom_node'))){
                $items[$id]->clear();
                unset($items[$id]);
            }else if(is_array($item)){
                $first = array_shift($item);
                if(is_object($first) && ((get_class($first) == 'simple_html_dom') || (get_class($first) == 'simple_html_dom_node'))){
                    unset($items[$id]);
                }
                unset($first);
            }
        }
    }
}
Usage:
Clean ALL traces of PHP Simple HTML Dom Parser from memory: clean_all($GLOBALS);
Clean all traces of PHP Simple HTML Dom Parser from memory, except $myobj: clean_all($GLOBALS,'myobj');
Clean all traces of PHP Simple HTML Dom Parser from memory, except list of objects ($myobj1,$myobj2...): clean_all($GLOBALS,array('myobj1','myobj2'));
Hope it will help others too.
Generally I use it when I call str_get_html() more than once, like:
$site=file_get_contents('http://google.com');
$site_html=str_get_html($site);
foreach($site_html->find('a') as $a){
    $site2=file_get_contents($a->href);
    $site2_html=str_get_html($site2);
    echo $site2_html->find('p',0)->plaintext;
}
clean_all($GLOBALS);
In this example I can't call $site_html->clear() before the foreach, because the foreach would then fail. And because str_get_html() is called several times without clearing the previous objects, the references between objects get broken, so clearing only at the end still leaks memory. That's why my function has to search the defined variables for simple_html_dom objects and clear them manually.
In my case I forked inside the foreach, and after a few steps the main PHP script was using about 100MB of memory. After forking a few more times, usage kept growing and nearly killed my server. Of course, when a PHP script ends it frees its memory, but with 8GB in use it took ages to end.
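Since ->find() can hand back either one object or an array of them, a small helper that clears whichever shape it receives can be easier to reason about than scanning all variables. The following is only a sketch: StubNode and clear_result() are hypothetical names introduced here, and StubNode stands in for the library's node class so the example runs without the library installed.

```php
<?php
// Hypothetical stub standing in for simple_html_dom_node (assumption, not the
// library's class); it only records whether clear() was called.
class StubNode
{
    public $cleared = false;

    public function clear()
    {
        $this->cleared = true;
    }
}

// Hypothetical helper: calls clear() on a single object or on every object
// in an array, and returns how many objects were cleared. Strings and other
// non-objects (such as an href value) are skipped instead of causing a
// "member function clear() on a non-object" fatal error.
function clear_result($result)
{
    $count = 0;
    $items = is_array($result) ? $result : array($result);
    foreach ($items as $item) {
        if (is_object($item) && method_exists($item, 'clear')) {
            $item->clear();
            $count++;
        }
    }
    return $count;
}

$many = array(new StubNode(), new StubNode()); // like find('a')
$one  = new StubNode();                        // like find('a', 0)

echo clear_result($many), "\n";                  // array of nodes
echo clear_result($one), "\n";                   // single node
echo clear_result('http://example.com/'), "\n";  // plain string, nothing to clear
```

With the real library, the same helper would be called on each ->find() result once it is no longer needed, before clearing the parent simple_html_dom object.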
Answer 2:
I believe you need to call clear() on $main_html.
From the docs:
Q: This script is leaking memory seriously... After it finished running, it's not cleaning up dom object properly from memory..
A: Due to PHP5 circular references memory leak, after creating DOM object, you must call $dom->clear() to free memory if call file_get_dom() more than once.
Example:
$html = file_get_html(...);
// do something...
$html->clear();
unset($html);
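To illustrate the circular-reference problem the FAQ refers to, here is a minimal sketch that does not use the library at all. DemoNode and buildTree() are hypothetical names made up for this example; gc_disable() is used only to imitate an engine without a cycle collector (PHP 5.3 and later include one, which can also be triggered with gc_collect_cycles()).

```php
<?php
// Hypothetical stand-in for a DOM node (assumption, not the library's class).
class DemoNode
{
    public $parent = null;
    public $children = array();

    public function addChild(DemoNode $child)
    {
        $child->parent = $this;      // child points back at its parent
        $this->children[] = $child;  // parent points at child: a cycle
    }

    // Break the parent/child cycle so reference counting alone can free the tree.
    public function clear()
    {
        foreach ($this->children as $child) {
            $child->clear();
            $child->parent = null;
        }
        $this->children = array();
    }
}

// Hypothetical helper: builds a root node with $n children.
function buildTree($n)
{
    $root = new DemoNode();
    for ($i = 0; $i < $n; $i++) {
        $root->addChild(new DemoNode());
    }
    return $root;
}

gc_disable(); // imitate an engine with no cycle collector

// Case 1: unset() without clear(). The cycle keeps every node alive.
$before = memory_get_usage();
$root = buildTree(1000);
unset($root);
$leaked = memory_get_usage() - $before;

// Case 2: clear() first, then unset(). The nodes are freed immediately.
$before = memory_get_usage();
$root = buildTree(1000);
$root->clear();
unset($root);
$freed = memory_get_usage() - $before;

echo "without clear(): {$leaked} bytes still held\n";
echo "with clear():    {$freed} bytes still held\n";
```

The exact byte counts depend on the PHP version and build, but the first figure should be much larger than the second, which is the same effect the clear()/unset() pair above is meant to avoid.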
Source: https://stackoverflow.com/questions/18090212/php-simple-html-dom-parser-memory-leak-usage