Question
I've got a basic codeset like this (inside a controller):
$sql = 'select * from someLargeTable limit 1000';
$em = $this->getDoctrine()->getManager();
$conn = $em->getConnection();
$statement = $conn->prepare($sql);
$statement->execute();
My difficulty is that memory usage grows with the size of the resultset. I echoed some debugging information before and after running the $statement->execute(); part of the code, and found the following for my implementation:
pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 1000 memory: 50.917 MB
When moving this up from 1000 records to 10k, the difference in memory usage grows to 13 MB:
pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 10000 memory: 62.521 MB
Eventually, retrieving around 50k records, I get close to my maximum memory allocation:
pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 50000 memory: 114.096 MB
With this implementation, there is no way I could write a controller (or even a command, for that matter) that would allow me to retrieve a CSV of the data. Sure, 50k+ entries sounds like a lot, and one might ask why I need that many, but that's not the issue.
My ultimate question is: is it possible to tell the DBAL/Connection or DBAL/Statement to, when executing, buffer the data inside SQL rather than in PHP in its entirety? For instance, if I have 10 million rows, to only send the first, say, 10k rows to PHP, let me look through them by way of $statement->fetch(), and when the cursor gets to the end of the 10k, truncate the array and fetch the next 10k from the DB?
Answer 1:
NOTE: This answer is wrong. I have tried deleting it, but it won’t disappear, because it’s the accepted one. I have flagged it for mod review, but they won’t remove it. See the other answers for better solutions. – lxg
Assuming that your query can be represented in DQL, you could use an iterated query with DQL:
$i = 0;
$batchSize = 100; // try different values, like 20 <= $batchSize <= 1000
$q = $em->createQuery('SELECT e FROM YourWhateverBundle:Entity e');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row)
{
    $entity = $row[0];
    // do whatever you need with the entity object
    ++$i;
    if (($i % $batchSize) === 0)
    {
        $em->flush(); // if you need to update something
        $em->clear(); // frees memory. BEWARE: former entity references are lost.
    }
    // if you only want to process a certain number of elements,
    // you can of course break the loop here.
}
http://docs.doctrine-project.org/en/2.0.x/reference/batch-processing.html
If you need to use a native query, you could work with LIMIT offsets to generate chunks of 1000 records, and try clearing the EntityManager between them. (Haven't tested this, but I would avoid native queries with Doctrine anyway.)
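That native-query fallback could look something like the following sketch. The table name and chunk size are placeholders, and the EntityManager clear() only matters if you hydrate entities from each chunk; this has not been benchmarked, it just illustrates the LIMIT/OFFSET chunking idea.

```php
<?php
// Sketch: chunked native query using LIMIT/OFFSET (placeholder table name).
$conn = $em->getConnection();
$chunkSize = 1000;
$offset = 0;

do {
    $stmt = $conn->prepare('SELECT * FROM someLargeTable LIMIT :limit OFFSET :offset');
    $stmt->bindValue('limit', $chunkSize, \PDO::PARAM_INT);
    $stmt->bindValue('offset', $offset, \PDO::PARAM_INT);
    $stmt->execute();

    $rows = $stmt->fetchAll();
    foreach ($rows as $row) {
        // write $row to the CSV, etc.
    }

    $em->clear(); // free any entities hydrated while processing this chunk
    $offset += $chunkSize;
} while (count($rows) === $chunkSize); // a short chunk means we reached the end
```

Note that on a busy table, OFFSET-based paging can skip or repeat rows between chunks; keyset pagination (WHERE id > :lastId ORDER BY id) is the usual refinement.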
Answer 2:
I just ran into the same problem and wanted to share a possible solution. Chances are your DBAL uses the PDO library with its PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute set to true, which means the entire result set of your query is transferred from MySQL and buffered in memory by PDO, even if you never call $statement->fetchAll(). To fix this, we just need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false, but DBAL does not give us a way to do it: its PDO connection property is protected, without a public method to retrieve it, and it does not let us call setAttribute on the PDO connection.
So, in such situations, I just use my own PDO connection to save memory and speed things up. You can easily instantiate one with your doctrine db parameters like this:
$dbal_conn = $this->getDoctrine()->getManager()->getConnection();
$params = $dbal_conn->getParams();
$pdo_conn = new \PDO(
    'mysql:dbname='.$dbal_conn->getDatabase().';unix_socket='.$params['unix_socket'],
    $dbal_conn->getUsername(),
    $dbal_conn->getPassword()
);
$pdo_conn->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
I am using unix sockets but IP host addresses can also be easily used.
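With that raw connection in hand, a row-by-row fetch keeps PHP memory roughly flat, since unbuffered results stay on the MySQL server until fetched. A sketch (the table name is a placeholder):

```php
<?php
// Sketch: stream rows from the unbuffered PDO connection created above.
$stmt = $pdo_conn->prepare('SELECT * FROM someLargeTable');
$stmt->execute();

while ($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
    // process one row at a time; PHP never holds the full result set
}

$stmt->closeCursor(); // required before issuing another query on this connection
```

The closeCursor() call matters: with unbuffered queries, the connection cannot run another statement until the pending result set is fully consumed or discarded.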
Answer 3:
The selected answer is wrong and @kroky's answer should be selected as the correct one.
The problem is Buffer vs Unbuffered Queries.
Now it won't be a good idea to change the behaviour for all queries, because:
Unless the full result set was fetched from the server no further queries can be sent over the same connection.
Hence, it should only be used when necessary. Here is a full working example with >200k objects:
$qb = ...->createQueryBuilder('p');

$this
    ->em
    ->getConnection()
    ->getWrappedConnection()
    ->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$query = $qb->getQuery();
$result = $query->iterate();
$batchSize = 20;
$i = 0;
foreach ($result as $product)
{
    $i++;
    var_dump($product[0]->getSku());
    if (($i % $batchSize) === 0) {
        $this->em->flush();
        $this->em->clear(); // Detaches all objects from Doctrine!
    }
}
It most likely needs some refinement.
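One such refinement: the attribute is set on the shared wrapped connection, so every later query on it runs unbuffered too. You may want to restore the default once the streaming loop is done (a sketch, assuming the same connection object as above):

```php
<?php
// Sketch: restore the default buffered behaviour after streaming.
$pdo = $this->em->getConnection()->getWrappedConnection();
$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
```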
Answer 4:
You can disable the query buffer via the options parameter in the doctrine config:
doctrine:
    dbal:
        # configure these for your database server
        driver: 'pdo_mysql'
        ...
        options:
            # 1000 is the integer value of \PDO::MYSQL_ATTR_USE_BUFFERED_QUERY
            1000: false
Source: https://stackoverflow.com/questions/25660983/symfony2-doctrine-make-statement-execute-not-buffer-all-values