Symfony2 / Doctrine make $statement->execute() not “buffer” all values

Posted by 流过昼夜 on 2019-12-05 02:32:56

Question


I've got a basic codeset like this (inside a controller):

$sql = 'select * from someLargeTable limit 1000';
$em = $this->getDoctrine()->getManager();
$conn = $em->getConnection();
$statement = $conn->prepare($sql);
$statement->execute();

When the result set is only a few records, the memory usage is not that bad. I echoed some debugging information before and after the $statement->execute(); call and got the following:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 1000 memory: 50.917 MB

Going from 1,000 records to 10k, the difference in memory usage grows to about 13 MB:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 10000 memory: 62.521 MB

Eventually, retrieving around 50k records I get close to my maximum memory allocation:

pre-execute... rowCount :: 0 memory: 49.614 MB
post-execute... rowCount :: 50000 memory: 114.096 MB

With this implementation, there is no way I could write a controller (or even a command, for that matter) that lets me export the data as a CSV. Sure, 50k+ entries sounds like a lot, and it raises the question of why, but that's not the issue.

My ultimate question is: is it possible to tell the DBAL Connection or DBAL Statement, when executing, to buffer the data on the SQL side rather than in PHP in its entirety? For instance, if I have 10 million rows, could it send only the first, say, 10k rows to PHP, let me loop through them with $statement->fetch(), and, when the cursor reaches the end of those 10k, truncate the array and fetch the next 10k from the DB?


Answer 1:


NOTE: This answer is wrong. I have tried deleting it, but it won’t disappear, because it’s the accepted one. I have flagged it for mod review, but they won’t remove it. See the other answers for better solutions. – lxg


Assuming that your query can be represented in DQL, you could use an iterated query with DQL:

$i = 0;
$batchSize = 100; // try different values, like 20 <= $batchSize <= 1000
$q = $em->createQuery('SELECT e FROM YourWhateverBundle:Entity');
$iterableResult = $q->iterate();

foreach($iterableResult as $row)
{
    $entity = $row[0];

    // do whatever you need with the entity object

    if (($i % $batchSize) == 0)
    {
        $em->flush(); // if you need to update something
        $em->clear(); // frees memory. BEWARE: former entity references are lost.
    }
    ++$i;

    // if you only want to process a certain number of elements,
    // you can of course break the loop here.
}

http://docs.doctrine-project.org/en/2.0.x/reference/batch-processing.html

If you need to use a native query, you could work with LIMIT offsets to generate chunks of 1000 records, and try clearing the EntityManager between them. (Haven't tested this, but I would avoid native queries with Doctrine anyway.)
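The LIMIT/OFFSET idea above could be sketched roughly as follows. This is only an illustration, not the answerer's code: iterateInChunks() and the $fetchPage callback are hypothetical names, and the callback is assumed to run the native query with the given limit and offset and return the rows as an array. With Doctrine you would also call $em->clear() between chunks if entities are involved.

```php
<?php
/**
 * Yields rows one at a time while only ever holding one chunk in memory.
 *
 * $fetchPage is a hypothetical callback: function (int $limit, int $offset): array
 * It is assumed to execute something like
 *   SELECT ... ORDER BY id LIMIT $limit OFFSET $offset
 * (an ORDER BY is needed so that the pages are stable).
 */
function iterateInChunks(callable $fetchPage, int $chunkSize = 1000): \Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($chunkSize, $offset);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += $chunkSize;
        // A short (or empty) page means there is nothing left to fetch.
    } while (count($rows) === $chunkSize);
}
```

Note that large OFFSET values get slower on big tables, because the server still has to skip over all preceding rows; the unbuffered-query approach in the other answers avoids that.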




Answer 2:


I just ran into the same problem and wanted to share a possible solution. Chances are your DBAL uses the PDO library with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY set to true, which means the entire result set of your query is immediately transferred from MySQL and buffered in PHP's memory by PDO, even though you never call $statement->fetchAll(). To fix this, we just need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false, but DBAL does not give us a way to do it: its PDO connection class is protected, without a public method to retrieve it, and it does not give us a way to call setAttribute on the PDO connection.

So, in such situations, I just use my own PDO connection to save memory and speed things up. You can easily instantiate one with your doctrine db parameters like this:

$dbal_conn = $this->getDoctrine()->getManager()->getConnection();
$params = $dbal_conn->getParams();
$pdo_conn = new \PDO(
  'mysql:dbname='.$dbal_conn->getDatabase().';unix_socket='.$params['unix_socket'],
  $dbal_conn->getUsername(),
  $dbal_conn->getPassword()
);
$pdo_conn->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

I am using unix sockets but IP host addresses can also be easily used.
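For the host/IP variant mentioned above, a small helper along these lines could build the DSN for either case. buildMysqlDsn() is a hypothetical name, and the sketch assumes the array uses the usual connection parameter keys (unix_socket, host, port, dbname), such as those returned by $dbal_conn->getParams().

```php
<?php
/**
 * Builds a PDO MySQL DSN from connection parameters.
 * Hypothetical helper: prefers a unix socket when one is configured,
 * otherwise falls back to host (and optional port).
 */
function buildMysqlDsn(array $params): string
{
    $parts = [];
    if (!empty($params['unix_socket'])) {
        $parts[] = 'unix_socket=' . $params['unix_socket'];
    } else {
        $parts[] = 'host=' . ($params['host'] ?? 'localhost');
        if (!empty($params['port'])) {
            $parts[] = 'port=' . $params['port'];
        }
    }
    if (!empty($params['dbname'])) {
        $parts[] = 'dbname=' . $params['dbname'];
    }

    return 'mysql:' . implode(';', $parts);
}
```

The resulting string can be passed as the first argument to new \PDO(...) in place of the hand-built DSN.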




Answer 3:


The selected answer is wrong and @kroky's answer should be selected as the correct one.

The problem is buffered vs. unbuffered queries.

That said, it is not a good idea to change the behaviour for all queries, because:

Unless the full result set was fetched from the server no further queries can be sent over the same connection.

Hence, it should only be used when necessary. Here is a full working example with >200k objects:

    $qb = ...->createQueryBuilder('p');

    $this
        ->em
        ->getConnection()
        ->getWrappedConnection()
        ->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $query = $qb->getQuery();
    $result = $query->iterate();
    $batchSize = 20;
    $i = 0;
    foreach ($result as $product)
    {
        $i++;

        var_dump($product[0]->getSku());

        if (($i % $batchSize) === 0) {
            $this->em->flush();
            $this->em->clear(); // Detaches all objects from Doctrine!
        }
    }

It most likely needs some refinement.




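Answer 4:


You can disable query buffering through the options parameter of the Doctrine DBAL config (1000 is the integer value of PDO::MYSQL_ATTR_USE_BUFFERED_QUERY):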

doctrine:
    dbal:
        # configure these for your database server
        driver: 'pdo_mysql'
        ...
        options:
            1000: false


Source: https://stackoverflow.com/questions/25660983/symfony2-doctrine-make-statement-execute-not-buffer-all-values
