Laravel chunk and delete

太阳男子 2021-01-25 18:14

I have a large number of items (1M+) that I want to delete from a database. I fork a background job to take care of that, so that the user won't have to wait for it to finish.

4 Answers
  • 2021-01-25 18:38

    As Kelvin Jones points out, the reason the random number of items is being deleted is that you're deleting records as you page through them.

    chunk simply uses offset & limit to "paginate" through your table. But if you delete 100 records from page 1 (IDs 1-100), then go to page 2, you're actually now skipping IDs 101-200 and jumping to 201-300.

    chunkById is a way around this

    Post::where('arch_id', $posts_archive->id)->chunkById(1000, function ($posts) {
        // go through the collection and delete every post
        foreach ($posts as $post) {
            $post->delete();
        }
    });
    

    Literally just replace the method name. Now, instead of using offset & limit to paginate, it will look at the maximum primary key (100) from the first page, then the next page will query where ID > 100. So page 2 is now correctly giving you IDs 101-200 instead of 201-300.
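
    For context, this is roughly what chunkById does under the hood: it remembers the highest primary key it has seen and starts the next batch after that id, so deleted rows can no longer shift the "pages". A simplified sketch, not the framework's exact implementation:

    $lastId = 0;

    do {
        $posts = Post::where('arch_id', $posts_archive->id)
            ->where('id', '>', $lastId)
            ->orderBy('id')
            ->limit(1000)
            ->get();

        foreach ($posts as $post) {
            $lastId = $post->id; // track the highest id seen so far
            $post->delete();
        }
    } while ($posts->isNotEmpty());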

  • 2021-01-25 18:48

    There is nothing Laravel specific about the way you'd handle this. It sounds like your database server needs review or optimization if a delete query in a job is freezing the rest of the UI.

    Retrieving each model and running a delete query individually definitely isn't a good way to optimize this, as you'd be executing millions of queries. If you'd rather limit the load your application puts on the database each second than tune the database server to handle one big query, you could use a loop that deletes in limited batches:

    do {
        // delete() on the query builder removes up to 1000 matching rows in one
        // statement and returns the number of affected rows
        $deleted = Post::where('arch_id', $posts_archive->id)->limit(1000)->delete();
        sleep(2); // short pause to ease the load on the database server
    } while ($deleted > 0);
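
    Worth noting: because delete() here is called on the query builder rather than on each Post model, every iteration runs a single bulk DELETE statement and Eloquent model events are not fired, which is far cheaper than loading and deleting the models one by one.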
    
  • 2021-01-25 18:48

    The reason your actual outcome is different to the expected outcome is to do with how Laravel chunks your dataset.

    Laravel paginates through your dataset one page at a time and passes the Collection of Post models to your callback.

    Since you're deleting the records in the set, Laravel effectively skips a page of data on each iteration, therefore you end up missing roughly half the data that was in the original query.

    Take the following scenario – there are 24 records that you wish to delete in chunks of 10:

    Expected

    +-------------+--------------------+---------------------------+
    |  Iteration  |   Eloquent query   | Rows returned to callback |
    +-------------+--------------------+---------------------------+
    | Iteration 1 | OFFSET 0 LIMIT 10  |                        10 |
    | Iteration 2 | OFFSET 10 LIMIT 10 |                        10 |
    | Iteration 3 | OFFSET 20 LIMIT 10 |                         4 |
    +-------------+--------------------+---------------------------+
    

    Actual

    +-------------+--------------------+----------------------------+
    |  Iteration  |   Eloquent query   | Rows returned to callback  |
    +-------------+--------------------+----------------------------+
    | Iteration 1 | OFFSET 0 LIMIT 10  |                         10 | (« but these are deleted)
    | Iteration 2 | OFFSET 10 LIMIT 10 |                          4 |
    | Iteration 3 | NONE               |                       NONE |
    +-------------+--------------------+----------------------------+
    

    After the 1st iteration, there were only 14 records left, so when Laravel fetched page 2, it only found 4 records.

    The result is that only 14 of the 24 records were deleted. That feels a bit random, but it makes sense given how Laravel pages through the data.
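
    If you want to keep deleting the models in batches, one simple way to sidestep the skipping is to always fetch the first batch of whatever is still left rather than paginating forward, since each batch is gone before the next query runs. A rough sketch:

    do {
        // offset 0 always points at fresh rows because the previous batch was deleted
        $posts = Post::where('arch_id', $posts_archive->id)->limit(1000)->get();

        foreach ($posts as $post) {
            $post->delete();
        }
    } while ($posts->isNotEmpty());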

    Another solution to the problem would be to use a cursor to process your query. This will step through your DB result set one record at a time, which makes much better use of memory.

    E.g.

    // laravel job class
    // ...
    public function handle()
    {
        $posts_archive = PostArchive::find(1); // just for the purpose of testing ;)
        $query = Post::where('arch_id', $posts_archive->id);

        // cursor() runs a single query and hydrates one model per row via a generator,
        // so only one Post is kept in memory at a time
        foreach ($query->cursor() as $post) {
            $post->delete();
        }
    }
    

    NB: The other solutions here are better if you only want to delete the records in the DB. If you have any other processing that needs to occur, then using a cursor would be a better option.
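
    If you're on a newer Laravel release that includes lazyById(), it combines both ideas: it paginates by primary key (so deleting rows doesn't skip anything) and streams the results as a LazyCollection. A hedged sketch, assuming that method exists in your version:

    Post::where('arch_id', $posts_archive->id)
        ->lazyById(1000) // keyset-paginated, streamed in chunks of 1000
        ->each(function ($post) {
            $post->delete();
        });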

  • 2021-01-25 18:49

    If I understand correctly, the issue is that deleting a large number of entries takes too many resources, and doing it one post at a time would take too long as well.

    Try getting the min and the max of post.id, then loop over fixed id ranges between them, like this:

    // get the smallest and largest matching id first (as described above)
    $minId = Post::where('arch_id', $posts_archive->id)->min('id');
    $maxId = Post::where('arch_id', $posts_archive->id)->max('id');

    for ($i = $minId; $i <= $maxId; $i += 1000) {
        Post::where('arch_id', $posts_archive->id)->whereBetween('id', [$i, $i + 999])->delete();
        sleep(2);
    }
    

    Customize the chunk size and the sleep period to suit your server resources.
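
    One thing to keep in mind with this approach: if the ids in that range are sparse (for example because other rows were deleted earlier), some iterations will simply delete nothing, so the number of loop passes depends on the id span rather than on how many posts actually match.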
