I have a large number of items (1M+) that I want to delete from a database. I fork a background job to take care of that, so that the user won't have to wait for it to finish.
As Kelvin Jones points out, the reason a seemingly random number of items gets deleted is that you're deleting records as you page through them.

`chunk` simply uses offset & limit to "paginate" through your table. But if you delete 100 records from page 1 (IDs 1-100), then go to page 2, you're actually now skipping IDs 101-200 and jumping straight to 201-300.

`chunkById` is the way around this:
```php
Post::where('arch_id', $posts_archive->id)->chunkById(1000, function ($posts) {
    // go through the collection and delete every post
    foreach ($posts as $post) {
        $post->delete();
    }
});
```
Literally just replace the method name. Now, instead of using offset & limit to paginate, it looks at the maximum primary key from the first page (100 in the example above) and constrains the next page to WHERE id > 100. So page 2 now correctly gives you IDs 101-200 instead of 201-300.
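Under the hood this is keyset pagination. A rough hand-rolled equivalent (a sketch only, assuming an auto-incrementing `id` primary key) would look like this:

```php
$lastId = 0;

do {
    // fetch the next page strictly after the last ID seen; no OFFSET is used,
    // so deleting rows cannot shift the remaining pages underneath us
    $posts = Post::where('arch_id', $posts_archive->id)
        ->where('id', '>', $lastId)
        ->orderBy('id')
        ->limit(1000)
        ->get();

    foreach ($posts as $post) {
        $post->delete();
    }

    $lastId = $posts->max('id'); // the highest ID of this page becomes the new cursor
} while ($posts->count() === 1000);
```

That is effectively what `chunkById` manages for you, so in practice the one-word change shown above is all you need.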
There is nothing Laravel specific about the way you'd handle this. It sounds like your database server needs review or optimization if a delete query in a job is freezing the rest of the UI.
Retrieving each model and running a delete query individually definitely isn't a good way to optimize this, as you'd be executing millions of queries. If you want to limit the load per second from your application instead of optimizing your database server to handle the query in one go, you could use a loop with a delete limit:
```php
do {
    $deleted = Post::where('arch_id', $posts_archive->id)->limit(1000)->delete();
    sleep(2);
} while ($deleted > 0);
```
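For reference, if the database server is tuned to cope with it, no loop is needed at all: calling delete() on the query builder without retrieving any models issues a single bulk DELETE statement:

```php
// roughly "DELETE FROM posts WHERE arch_id = ?" in one statement
Post::where('arch_id', $posts_archive->id)->delete();
```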
The reason your actual outcome differs from the expected outcome comes down to how Laravel chunks your dataset.

Laravel paginates through your dataset one page at a time and passes the Collection of `Post` models to your callback.

Since you're deleting records from the set, Laravel effectively skips a page of data on each iteration, so you end up missing roughly half the data that was in the original query.
Take the following scenario – there are 24 records that you wish to delete in chunks of 10:
Expected
+-------------+--------------------+---------------------------+
| Iteration   | Eloquent query     | Rows returned to callback |
+-------------+--------------------+---------------------------+
| Iteration 1 | OFFSET 0 LIMIT 10  | 10                        |
| Iteration 2 | OFFSET 10 LIMIT 10 | 10                        |
| Iteration 3 | OFFSET 20 LIMIT 10 | 4                         |
+-------------+--------------------+---------------------------+
Actual
+-------------+--------------------+------------------------------+
| Iteration   | Eloquent query     | Rows returned to callback    |
+-------------+--------------------+------------------------------+
| Iteration 1 | OFFSET 0 LIMIT 10  | 10 (« but these are deleted) |
| Iteration 2 | OFFSET 10 LIMIT 10 | 4                            |
| Iteration 3 | NONE               | NONE                         |
+-------------+--------------------+------------------------------+
After the 1st iteration there were only 14 records left, so when Laravel fetched page 2 it found only 4 of them.

The result is that 14 of the 24 records were deleted, which feels a bit random but makes sense given how Laravel processes the data.
Another solution would be to use a cursor to process your query; this steps through the DB result set one record at a time, which makes much better use of memory.
E.g.
```php
// laravel job class
// ...
public function handle()
{
    $posts_archive = PostArchive::find(1); // just for the purpose of testing ;)
    $query = Post::where('arch_id', $posts_archive->id);

    foreach ($query->cursor() as $post) {
        $post->delete();
    }
}
```
NB: The other solutions here are better if you only want to delete the records in the DB. If you have any other processing that needs to occur, then using a cursor would be a better option.
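For instance, if each record needs some per-row work before it is removed (the log call below is only a hypothetical stand-in for that work), the cursor lets you touch every model while keeping memory usage flat:

```php
use Illuminate\Support\Facades\Log;

public function handle()
{
    $posts_archive = PostArchive::find(1);

    foreach (Post::where('arch_id', $posts_archive->id)->cursor() as $post) {
        // hypothetical per-record processing before the delete
        Log::info("Deleting post {$post->id} from archive {$posts_archive->id}");

        $post->delete();
    }
}
```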
If I understand correctly, the issue is that deleting a large number of entries takes too many resources, but doing it one post at a time will take too long as well.

Try getting the min and the max of post.id, then chunking on those ranges, like so:
```php
for ($i = $minId; $i <= $maxId; $i += 1000) {
    // delete up to 1000 rows per iteration, then pause to ease the load
    Post::where('arch_id', $posts_archive->id)->whereBetween('id', [$i, $i + 999])->delete();
    sleep(2);
}
```
Customize the chunk size and the sleep period to suit your server resources.
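The bounds used in that loop can be fetched with the min and max aggregates on the same query (a minimal sketch):

```php
// determine the ID range for this archive before entering the loop
$minId = Post::where('arch_id', $posts_archive->id)->min('id');
$maxId = Post::where('arch_id', $posts_archive->id)->max('id');
```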