Bear with me, this is going to take some explaining. I have a function that looks like the one below.
Context: "aProject" is a Core Data entity named LPProject with a to-many relationship called 'memberFiles' that contains instances of another Core Data entity called LPFile.
I believe Ryan is on the right path: there are simply too many threads being spawned when a project has 1,500 files (the amount I decided to test with.)
So, I refactored the code above to work like this:
- (void) establishImportLinksForFilesInProject:(LPProject *)aProject
{
    dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_async(taskQ,
    ^{
        // Create a new Core Data context on this thread using the same persistent
        // data store as the main thread. Pass the objectID of aProject to access
        // the managed object for that project in this thread's context:
        NSManagedObjectID *projectID = [aProject objectID];

        for (LPFile *fileToCheck in [[backgroundContext objectWithID:projectID] memberFiles])
        {
            if (/* some condition is met */)
            {
                // Here, we do the scanning for @import statements.
                // When we find a valid one, we put the whole path to the
                // imported file into an array called 'verifiedImports'.

                // Pass this ID to the main thread in the dispatch call below to
                // access the same file in the main thread's context:
                NSManagedObjectID *fileID = [fileToCheck objectID];

                // Go back to the main thread and update the model
                // (Core Data is not thread-safe.)
                dispatch_async(dispatch_get_main_queue(),
                ^{
                    for (NSString *import in verifiedImports)
                    {
                        LPFile *targetFile = (LPFile *)[mainContext objectWithID:fileID];
                        // Add the relationship to targetFile.
                    }
                }); //end block
            }
        }

        // Easy way to tell when we're done processing all files.
        // Could add a dispatch_async(main_queue) call here to do something like UI updates, etc.
    }); //end block
}
So, basically, we're now spawning one thread that reads all the files instead of one-thread-per-file. Also, it turns out that calling dispatch_async() on the main_queue is the correct approach: the worker thread will dispatch that block to the main thread and NOT wait for it to return before proceeding to scan the next file.
This implementation essentially sets up a "serial" queue as Ryan suggested (the for loop is the serial part of it), but with one advantage: when the for loop ends, we're done processing all the files and we can just stick a dispatch_async(main_queue) block there to do whatever we want. It's a very nice way to tell when the concurrent processing task is finished and that didn't exist in my old version.
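For example (illustrative only; updateUIWhenScanningFinishes is a hypothetical method), the tail of the outer block could be:

// The main queue is serial and FIFO, so by the time this block runs,
// every per-file update block enqueued above has already executed.
dispatch_async(dispatch_get_main_queue(),
^{
    [self updateUIWhenScanningFinishes];   // hypothetical: refresh UI, hide a progress indicator, etc.
});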
The disadvantage here is that it's a bit more complicated to work with Core Data on multiple threads. But this approach seems to be bulletproof for projects with 5,000 files (the highest I've tested).
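For reference, the "create a new Core Data context on this thread" step mentioned in the comments above typically looks something like the following (a minimal sketch in the same pre-ARC style as the rest of the code; mainContext is assumed to be the main thread's NSManagedObjectContext):

// Build a per-thread context that shares the main thread's persistent store coordinator.
NSManagedObjectContext *backgroundContext = [[NSManagedObjectContext alloc] init];
[backgroundContext setPersistentStoreCoordinator:[mainContext persistentStoreCoordinator]];

// Re-fetch the project in this thread's context via its objectID:
LPProject *project = (LPProject *)[backgroundContext objectWithID:projectID];
// ... work with 'project' here, then release backgroundContext when done (MRC).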
This is a common issue related to disk I/O and GCD. Basically, GCD is probably spawning one thread for each file, and at a certain point you've got too many threads for the system to service in a reasonable amount of time.
Every time you call dispatch_async() and in that block you attempt to do any I/O (for example, it looks like you're reading some files here), it's likely that the thread in which that block of code is executing will block (get paused by the OS) while it waits for the data to be read from the filesystem. The way GCD works is such that when it sees that one of its worker threads is blocked on I/O and you're still asking it to do more work concurrently, it'll just spawn a new worker thread. Thus if you try to open 50 files on a concurrent queue, it's likely that you'll end up causing GCD to spawn ~50 threads.
This is too many threads for the system to meaningfully service, and you end up starving your main thread for CPU.
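To make the failure mode concrete, here's a sketch of the anti-pattern (filePaths is a hypothetical NSArray of paths):

dispatch_queue_t globalQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
for (NSString *path in filePaths)   // e.g. ~50 files
{
    dispatch_async(globalQ,
    ^{
        // This read can block on disk I/O. GCD sees a blocked worker thread,
        // still has queued blocks waiting, and spins up another thread,
        // so you can end up with roughly one thread per file.
        NSData *contents = [NSData dataWithContentsOfFile:path];
        // ... scan 'contents' ...
    });
}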
The way to fix this is to use a serial queue instead of a concurrent queue to do your file-based operations. It's easy to do. You'll want to create a serial queue and store it as an ivar in your object so you don't end up creating multiple serial queues. So remove this call:
dispatch_queue_t taskQ = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
Add this in your init method:
taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);
Add this in your dealloc method:
dispatch_release(taskQ);
And add this as an ivar in your class declaration:
dispatch_queue_t taskQ;
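Putting those pieces together, a minimal sketch might look like this (the class name and surrounding details are illustrative; memory management is manual, to match the dispatch_release call above):

#import <Foundation/Foundation.h>
@class LPProject;   // from the question

@interface LPProjectScanner : NSObject   // hypothetical class name
{
    dispatch_queue_t taskQ;   // serial queue, created once and reused
}
- (void)establishImportLinksForFilesInProject:(LPProject *)aProject;
@end

@implementation LPProjectScanner

- (id)init
{
    if ((self = [super init])) {
        taskQ = dispatch_queue_create("com.yourcompany.yourMeaningfulLabel", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)dealloc
{
    dispatch_release(taskQ);
    [super dealloc];
}

- (void)establishImportLinksForFilesInProject:(LPProject *)aProject
{
    // All file-based work goes through the serial ivar queue, so only one
    // block touches the disk at a time instead of one thread per file.
    dispatch_async(taskQ,
    ^{
        // ... scan aProject's files for @import statements, as in the question ...
    });
}

@end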
I think it's easier to understand with a diagram.
For the situation the author described:
(time runs left to right; '*' is idle time, '-' is a block running)

taskQ (concurrent)   **********|start
dispatch_1           ***********|---------|
dispatch_2           *************|---------|
  .
  .
dispatch_n           ***************************|----------|

main queue (sync)    **|start to dispatch to main
                     *************************|--dispatch_1--|--dispatch_2--|--dispatch_3--|*********|--dispatch_n--|

which keeps the (synchronous) main queue so busy that the task eventually fails.