I'm writing a download manager in Objective-C that downloads a file in multiple segments at the same time in order to improve speed. Each segment of the file is downloaded on a separate connection, and I need a way to write the segments from multiple threads into a single output file at their correct offsets.
Queue up the segment objects as they are received and hand them to a writer thread. The writer thread should keep a list of out-of-order objects so that the actual disk writing stays sequential. If a segment download fails, it can be pushed back onto the downloading thread pool for another try (perhaps with an internal retry count). I also suggest a fixed pool of segment objects, so that one or more failed downloads of an early segment cannot cause runaway memory use while later segments keep arriving and piling up in the list.
The answers given thus far have some clear disadvantages:
A thread-safe, efficient, lock-free approach is to use memory mapping, which works as follows:

- open() the file for read/write
- mmap() it to some place in memory; the file now "lives" in memory
- memcpy() each received segment to its offset within that memory
- munmap() the memory and close() the file

The actual writing is handled by the kernel - your program never issues a write system call of any form. Memory mapping generally has few downsides and is used extensively for things like shared libraries.
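The whole sequence fits in a few lines of plain C (a sketch with abbreviated error handling; the function name is made up):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `length` bytes at `offset` of `path` through a memory mapping.
 * The mapping must cover the whole file, so it is truncated to its
 * final size up front. */
static int mmap_write(const char *path, off_t filesize,
                      const void *bytes, size_t length, off_t offset) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, filesize) != 0) { close(fd); return -1; }

    unsigned char *map = mmap(NULL, (size_t)filesize,
                              PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    memcpy(map + offset, bytes, length);   /* the "write": no write() syscall */

    munmap(map, (size_t)filesize);         /* kernel flushes the dirty pages */
    close(fd);
    return 0;
}
```

In a long-running writer you would of course map once and memcpy many times, as the class below does, rather than mapping per write.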
Update: a piece of code says more than a thousand words... This is the mmap version of Mecki's lock-based multi-threaded file writer. Note that writing is reduced to a simple memcpy, which cannot fail in the normal case (though a full disk can still surface as a SIGBUS signal), so there is no BOOL success to check. Performance is equivalent to the lock-based version (tested by writing 100 1 MB blocks in parallel).
Regarding a comment about the "overkill" of an mmap-based approach: it uses fewer lines of code, doesn't require locking, is less likely to block on writing, and requires no checking of return values on writing. The only "overkill" is that it requires the developer to understand one more concept than good old read/write file I/O.
The possibility of reading directly into the mmapped memory region is left out, but is quite simple to implement. You can read(fd, i_filedata+offset, length); or recv(socket, i_filedata+offset, length, flags); directly into the file.
@interface MultiThreadFileWriterMMap : NSObject
{
@private
    FILE *i_outputFile;
    NSUInteger i_length;
    unsigned char *i_filedata;
}
- (id)initWithOutputPath:(NSString *)aFilePath length:(NSUInteger)length;
- (void)writeBytes:(const void *)bytes ofLength:(size_t)length
      toFileOffset:(off_t)offset;
- (void)writeData:(NSData *)data toFileOffset:(off_t)offset;
- (void)close;
@end
#import "MultiThreadFileWriterMMap.h"
#import <string.h>     // memcpy()
#import <sys/mman.h>   // mmap(), munmap()
#import <sys/types.h>
#import <unistd.h>     // ftruncate()

@implementation MultiThreadFileWriterMMap

- (id)initWithOutputPath:(NSString *)aFilePath length:(NSUInteger)length
{
    self = [super init];
    if (self) {
        i_outputFile = fopen([aFilePath UTF8String], "w+");
        i_length = length;
        if (i_outputFile) {
            // Grow the file to its final size, then map it into memory.
            ftruncate(fileno(i_outputFile), i_length);
            i_filedata = mmap(NULL, i_length, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fileno(i_outputFile), 0);
            if (i_filedata == MAP_FAILED) perror("mmap");
        }
        if (!i_outputFile || i_filedata == MAP_FAILED) {
            [self release];
            self = nil;
        }
    }
    return self;
}

- (void)dealloc
{
    [self close];
    [super dealloc];
}

- (void)writeBytes:(const void *)bytes ofLength:(size_t)length
      toFileOffset:(off_t)offset
{
    memcpy(i_filedata + offset, bytes, length);
}

- (void)writeData:(NSData *)data toFileOffset:(off_t)offset
{
    memcpy(i_filedata + offset, [data bytes], [data length]);
}

- (void)close
{
    // Guard both handles so a second call (e.g. from dealloc after an
    // explicit close) is a harmless no-op instead of fclose(NULL).
    if (i_filedata && i_filedata != MAP_FAILED) {
        munmap(i_filedata, i_length);
        i_filedata = NULL;
    }
    if (i_outputFile) {
        fclose(i_outputFile);
        i_outputFile = NULL;
    }
}

@end
Never forget that Objective-C is based on plain C, so I would just write a class of my own that handles the file I/O using the standard C API. That API lets you place the current write position anywhere within a new file, even far beyond the current file size (the missing bytes are filled with zero bytes), and jump forward and backward as you wish. The easiest way to achieve thread safety is to use a lock; this is not necessarily the fastest way, but in your specific case I bet the bottleneck is certainly not thread synchronization. The class could have a header like this:
@interface MultiThreadFileWriter : NSObject
{
@private
    FILE *i_outputFile;
    NSLock *i_fileLock;
}
- (id)initWithOutputPath:(NSString *)aFilePath;
- (BOOL)writeBytes:(const void *)bytes ofLength:(size_t)length
      toFileOffset:(off_t)offset;
- (BOOL)writeData:(NSData *)data toFileOffset:(off_t)offset;
- (void)close;
@end
And an implementation similar to this one:
#import "MultiThreadFileWriter.h"

@implementation MultiThreadFileWriter

- (id)initWithOutputPath:(NSString *)aFilePath
{
    self = [super init];
    if (self) {
        i_fileLock = [[NSLock alloc] init];
        i_outputFile = fopen([aFilePath UTF8String], "w");
        if (!i_outputFile || !i_fileLock) {
            [self release];
            self = nil;
        }
    }
    return self;
}

- (void)dealloc
{
    [self close];
    [i_fileLock release];
    [super dealloc];
}

- (BOOL)writeBytes:(const void *)bytes ofLength:(size_t)length
      toFileOffset:(off_t)offset
{
    BOOL success;

    [i_fileLock lock];
    success = i_outputFile != NULL
        && fseeko(i_outputFile, offset, SEEK_SET) == 0
        // fwrite() returns the number of complete items written, so a
        // zero-length write must be treated as trivially successful.
        && (length == 0 || fwrite(bytes, length, 1, i_outputFile) == 1);
    [i_fileLock unlock];
    return success;
}

- (BOOL)writeData:(NSData *)data toFileOffset:(off_t)offset
{
    return [self writeBytes:[data bytes] ofLength:[data length]
               toFileOffset:offset];
}

- (void)close
{
    [i_fileLock lock];
    if (i_outputFile) {
        fclose(i_outputFile);
        i_outputFile = NULL;
    }
    [i_fileLock unlock];
}

@end
The lock could be avoided in various ways. Using Grand Central Dispatch and blocks to schedule the seek + write operations on a serial queue would work. Another way is to use UNIX (POSIX) file descriptors instead of standard C streams (open() and int instead of fopen() and FILE *) and write with pwrite(): it takes the target offset as an argument, never touches the file position, and each call is atomic, so no seeking and no locking are needed at all. (Note that merely duplicating the descriptor with dup() would not help here, because duplicated descriptors share a single file offset.) However, both implementations would be somewhat more complicated, less portable, and there would be no measurable speed improvement.
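For illustration, a sketch of the descriptor-based variant using POSIX pwrite() (the function name is made up): pwrite() writes at an explicit offset without using or changing the shared file position, so several threads can call this on the same descriptor concurrently without a lock.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread-safe positioned write: pwrite() neither reads nor moves the
 * file offset, so concurrent callers on one fd never interfere. The
 * loop handles short writes. */
static int write_segment(int fd, const void *bytes, size_t length, off_t offset) {
    size_t done = 0;
    while (done < length) {
        ssize_t n = pwrite(fd, (const char *)bytes + done,
                           length - done, offset + (off_t)done);
        if (n < 0)
            return -1;   /* inspect errno; a robust version retries on EINTR */
        done += (size_t)n;
    }
    return 0;
}
```

Wrapping this in a class analogous to the ones above is straightforward, with open()/close() replacing fopen()/fclose() and no NSLock at all.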