I\'m writing a bulk ID3 tag editor in C. ID3 tags are usually at the beginning of an mp3 encoded file, although older (version 1) tags are at the end. The app is designed
If you're not normally going to be streaming the file in and then processing it, but rather hopping around (like read the tags at the front and then jump to the end, etc.) then I would use mmap simply because your code will be cleaner and easier to maintain treating the file as a large buffer without having to actually manage the the buffering and paging yourself.
As has been mentioned, if you're processing a lot of data disk I/O is likely going to dominate your processing anyway. mmap may be faster than read, but for reasonable implementations, it's likely not THAT much faster, especially on todays hardware which has continually got faster and faster while disk drives have been stuck at 7200 and 10,000 RPM for years and years.
So, go with mmap and make your code easy and neat.
It really depends on what you're trying to do. If all you need to do is hop to a known offset and read out a small tag, read()
may be faster (mmap()
has to do some rather complex internal accounting). If you are planning on copying out all 200mb of the MP3, however, or scanning it for some tag that may appear at an unknown offset, then mmap()
is likely a faster approach.
For example, if you need to shift the entire file down a few hundred bytes in order to insert an ID3 tag, one simple approach would be to expand the file with ftruncate()
, mmap the file, then memmove()
the contents down a bit. This, however, will destroy the file if your program crashes while it's running. You could also copy the contents of the file into a new file - this is another place where mmap() really shines; you can simply mmap()
the old file, then copy all of its data into the new file with a single write()
.
In short, mmap()
is great if you're doing a large amount of IO in terms of total bytes transferred; this is because it reduces the number of copies needed, and can significantly reduce the number of kernel entries needed for reading cached data. However mmap()
requires a minimum of two trips into the kernel (three if you clean up the mapping when you're done!) and does some complex internal kernel accounting, and so the fixed overhead can be high.
read()
on the other hand involves an extra memory-to-memory copy, and can thus be inefficient for large I/O operations, but is simple, and so the fixed overhead is relatively low. In short, use mmap()
for large bulk I/O, and read()
or pread()
for one-off, small I/Os.
I don't know if standard POSIX functions reside inside what you are allowed or you will to use for the development but think about these two functions:
int ftruncate(int fildes, off_t length);
int truncate(const char *path, off_t length);
defined in unistd.h
, which can be used to truncate a file up to a specified length. In this way you could easily
I'm not sure about the performance, you should test this method, but it should load much less things inside ram while providing a senseful way of doing it.
Don't bother with mmap
unless your code is CPU bound, specifically due to lots small reads and writes. mmap
may sound nice, but it isn't the awesome why isn't everyone using this alternative it looks like.
Given that you're recursing through potentially large directory structures, your bottleneck will be directory IO and concurrency. mmap
is not going to help.
Reading the linked to question finds this answer that supports my experience: