Question
I have a requirement where I have to buffer a lot of data (gigabytes) for future use. Since there isn't enough RAM available to buffer such a huge amount of data, I decided to store the data in a file.
Now the pitfall here is that while I am writing the data to the file, other threads might need that "buffered" data, so I have to flush the file stream every time I write something to it. More precisely, the data is video frames that I buffer as pre-recorded data (like a TiVo); other threads may or may not want to read it at any given point in time, but when they do, they fread from the file and process the frames.
In the general case, the fwrite-fflush combo takes around 150 µs, but occasionally (and fairly regularly) it takes more than 1.5 seconds. I can't afford this, as I have to process frames in real time.
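For concreteness, the pattern being timed is just a per-frame fwrite followed by fflush. A minimal sketch of how one might measure that combo (FRAME_SIZE is a made-up value, and the timing uses the POSIX monotonic clock) could look like this:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical frame size; the real value depends on the video format. */
#define FRAME_SIZE (640 * 480 * 3)

/* Write one frame and flush it so that reader threads can see it immediately,
 * returning how long the combined fwrite + fflush took in microseconds. */
static double write_and_flush(FILE *f, const unsigned char *frame)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    if (fwrite(frame, 1, FRAME_SIZE, f) != FRAME_SIZE) {
        perror("fwrite");
        exit(EXIT_FAILURE);
    }
    if (fflush(f) != 0) {
        perror("fflush");
        exit(EXIT_FAILURE);
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```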
I have two questions here:
1. Is my approach of buffering data in a file correct? What alternatives do I have?
2. Any idea why the fwrite-fflush operation suddenly takes more time on some occasions? Note that it reverts back to 150 µs after the occasional 1.5-second spike.
Answer 1:
As for #2: Most modern file systems use a B-tree approach to manage the directory and data nodes on today's huge HDs. As with all B-trees, they need to be rebalanced sometimes. While that happens, no changes can be made, which is why the system locks up. Usually this isn't a big deal because of the OS's huge caches, but you're in a corner case where it hurts.
What can you do about it? There are two approaches:
Use sockets to communicate and keep the last N frames in RAM (i.e. never write them to disk, or use an independent process to write them to disk).
Don't write a new file, overwrite an existing file. Since the location of all data blocks is known in advance, there will be no reorg in the FS while you write. It will also be a little bit faster. So the idea is to create a huge file or use a raw partition and then overwrite it. When you hit the end of the file, seek back to the start and repeat.
Drawbacks:
With approach #1, you can lose frames. Also, you must make absolutely sure that all clients can read and process the data fast enough or the server might block.
With #2, you must find a way to tell the readers where the current "end of file" is.
So maybe a mixed approach is best (a sketch follows the list):
- Create a huge file (several GB). If one file isn't enough, create several.
- Open a socket
- Write the data to the file. If you reach the end of the file, seek to position 0 and continue writing there (like a cyclic buffer).
- Flush the data
- Send the start and amount of the new data to the readers via the socket
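A minimal sketch of such a writer, assuming a preallocated buffer file, fixed-size frames, and an already connected notification socket (BUFFER_FILE_SIZE, FRAME_SIZE, and struct frame_notice are illustrative choices, not prescribed by the answer):

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Illustrative sizes: a preallocated buffer file and fixed-size frames.
 * On 32-bit systems, compile with -D_FILE_OFFSET_BITS=64 for large files. */
#define BUFFER_FILE_SIZE (4ULL * 1024 * 1024 * 1024)
#define FRAME_SIZE       (640ULL * 480 * 3)

/* Message sent to readers after every flush: where the new data starts
 * in the buffer file and how many bytes were written. */
struct frame_notice {
    unsigned long long offset;
    unsigned long long length;
};

/* Write one frame into the preallocated file, treating it as a cyclic
 * buffer, flush it, and tell the readers (connected on sock_fd) where it is. */
static int write_frame(FILE *f, int sock_fd, const unsigned char *frame,
                       unsigned long long *write_pos)
{
    /* Wrap around instead of growing the file. */
    if (*write_pos + FRAME_SIZE > BUFFER_FILE_SIZE)
        *write_pos = 0;

    if (fseeko(f, (off_t)*write_pos, SEEK_SET) != 0)
        return -1;
    if (fwrite(frame, 1, FRAME_SIZE, f) != FRAME_SIZE)
        return -1;
    if (fflush(f) != 0)
        return -1;

    struct frame_notice n = { *write_pos, FRAME_SIZE };
    if (send(sock_fd, &n, sizeof n, 0) != (ssize_t)sizeof n)
        return -1;

    *write_pos += FRAME_SIZE;
    return 0;
}
```

Readers receive the (offset, length) pairs over the socket and fread from the same file at those offsets; because the file never grows, the file system has no allocation or rebalancing work to do on the write path.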
Consider using memory-mapped files; that will make everything a bit simpler.
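For example, on POSIX systems a shared mapping of the preallocated file lets the writer store frames with a plain memcpy, while readers that map the same file see the data without any explicit flushing. A rough sketch (the file name and size are placeholders):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative size of the preallocated buffer file (1 GB). */
#define BUFFER_FILE_SIZE ((size_t)1 << 30)

/* Map an already-preallocated file.  With MAP_SHARED, stores through the
 * mapping become visible to other processes that map the same file,
 * without any explicit fwrite/fflush calls. */
static unsigned char *map_buffer_file(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, BUFFER_FILE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Copy one frame into the mapping at a given byte offset.  msync() can be
 * called on the containing pages if the data must also reach the disk
 * promptly, not just other readers. */
static void store_frame(unsigned char *base, size_t offset,
                        const unsigned char *frame, size_t frame_size)
{
    memcpy(base + offset, frame, frame_size);
}
```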
Answer 2:
Besides RAM and disk, there are not really any other options, only variations. I think the approach is sound though: you are getting really good file system performance.
The extra occasional time could well be due to the file system looking for more free space (it maintains a short list, but when that is exhausted a more expensive search is needed) and allocating it to the file. If this is the cause, preallocate the file at its maximum size and write into it using random I/O (fopen(fn, "r+")) so that the file is never truncated or extended.
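A rough sketch of that preallocation, using POSIX posix_fallocate() to actually reserve the blocks (the file name and 1 GB size are placeholders):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative maximum size of the buffer file (1 GB). */
#define MAX_FILE_SIZE ((off_t)1024 * 1024 * 1024)

/* One-time setup: reserve all of the file's blocks up front so that later
 * writes never trigger a free-space search.  posix_fallocate() actually
 * allocates the blocks; merely seeking to the end and writing one byte
 * would leave a sparse file on many file systems. */
static int preallocate(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    int err = posix_fallocate(fd, 0, MAX_FILE_SIZE);
    close(fd);
    return err;                 /* 0 on success */
}

/* Later, open with "r+b": the existing length and block layout are kept,
 * and every write overwrites already-allocated blocks in place. */
static FILE *open_for_overwrite(const char *path)
{
    return fopen(path, "r+b");
}
```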
Another technique which might help stabilize file i/o time is to write each frame buffer at a file offset which is aligned to a sector boundary. That way the file system doesn't have to handle an oddly offset write operation by first reading from the sector to preserve what won't be overwritten.
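As an illustration, one way to do that is to give every frame a slot whose size is rounded up to the sector size, so each write begins on a sector boundary (the 4096-byte sector size is an assumption; query the actual device if it matters):

```c
#include <stdio.h>

/* Assumed sector size; 512 or 4096 bytes on most disks. */
#define SECTOR_SIZE 4096ULL

/* Round a byte count up to the next sector boundary. */
static unsigned long long align_up(unsigned long long n)
{
    return (n + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/* Give every frame its own sector-aligned slot so that each write starts
 * on a sector boundary rather than at an odd offset. */
static int write_aligned(FILE *f, unsigned long long slot_index,
                         const unsigned char *frame, size_t frame_size)
{
    unsigned long long slot_size = align_up(frame_size);
    if (fseeko(f, (off_t)(slot_index * slot_size), SEEK_SET) != 0)
        return -1;
    return fwrite(frame, 1, frame_size, f) == frame_size ? 0 : -1;
}
```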
Source: https://stackoverflow.com/questions/6607231/writing-data-into-file-fflush-takes-a-lot-of-time