How to obtain good concurrent read performance from disk

我寻月下人不归 2021-01-31 18:12

I'd like to ask a question and then follow it up with my own answer, but I'd also like to see what answers other people have.

We have two large files which we'd like to read from two

6 Answers
  •  粉色の甜心
    2021-01-31 18:36

    The problem seems to lie in the Windows I/O scheduling policy. According to what I found here, there are many ways for an O.S. to schedule disk requests. While Linux and others can choose between different policies, Windows before Vista was locked into a single policy: a FIFO queue in which all requests were split into 64 KB blocks. I believe this policy is the cause of the problem you are experiencing: the scheduler interleaves requests from the two threads, so the disk keeps seeking back and forth between the two files' areas of the disk.
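    A toy model makes the seek argument concrete. It is my own simplification, not a measurement. It assumes that each file sits in its own region of the disk, that every switch between the two regions counts as one seek, and that the FIFO queue strictly alternates the two threads' 64 KB requests. The function names are hypothetical.

```python
# Toy model (assumption): two files in separate disk regions, read in 64 KB blocks.
BLOCK = 64 * 1024  # bytes per request, as in the pre-Vista FIFO queue described above


def count_seeks(order):
    """Count how often consecutive requests switch between the two files' regions."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev != cur)


def fifo_interleaved(blocks_per_file):
    """Both threads issue requests at the same rate, so a FIFO queue alternates A, B, A, B, ..."""
    return ["A", "B"] * blocks_per_file


def batched(blocks_per_file, batch):
    """Each thread reads `batch` consecutive blocks before the other thread gets a turn."""
    order = []
    remaining = {"A": blocks_per_file, "B": blocks_per_file}
    while remaining["A"] or remaining["B"]:
        for name in ("A", "B"):
            n = min(batch, remaining[name])
            order.extend([name] * n)
            remaining[name] -= n
    return order


blocks = 64 * 1024 * 1024 // BLOCK            # two 64 MB files -> 1024 blocks each
print(count_seeks(fifo_interleaved(blocks)))   # 2047
print(count_seeks(batched(blocks, 128)))       # 15 (128 blocks = 8 MB turns)
```

    Under those assumptions, reading two 64 MB files in 8 MB turns instead of alternating 64 KB blocks cuts the number of seeks from 2047 to 15. The model ignores the fact that real seek cost also depends on distance, caching and drive firmware.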
    Now, the good news is that, as described here and here, Vista introduced a smarter disk scheduler. It lets you set the priority of your I/O requests and also reserve a minimum bandwidth for your process.
    The bad news is that I found no way to change the disk policy or buffer size in earlier versions of Windows. Raising your process's disk I/O priority may boost its performance relative to other processes. Even so, you still have the problem of your two threads competing against each other.
    What I suggest is to modify your software to introduce your own disk access policy.
    For example, you could use a policy like this in thread B (and symmetrically in thread A):

    If thread A is reading from disk, wait until thread A stops reading (or for at most X ms)
    Read for X ms (or Y MB)
    Stop reading and check the status of thread A again
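    Here is a minimal Python sketch of that policy, under some assumptions of my own. Both readers live in one process. The "is thread A reading?" check is replaced by a shared FIFO-fair lock (a ticket lock), so a thread that just finished its turn cannot grab the disk again ahead of a thread that is already waiting. Only the Y-bytes budget per turn is implemented, not the X-ms time budget. `TicketLock`, `read_in_turns`, `read_two_files` and `demo` are hypothetical names.

```python
import os
import tempfile
import threading


class TicketLock:
    """FIFO-fair lock: turns are granted in the order they were requested, so a
    thread that finishes its turn and asks again queues behind a waiting thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # ticket handed to the next thread that asks
        self._now_serving = 0   # ticket whose holder may use the disk

    def __enter__(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            while ticket != self._now_serving:
                self._cond.wait()
        return self

    def __exit__(self, *exc_info):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()
        return False  # never swallow exceptions


def read_in_turns(path, disk_turn, chunk_bytes, sink):
    """Read `path` sequentially, holding the disk turn for one chunk (the 'Y MB')
    at a time. `sink` runs outside the lock, so the other reader can use the disk
    while this one processes its data."""
    with open(path, "rb", buffering=0) as f:
        while True:
            with disk_turn:
                data = f.read(chunk_bytes)
            if not data:  # b"" means end of file
                break
            sink(data)


def read_two_files(path_a, path_b, chunk_bytes=8 * 1024 * 1024):
    """Read two files from two threads that share one TicketLock.
    Returns {path: bytes_read}. The 8 MB default is an arbitrary placeholder."""
    disk_turn = TicketLock()
    totals = {path_a: 0, path_b: 0}

    def worker(path):
        def count(data):
            totals[path] += len(data)  # each thread updates only its own key
        read_in_turns(path, disk_turn, chunk_bytes, count)

    threads = [threading.Thread(target=worker, args=(p,)) for p in (path_a, path_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


def demo(sizes, chunk_bytes=1024 * 1024):
    """Write two temporary files of the given sizes, read them in turns,
    and return the number of bytes read from each."""
    paths = []
    try:
        for size in sizes:
            fd, path = tempfile.mkstemp()
            with os.fdopen(fd, "wb") as f:
                f.write(os.urandom(size))
            paths.append(path)
        totals = read_two_files(paths[0], paths[1], chunk_bytes)
        return [totals[p] for p in paths]
    finally:
        for path in paths:
            os.remove(path)


if __name__ == "__main__":
    print(demo((3 * 1024 * 1024, 5 * 1024 * 1024)))  # [3145728, 5242880]
```

    The main design choice is keeping `sink` outside the lock. One thread can process the chunk it just read while the other thread has the disk, so the disk serves one sequential stream at a time without either thread sitting idle.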
    

    You could use semaphores to check the other thread's status, or perfmon counters to get the status of the actual disk queue. The values of X and/or Y could also be auto-tuned by measuring the actual transfer rates and adjusting them gradually. This maximizes throughput when the application runs on different machines and/or O.S. versions. You may find that caching, memory, or RAID levels affect these values one way or the other, but auto-tuning should get you close to the best performance in each scenario.
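    For the auto-tuning idea, one simple option is hill climbing on the per-turn chunk size Y; this is my choice, not something the answer specifies. The tuner keeps growing or shrinking Y while the measured transfer rate improves, and reverses direction with a smaller step when the rate drops. `ChunkTuner`, `read_with_tuning`, `demo` and the default limits (1 MB to 256 MB, starting at 8 MB) are hypothetical placeholders.

```python
import os
import tempfile
import threading
import time

MB = 1024 * 1024


class ChunkTuner:
    """Hypothetical hill-climbing tuner for the per-turn read size (the 'Y' above).
    It keeps moving the size in the same direction while measured throughput
    improves, and reverses with a smaller step when throughput drops."""

    def __init__(self, start=8 * MB, lo=1 * MB, hi=256 * MB, step=2.0, min_step=1.1):
        self.size = start
        self.lo, self.hi = lo, hi
        self.step = step          # multiplicative change per update
        self.min_step = min_step  # the step never shrinks below this
        self.direction = 1        # +1 = grow the chunk, -1 = shrink it
        self._prev_rate = None

    def update(self, nbytes, seconds):
        """Record one turn's transfer and return the chunk size for the next turn."""
        rate = nbytes / max(seconds, 1e-9)  # bytes per second; guard against 0 s
        if self._prev_rate is not None and rate < self._prev_rate:
            # The last change made things worse: turn around and take smaller steps.
            self.direction = -self.direction
            self.step = max(self.min_step, self.step ** 0.5)
        self._prev_rate = rate
        if self.direction > 0:
            new_size = self.size * self.step
        else:
            new_size = self.size / self.step
        self.size = int(min(self.hi, max(self.lo, new_size)))
        return self.size


def read_with_tuning(path, disk_turn, sink, tuner):
    """Read `path` in turns guarded by `disk_turn` (any lock or context manager),
    timing only the read itself and letting `tuner` pick the next chunk size."""
    with open(path, "rb", buffering=0) as f:
        while True:
            with disk_turn:
                t0 = time.perf_counter()
                data = f.read(tuner.size)
                elapsed = time.perf_counter() - t0
            if not data:
                break
            tuner.update(len(data), elapsed)
            sink(data)


def demo(size=3 * MB):
    """Read one temporary file with a tuner and return the total bytes received."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        received = []
        tuner = ChunkTuner(start=MB, lo=64 * 1024, hi=4 * MB)
        read_with_tuning(path, threading.Lock(), received.append, tuner)
        return sum(len(chunk) for chunk in received)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # 3145728
```

    Single timing samples are noisy, and reads served from the OS cache look much faster than the disk. In practice you would probably average several turns before each `update` call.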
