I am developing an application that gathers a list of all the files on the hard drive and afterwards writes files to the hard drive.
My question: what is the optimum number of concurrent threads for this task?
That is, how many threads can read from the hard drive before the drive slows down because too many threads are reading it concurrently?
Thank you!
At first glance, I'd say one!
It actually depends on whether the data being read needs complex computation to be processed. In that case, it can be worthwhile to start more than one thread to process different chunks of disk data, but only if the system has multiple CPUs.
Otherwise, more than one thread stresses the HDD more than necessary: concurrent reads from different threads issue seek operations to reach the file blocks(*), introducing overhead that can slow down the system, depending on the number and size of the files being read.
Read the files sequentially.
(*) The OS really tries to store the blocks of a file sequentially in order to speed up read operations. Disk fragmentation still happens, and every non-sequential fragment requires a seek, which takes far more time than reading from the current position. Reading multiple files in parallel causes a bunch of seeks, because the blocks of a single file are mostly contiguous while the blocks of different files may not be.
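As a minimal sketch of that advice (written in Python for brevity, not taken from the original answer): one thread lists and reads the files strictly one after another, and only the computation is handed to a pool of worker threads. The names `list_files`, `digest` and `read_sequentially_and_process` are hypothetical, and the SHA-256 hash is just a stand-in for whatever "complex computation" your data needs.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def list_files(root):
    """Gather every file under root with a single directory walk."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def digest(data):
    """Stand-in for the per-file computation (an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def read_sequentially_and_process(root, workers=None):
    """Read files one at a time on the calling thread; compute on a pool."""
    pending = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for path in list_files(root):
            # Only this thread touches the disk, so reads never interleave.
            with open(path, "rb") as handle:
                data = handle.read()
            pending[path] = pool.submit(digest, data)
        # Collect the results once every file has been read.
        return {path: future.result() for path, future in pending.items()}
```

This reads each file fully into memory, so for very large files you would want to read and process in chunks instead. If no computation is needed at all, the pool can simply be dropped and the loop alone does the job.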
Never run I/O-heavy operations concurrently. It's slower, because the disk head wastes a lot of time switching between the different threads/files.
What should you do if you have several threads that need I/O? Produce the operations concurrently, and execute them on a single thread. Say we have a container such as a ConcurrentQueue<T> (or a thread-safe queue you wrote yourself), and 10 threads that want to read from the files 1.txt, 2.txt ... 10.txt. The threads put their "read requests" into the queue concurrently, and one other thread handles all the requests (open 1.txt, get what you want, then continue with 2.txt). In this case the disk head is not kept busy switching between threads/files.
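Here is a rough sketch of that idea, in Python rather than C#, with `queue.Queue` playing the role of ConcurrentQueue<T>. The helper names (`io_worker`, `request_read`, `read_all`) are made up for illustration. Many threads enqueue requests, but only one thread ever opens a file.

```python
import queue
import threading

_STOP = object()  # sentinel telling the I/O thread to exit


def io_worker(requests):
    """The only thread that opens files; serves requests one at a time."""
    while True:
        item = requests.get()
        if item is _STOP:
            break
        path, reply = item
        try:
            with open(path, "rb") as handle:
                reply.put(handle.read())
        except OSError as error:
            reply.put(error)  # hand the failure back to the requester


def request_read(requests, path):
    """Called from any thread: enqueue a read request and wait for the result."""
    reply = queue.Queue(maxsize=1)
    requests.put((path, reply))
    result = reply.get()
    if isinstance(result, OSError):
        raise result
    return result


def read_all(paths):
    """Start one I/O thread plus one requesting thread per path."""
    requests = queue.Queue()
    io_thread = threading.Thread(target=io_worker, args=(requests,))
    io_thread.start()
    results = {}

    def requester(path):
        # Each requester writes its own key, so no extra locking is used here.
        results[path] = request_read(requests, path)

    threads = [threading.Thread(target=requester, args=(p,)) for p in paths]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    requests.put(_STOP)  # all requests are done; let the I/O thread finish
    io_thread.join()
    return results
```

Calling `read_all(["1.txt", "2.txt"])` returns a dict mapping each path to its bytes. The requests are served in whatever order they reach the queue, which is fine as long as each file is read in one go.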
One thread. If you are reading AND writing at the same time AND your destination is a different disk from your source, then 2 threads. I'll add that if you are doing other operations on the files (for example, decompressing them), that part can be done on a third thread.
Some examples (I'm ignoring junctions, reparse points...):
- C: to C: 1 thread TOTAL
- C: to D:, same physical disk, different partitions: 1 thread TOTAL
- C: to D:, different physical disks: 2 threads TOTAL
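The two-thread case (source and destination on different physical disks) could look roughly like this. It is a Python sketch under my own assumptions, not code from the answer: `copy_with_two_threads`, `CHUNK_SIZE` and `max_chunks` are hypothetical names and values. One thread only reads, the other only writes, and a bounded queue between them keeps the reader from running too far ahead.

```python
import queue
import threading

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def copy_with_two_threads(source_path, destination_path, max_chunks=8):
    """Copy one file using one reader thread and one writer thread."""
    chunks = queue.Queue(maxsize=max_chunks)  # bounded buffer between the disks
    errors = []

    def reader():
        try:
            with open(source_path, "rb") as source:
                while True:
                    chunk = source.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    chunks.put(chunk)
        except OSError as error:
            errors.append(error)
        finally:
            chunks.put(None)  # end-of-stream marker for the writer

    def writer():
        done = False
        try:
            with open(destination_path, "wb") as destination:
                while True:
                    chunk = chunks.get()
                    if chunk is None:
                        done = True
                        break
                    destination.write(chunk)
        except OSError as error:
            errors.append(error)
            # Keep draining so the reader is never stuck on a full queue.
            while not done and chunks.get() is not None:
                pass

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
```

For the same-disk cases in the list above, this buys nothing: a plain single-threaded read-then-write loop keeps the total at one thread.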
I'm working on the presumption that a disk can do ONE operation at a time, and each time it "multitasks" by switching between different reads/writes, it loses speed. Mechanical disks have this problem (though technically NCQ COULD help). I don't know about solid-state disks (but I do know that USB sticks are VERY slow if you try to do 2 operations at a time).
I have searched for how to do it... I haven't found any "specific" examples, but here are some links to the Windows API where you could start:
Displaying Volume Paths: http://msdn.microsoft.com/en-us/library/cc542456%28VS.85%29.aspx
GetVolumePathName: http://msdn.microsoft.com/en-us/library/aa364996(v=VS.85).aspx
GetVolumeInformationByHandleW: http://msdn.microsoft.com/en-us/library/aa964920(v=VS.85).aspx
I would say one thread is enough. The CPU might be able to run many threads, but the speed of the hard drive is many orders of magnitude below the CPU's. Even if running more threads made the requests for I/O faster (of which I'm not certain), it wouldn't make the hard drive actually read faster. It could probably even slow it down.
If it's coming off a single HDD, then you want to minimise seek times. So only use one thread for reading from and writing to disk.
As the "C#" tag implies, I am assuming you are writing a managed application to perform disk I/O.
In this case, I am guessing the number of user-level managed threads is irrelevant, as they are not the ones actually performing the disk I/O.
As far as I know, disk I/O requests from the user-level managed threads are queued in the kernel-level APC queue, and Windows I/O threads handle them.
So I would say the rate at which disk I/O requests are queued in the APC queue is more relevant to your question.
I have not seen any .NET threading API that allows binding user tasks to Windows I/O threads. However, please note that my answer is based on relatively old information from the following link: Windows I/O threads vs. managed I/O threads.
If anyone knows how the current Windows 7 thread pool model differs from the information in that link, please share it so I can learn as well.
You may also find the following link useful for understanding Windows file I/O operations: Synchronous and Asynchronous I/O
Many of the answers refer to the number of HDDs. Keep in mind that it also depends on the number of controllers: sometimes two HDDs are managed by a single controller. Also, two partitions on the same HDD are not two HDDs!
Source: https://stackoverflow.com/questions/5321768/how-many-threads-for-reading-and-writing-to-the-hard-disk