I would like to search for a given string in multiple files in parallel using CUDA. I plan to use the PFAC library to search for the string. The problem with this is h
Yes, it's probably possible to get a speed-up using CUDA if you can reduce the impact of read latency/bandwidth. One way would be to perform multiple searches concurrently. For example, if you can search for [needle1], .. [needle1000] in your large haystack, then each thread could search a piece of the haystack and store the hits. You will need to analyse the throughput required per comparison to determine whether your search is likely to be improved by employing CUDA. This may be useful: http://dl.acm.org/citation.cfm?id=1855600
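The partitioning described above can be sketched on the CPU before committing to a kernel. This is only an illustration of how the work could be split, not CUDA code and not how PFAC does it: the function name `search_chunks` and the `chunk_size` parameter are hypothetical, and each loop iteration stands in for what one GPU thread would do. The detail worth noticing is the overlap: every piece is extended by `len(needle) - 1` bytes so that a match straddling a boundary is still reported, exactly once, by the piece in which it starts.

```python
def search_chunks(haystack: bytes, needles: list[bytes], chunk_size: int) -> dict[bytes, list[int]]:
    """Report the start offset of every occurrence of each (non-empty) needle.

    The haystack is scanned in pieces of chunk_size bytes; each piece stands in
    for the work a single GPU thread would be given (an assumption made for
    illustration only).
    """
    hits: dict[bytes, list[int]] = {needle: [] for needle in needles}
    for start in range(0, len(haystack), chunk_size):
        for needle in needles:
            # Extend the piece by len(needle) - 1 bytes so that a match that
            # begins in this piece but ends in the next one is still found.
            end = min(len(haystack), start + chunk_size + len(needle) - 1)
            pos = haystack.find(needle, start, end)
            while pos != -1:
                hits[needle].append(pos)
                pos = haystack.find(needle, pos + 1, end)
    return hits


if __name__ == "__main__":
    print(search_chunks(b"abcabcabc", [b"abc", b"ca"], chunk_size=4))
```

On a real GPU the hits would have to be written to a preallocated output buffer (for example with an atomic counter), which this sketch does not attempt to model.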
Doing your task in CUDA will not help much over doing the same thing on the CPU.
Assuming that your files are stored on a standard, magnetic HDD, the typical single-threaded CPU program would consume:
That is 15.1ms for a single file. If you have 1000 files, it will take 15.1s to do the work.
Now, suppose I give you a super-powerful GPU with infinite memory bandwidth, no latency, and infinite processor speed. You will be able to perform step (3) in no time. However, the HDD reads will still consume exactly the same time, because a GPU cannot parallelise the work of another, independent device. As a result, instead of spending 15.1s, you will now do it in 15.0s.
The infinite GPU would give you a speedup of roughly 0.7%. A real GPU would not even come close to that!
More generally: if you are considering CUDA, ask yourself whether the actual computation is the bottleneck of the problem.
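The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch under stated assumptions: the split of 15.0ms of disk time and 0.1ms of search time per file is inferred from the 15.1s and 15.0s totals above, and the function name `total_time_ms` is made up for illustration.

```python
def total_time_ms(n_files: int, io_ms: float, compute_ms: float, compute_speedup: float) -> float:
    """Total run time when only the compute part is accelerated.

    io_ms is the per-file disk time (seek plus read), which the GPU cannot
    shorten; compute_ms is the per-file search time on the CPU.
    """
    return n_files * (io_ms + compute_ms / compute_speedup)


if __name__ == "__main__":
    # Assumed per-file costs, inferred from the totals quoted in the answer.
    IO_MS, COMPUTE_MS, N_FILES = 15.0, 0.1, 1000

    cpu = total_time_ms(N_FILES, IO_MS, COMPUTE_MS, compute_speedup=1.0)
    ideal_gpu = total_time_ms(N_FILES, IO_MS, COMPUTE_MS, compute_speedup=float("inf"))

    saved = (cpu - ideal_gpu) / cpu
    print(f"CPU: {cpu / 1000:.1f}s, ideal GPU: {ideal_gpu / 1000:.1f}s, time saved: {saved:.2%}")
```

Plugging in your own measured disk and search times shows quickly whether the computation is large enough, relative to I/O, for a GPU to matter.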
If you deal with thousands of tiny files and you need to perform reads often, consider techniques that can "attack" your bottleneck. Some may include:
There may be more options; I am not an expert in that area.