Our web server needs to process many compositions of large images together before sending the results to web clients. This process is performance critical because the server can
You have multiple options:
Improve the performance of the decoding process
You could use another, faster PNG decoder (libpng is the standard C library and might be faster), or you could switch to another picture format that uses simpler, faster-to-decode compression.
Parallelize
Use the .NET parallel-processing capabilities to decode concurrently. Decoding is likely single-threaded, so this could help if you run on multicore machines.
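A minimal sketch of what that could look like, assuming `System.Drawing.Bitmap` is what does your decoding (the `ParallelDecode`/`DecodeAll` names are just placeholders):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Drawing;
using System.Threading.Tasks;

static class ParallelDecode
{
    // Decode a set of PNG files concurrently. A single decode is
    // single-threaded, but independent files can be decoded in
    // parallel on a multicore machine.
    public static IDictionary<string, Bitmap> DecodeAll(IEnumerable<string> paths)
    {
        var results = new ConcurrentDictionary<string, Bitmap>();
        Parallel.ForEach(paths, path =>
        {
            // Each iteration decodes one file; no shared mutable state.
            results[path] = new Bitmap(path);
        });
        return results;
    }
}
```

`Parallel.ForEach` picks a degree of parallelism for you; you can cap it with `ParallelOptions.MaxDegreeOfParallelism` if the server has other work to do.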
Store the files uncompressed but on a device that compresses
For instance, a compressed folder or even a SandForce SSD. The data will still be compressed, just differently, and the decompression burden is shifted onto other software or hardware. I am not sure this will really help and would only try it as a last resort.
Have you tried the following two things?
1)
Multi-thread it. There are several ways of doing this, but one would be an "all in" method: basically spawn X threads, each running the full process.
2)
Perhaps consider having X threads do all the CPU work, and then feed the results to the GPU thread.
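Option 2 is essentially a producer/consumer pipeline. Here is a rough sketch using `BlockingCollection<T>`; the doubling stands in for your real decode work, and the single consumer loop is where a real GPU upload/composition would live (`Pipeline`/`Run` are made-up names):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class Pipeline
{
    // Several CPU workers push decoded results into a bounded queue;
    // one consumer thread (which in a real system would own the GPU
    // context) drains it.
    public static int Run(int[] jobs, int cpuThreads)
    {
        var queue = new BlockingCollection<int>(boundedCapacity: 16);
        int consumed = 0;

        // CPU side: "decode" in parallel.
        var producers = Task.Run(() =>
        {
            Parallel.ForEach(jobs,
                new ParallelOptions { MaxDegreeOfParallelism = cpuThreads },
                job => queue.Add(job * 2));   // stand-in for real decoding
            queue.CompleteAdding();
        });

        // "GPU" side: a single thread consumes the decoded results.
        foreach (var item in queue.GetConsumingEnumerable())
            consumed++;                       // stand-in for GPU upload/composition

        producers.Wait();
        return consumed;
    }
}
```

The bounded capacity keeps the decoders from racing far ahead of the consumer and filling memory with decoded bitmaps.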
Your question is very well formulated for a new user, but some information about the scenario might be useful. Are we talking about a batch job or serving pictures in real time? Do the 10k pictures change?
Hardware resources
You should also take into account what hardware resources you have at your disposal.
Normally the two cheapest resources are CPU power and disk space, so if you only have 10k pictures that rarely change, then converting them all into a format that is quicker to handle might be the way to go.
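If you go that route, a one-off conversion pass could be as simple as this sketch, which trades disk space for cheap decoding by re-saving each PNG as an uncompressed BMP (the `ConvertOnce` name and the BMP choice are just assumptions; pick whatever format profiles fastest for you):

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static class ConvertOnce
{
    // Decode the PNG once, up front, and store it in a format that is
    // essentially free to load later.
    public static string ConvertToBmp(string pngPath)
    {
        var bmpPath = Path.ChangeExtension(pngPath, ".bmp");
        using (var img = Image.FromFile(pngPath))
            img.Save(bmpPath, ImageFormat.Bmp);
        return bmpPath;
    }
}
```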
Multi thread trivia
Another thing to consider when doing multithreading is that it is normally smart to create the threads with BelowNormal priority, so you don't make the entire system lag. You have to experiment a bit with the number of threads to use; if you're lucky you can get close to a 100% speed gain per core, but this depends a lot on the hardware and the code you're running.
I normally use Environment.ProcessorCount to get the current CPU count and work from there :)
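Putting those two tips together, a sketch might look like this (the `WorkerPool`/`Start` names are mine, not from any library):

```csharp
using System;
using System.Threading;

static class WorkerPool
{
    // Spawn one worker per logical CPU at BelowNormal priority so the
    // batch work does not make the rest of the system lag.
    public static Thread[] Start(Action work)
    {
        int count = Environment.ProcessorCount;
        var threads = new Thread[count];
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(() => work());
            threads[i].Priority = ThreadPriority.BelowNormal;
            threads[i].Start();
        }
        return threads;
    }
}
```

Treat `ProcessorCount` as a starting point and benchmark; hyper-threading and I/O can shift the sweet spot either way.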
I've written a pure C# PNG encoder/decoder ( PngCs ); you might want to give it a look. But I highly doubt it will have better speed performance [*]; it's not highly optimized, and it rather tries to minimize the memory usage for dealing with huge images (it encodes/decodes sequentially, line by line). But perhaps it serves you as boilerplate to plug in some better compression/decompression implementation. As I see it, the speed bottleneck is zlib (inflater/deflater), which (contrary to Java) is not implemented natively in C#. I used the SharpZipLib library, with pure C# managed code; this cannot be very efficient.
I'm a little surprised, however, that in your tests decoding was so much slower than encoding. That seems strange to me, because in most compression algorithms (perhaps in all, and surely in zlib) encoding is much more compute-intensive than decoding. Are you sure about that? (For example, this speed test, which reads and writes 5000x5000 RGB8 images (not very compressible, about 20 MB on disk), gives me about 4.5 secs for writing and 1.5 secs for reading.) Perhaps there are other factors apart from pure PNG decoding?
[*] Update: newer versions (since 1.1.14) have several optimizations; in particular, if you can use .NET 4.5, it should provide better decoding speed.
There is another option: writing your own GPU-based PNG decoder. You could use OpenCL to perform this operation fairly efficiently (and perform your composition using OpenGL, which can share resources with OpenCL). It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can/want to pursue, I can provide more information.
Here are some resources related to GPU-based DEFLATE (and INFLATE).
Hope this helps!