问题
Our web server needs to process many compositions of large images together before sending the results to web clients. This process is performance critical because the server can receive several thousands of requests per hour.
Right now our solution loads PNG files (around 1MB each) from the HD and sends them to the video card so the composition is done on the GPU. We first tried loading our images using the PNG decoder exposed by the XNA API. We saw the performance was not too good.
To understand if the problem was loading from the HD or the decoding of the PNG, we modified that by loading the file in a memory stream, and then sending that memory stream to the .NET PNG decoder. The difference of performance using XNA or using System.Windows.Media.Imaging.PngBitmapDecoder class is not significant. We roughly get the same levels of performance.
Our benchmarks show the following performance results:
- Load images from disk: 37.76ms 1%
- Decode PNGs: 2816.97ms 77%
- Load images on Video Hardware: 196.67ms 5%
- Composition: 87.80ms 2%
- Get composition result from Video Hardware: 166.21ms 5%
- Encode to PNG: 318.13ms 9%
- Store to disk: 3.96ms 0%
- Clean up: 53.00ms 1%
Total: 3680.50ms 100%
From these results we see that the slowest parts are when decoding the PNG.
So we are wondering if there wouldn't be a PNG decoder we could use that would allow us to reduce the PNG decoding time. We also considered keeping the images uncompressed on the hard disk, but then each image would be 10MB in size instead of 1MB and since there are several tens of thousands of these images stored on the hard disk, it is not possible to store them all without compression.
EDIT: More useful information:
- The benchmark simulates loading 20 PNG images and compositing them together. This will roughly correspond to the kind of requests we will get in the production environment.
- Each image used in the composition is 1600x1600 in size.
- The solution will involve as many as 10 load balanced servers like the one we are discussing here. So extra software development effort could be worth the savings on the hardware costs.
- Caching the decoded source images is something we are considering, but each composition will most likely be done with completely different source images, so cache misses will be high and performance gain, low.
- The benchmarks were done with a crappy video card, so we can expect the PNG decoding to be even more of a performance bottleneck using a decent video card.
回答1:
There is another option. And that is, you write your own GPU-based PNG decoder. You could use OpenCL to perform this operation fairly efficiently (and perform your composition using OpenGL which can share resources with OpenCL). It is also possible to interleave transfer and decoding for maximum throughput. If this is a route you can/want to pursue I can provide more information.
Here are some resources related to GPU-based DEFLATE (and INFLATE).
- Accelerating Lossless compression with GPUs
- gpu-block-compression using CUDA on Google code.
- Floating point data-compression at 75 Gb/s on a GPU - note that this doesn't use INFLATE/DEFLATE but a novel parallel compression/decompression scheme that is more GPU-friendly.
Hope this helps!
回答2:
Have you tried the following 2 things.
1)
Multi thread it, there is several ways of doing this but one would be a "all in" method. Basicly fully spawn X amount of threads, for the full proccess.
2)
Perhaps consider having XX thread do all the CPU work, and then feed it to the GPU thread.
Your question is very well formulated for being a new user, but some information about the senario might be usefull? Are we talking about a batch job or service pictures in real time? Do the 10k pictures change?
Hardware resources
You should also take into account what hardware resources you have at your dispoal.
Normaly the 2 cheapest things are CPU power and diskspace, so if you only have 10k pictures that rarly change, then converting them all into a format that quicker to handle might be the way to go.
Multi thread trivia
Another thing to consider when doing multithreading, is that its normaly smart to make the threads in BellowNormal priority.So you dont make the entire system "lag". You have to experiment a bit with the amount of threads to use, if your luck you can get close to 100% gain in speed pr CORE but this depends alot on the hardware and the code your running.
I normaly use Environment.ProcessorCount to get the current CPU count and work from there :)
回答3:
You have mutliple options
Improve the performance of the decoding process
You could implement another faster png decoder (libpng is a standard library which might be faster) You could switch to another picture format that uses simpler/faster decodeable compression
Parallelize
Use the .NET parallel processing capabilities for decoding concurrently. Decoding is likely singlethreaded so this could help if you run on multicore machines
Store the files uncompressed but on a device that compresses
For instance a compressed folder or even a sandforce ssd. This will still compress but differently and burden other software with the decompression. I am not sure this will really help and would only try this as a last resort.
回答4:
I've written a pure C# PNG coder/decoder ( PngCs ) , you might want to give it a look. But I higly doubt it will have better speed permance [*], it's not highly optimized, it rather tries to minimize the memory usage for dealing with huge images (it encodes/decodes sequentially, line by line). But perhaps it serves you as boilerplate to plug in some better compression/decompression implementantion. As I see it, the speed bottleneck is zlib (inflater/deflater), which (contrarily to Java) is not implemented natively in C# -I used a SharpZipLib library, with pure C# managed code; this cannnot be very efficient.
I'm a little surprised, however, that in your tests decoding was so much slower than encoding. That seems strange to me, because, in most compression algorithms (perhaps in all; and surely in zlib) encoding is much more computer intensive than decoding. Are you sure about that? (For example, this speedtest which read and writes 5000x5000 RGB8 images (not very compressible, about 20MB on disk) gives me about 4.5 secs for writing and 1.5 secs for reading). Perhaps there are other factor apart from pure PNG decoding?
[*] Update: new versions (since 1.1.14) that have several optimizations; if you can use .Net 4.5, specially, it should provide better decoding speed.
来源:https://stackoverflow.com/questions/11313251/fastest-png-decoder-for-net