Question
Just wondering if anybody has done, or is aware of, encoding/compressing a large image into JPEG 2000 format using Hadoop? There is also this project, http://code.google.com/p/matsu-project/, which uses MapReduce to process images.
The image is about 1 TB+ in size, and encoding it on a single machine takes 100+ hours.
Answer 1:
How large an image are you talking about? From the JPEG 2000 Wikipedia page it seems that the tiling and wavelet transformations should be easily parallelizable -- the tiles appear to be independent of each other. There is an open-source library called JasPer that appears to be fairly widely used, but it is written in C, which will make integrating it into Hadoop a bit tricky.
You will essentially have to pull the codec apart and call the appropriate tiling and encoding functions in the map step, then reassemble and write out the image in the reduce step. It will probably require a fairly deep understanding of the JPEG 2000 format itself.
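Purely as an illustration (not working code for your case), a minimal MapReduce skeleton under the assumption that the input has already been split into one raw tile per record might look like this. TileCodec here is a hypothetical placeholder for a JNI bridge into JasPer or another native JPEG 2000 encoder, not a real API:

    import java.io.IOException;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class Jpeg2000TileJob {

      /** Hypothetical stand-in for a JNI call into a native codec such as JasPer. */
      static class TileCodec {
        static byte[] encodeTile(byte[] rawPixels) {
          throw new UnsupportedOperationException("call the native JPEG 2000 encoder here");
        }
      }

      // Map step: each input record is one raw tile (key = tile index, value = raw pixels).
      // The mapper compresses its tile independently and emits the encoded bytes.
      public static class TileEncodeMapper
          extends Mapper<IntWritable, BytesWritable, IntWritable, BytesWritable> {
        @Override
        protected void map(IntWritable tileIndex, BytesWritable rawTile, Context context)
            throws IOException, InterruptedException {
          byte[] encoded = TileCodec.encodeTile(rawTile.copyBytes());
          context.write(tileIndex, new BytesWritable(encoded));
        }
      }

      // Reduce step: collect encoded tiles in tile order and stitch them into one
      // code stream. A real job would have to emit the JPEG 2000 headers and
      // tile-part markers correctly; here the tiles are simply passed through.
      public static class TileAssembleReducer
          extends Reducer<IntWritable, BytesWritable, IntWritable, BytesWritable> {
        @Override
        protected void reduce(IntWritable tileIndex, Iterable<BytesWritable> tiles, Context context)
            throws IOException, InterruptedException {
          for (BytesWritable tile : tiles) {
            context.write(tileIndex, tile);
          }
        }
      }
    }

The hard parts this sketch glosses over are exactly the format-specific ones: producing correct tile-parts in the mapper and writing a valid code stream in the reducer.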
The question is: how much time will you spend moving the uncompressed data around and reassembling the result, compared to processing the tiles serially on a single machine? You might want to do some back-of-the-envelope calculations to see whether it is worth it and what the theoretical speedup would be compared to a single machine.
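For example, a very rough estimate (the network speed and cluster size below are made-up assumptions, only the 1 TB and 100 h figures come from the question) suggests data movement is not the bottleneck:

    // Back-of-the-envelope check: transfer time vs. idealized parallel compute time.
    public class SpeedupEstimate {
      public static void main(String[] args) {
        double imageBytes = 1e12;             // ~1 TB of uncompressed pixels
        double networkBytesPerSec = 1e9 / 8;  // assumed 1 Gbit/s link ~ 125 MB/s
        double serialHours = 100;             // single-machine time from the question
        int nodes = 20;                       // assumed cluster size

        double transferHours = imageBytes / networkBytesPerSec / 3600; // one pass over the wire
        double parallelHours = serialHours / nodes;                    // ideal, ignoring skew/overhead

        System.out.printf("transfer ~%.1f h, parallel compute ~%.1f h, serial %.0f h%n",
            transferHours, parallelHours, serialHours);
      }
    }

With those assumptions, shipping the raw data takes a couple of hours while the idealized parallel encoding takes around five, so the distributed approach could plausibly pay off; plug in your own network and cluster numbers.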
Source: https://stackoverflow.com/questions/4301065/encoding-image-into-jpeg2000-using-distributed-computing-like-hadoop