Unable to compress file during Huffman Encoding in Java

Question

I have implemented the Huffman encoding algorithm in Java using a priority queue. I traverse the tree from root to leaf and get codes such as #=000011, based on how often each symbol appears in the input. The tree is built correctly and the codes are as expected, but the output file is larger than the original.

I currently append '0' to a String when I go to a left child and '1' when I go to a right child. I suspect each of those characters ends up occupying a full 8 bits in the output, so nothing is compressed. I am guessing the bits need to be converted into character values so that each symbol takes fewer than 8 bits and the file actually shrinks.

How can I achieve compression by manipulating characters and reducing bits in Java? Thanks.

Answer 1:

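You're probably using a StringBuilder and appending "0" or "1", or simply using the + operator to concatenate "0" or "1" to the end of your string, and then writing that text to some sort of OutputStream. Each of those characters takes at least one byte in the file, so every bit of your encoding costs 8 bits on disk.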

What you want to do is to write the actual bits. I'd suggest making a whole byte first before writing. A byte looks like this:

0x03

which represents the binary string 0000 0011.
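To make the size difference concrete, here is a minimal sketch that compares the two representations. The class name SizeComparison and the sample bit string are made up for illustration; they are not from the original code.

```java
import java.nio.charset.StandardCharsets;

public class SizeComparison {
    // Bytes used when every '0'/'1' is written out as a text character.
    public static int sizeAsText(String bits) {
        return bits.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Bytes used when eight bits are packed into each byte (rounded up).
    public static int sizeAsPackedBits(String bits) {
        return (bits.length() + 7) / 8;
    }

    public static void main(String[] args) {
        String bits = "0000110100001111"; // hypothetical 16-bit Huffman output
        System.out.println(sizeAsText(bits));       // 16
        System.out.println(sizeAsPackedBits(bits)); // 2
    }
}
```

Written as text, the encoding is eight times larger than the same bits packed into bytes, which is enough to cancel out any gain from the Huffman codes.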

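You can build each byte by starting from zero, then shifting and adding one bit at a time (an int is used as the accumulator here because Java promotes bytes to int when shifting):

import java.io.IOException;
import java.io.OutputStream;

public void writeToFile(String binaryString, OutputStream os) throws IOException {
    int pos = 0;
    while (pos < binaryString.length()) {
        int nextByte = 0;
        int bits = 0;
        // Shift in up to 8 bits, most significant bit first.
        for (; bits < 8 && pos + bits < binaryString.length(); bits++) {
            nextByte <<= 1;
            nextByte |= binaryString.charAt(pos + bits) == '0' ? 0 : 1;
        }
        // Left-align a final partial byte; the unused low bits are zero padding,
        // so the decoder needs to know how many bits are real data.
        nextByte <<= 8 - bits;
        os.write(nextByte); // write(int) writes only the low 8 bits
        pos += 8;
    }
}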
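Because the last byte is usually only partly filled, the reader has to know how many bits are real. The following is a rough round-trip sketch under that assumption: the bit count is passed in by hand here, whereas a real file format would need to store it (or an end-of-data symbol) somewhere. The class and method names are hypothetical.

```java
public class BitPacking {
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    // A final partial byte is padded with zero bits on the right.
    public static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= 0x80 >>> (i % 8);
            }
        }
        return out;
    }

    // Reverses pack(); bitCount says how many bits are real data,
    // so the padding bits in the last byte are ignored.
    public static String unpack(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "0000111"; // hypothetical 7-bit encoding
        byte[] packed = pack(bits);
        System.out.println(packed.length);                 // 1
        System.out.println(unpack(packed, bits.length())); // 0000111
    }
}
```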

Of course, it's inefficient to write one byte at a time to an unbuffered stream. OutputStream also accepts byte arrays (byte[]) through write(byte[]), so you'd be better off storing the bytes in an array (or even easier, a List) and writing them in bigger chunks, or wrapping the stream in a BufferedOutputStream.
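One way to avoid many small writes is to let java.io.BufferedOutputStream collect the bytes for you. This is only a sketch: the class name BufferedBitWriter is made up, and the zero padding of the last byte is an assumption about the file layout.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedBitWriter {
    // Packs the bit string into bytes and writes them through a buffer,
    // so the underlying stream sees a few large writes instead of many small ones.
    public static void write(String bits, OutputStream target) throws IOException {
        BufferedOutputStream out = new BufferedOutputStream(target);
        int current = 0; // bits collected for the byte in progress
        int filled = 0;  // how many bits of 'current' are in use
        for (int i = 0; i < bits.length(); i++) {
            current = (current << 1) | (bits.charAt(i) == '1' ? 1 : 0);
            if (++filled == 8) {
                out.write(current);
                current = 0;
                filled = 0;
            }
        }
        if (filled > 0) {
            out.write(current << (8 - filled)); // pad the last byte with zeros
        }
        out.flush(); // push buffered bytes to the target; the caller closes it
    }
}
```

For a file you would pass in something like a FileOutputStream opened in a try-with-resources block, so that it is closed after the flush.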

If you are not allowed to use byte writes (why the heck not? ObjectOutputStream supports writing byte arrays!), then you can use Base64 to encode your packed bytes as text. But remember that Base64 inflates your data by about 33% (4 output characters for every 3 input bytes).

An easy way to convert a byte array to Base64 is to use an existing encoder. sun.misc.BASE64Encoder is an internal, unsupported class that has been removed from modern JDKs; since Java 8 the standard library provides java.util.Base64 instead. After adding the following import:

import java.util.Base64;

you can obtain an encoder and turn your byte array into a string:

byte[] bytes = getBytesFromHuffmanEncoding();
String encodedString = Base64.getEncoder().encodeToString(bytes);
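The size overhead is easy to check with the standard java.util.Base64 class (Java 8 and later). In this sketch the class name Base64Overhead and the three sample bytes are placeholders standing in for real packed Huffman output.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Overhead {
    // Encodes packed bytes as Base64 text.
    public static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decodes Base64 text back into the original bytes.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] packed = {0x0D, 0x0F, 0x03}; // hypothetical packed Huffman bytes
        String text = encode(packed);
        System.out.println(packed.length); // 3
        System.out.println(text.length()); // 4
        System.out.println(Arrays.equals(packed, decode(text))); // true
    }
}
```

Three bytes become four characters, so Base64 gives back part of the saving, but it is still far smaller than writing one character per bit.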

Source: https://stackoverflow.com/questions/7801591/unable-to-compress-file-during-huffman-encoding-in-java

Tags

java

encoding

huffman-code

data-compression