问题
I have a problem with my Huffman tree code. In the main method I input a String of Symbols and I also input an Integer array containing the frequency of the Symbols. It should print out each Symbol and its Huffman code, but I think its wrong...
Here is the code:
package huffman;
import java.util.*;
abstract class HuffmanTree implements Comparable<HuffmanTree> {
public final int frequency; // the frequency of this tree
public HuffmanTree(int freq) { frequency = freq; }
// compares on the frequency
public int compareTo(HuffmanTree tree) {
return frequency - tree.frequency;
}
}
class HuffmanLeaf extends HuffmanTree {
public final char value; // the character this leaf represents
public HuffmanLeaf(int freq, char val) {
super(freq);
value = val;
}
}
class HuffmanNode extends HuffmanTree {
public final HuffmanTree left, right; // subtrees
public HuffmanNode(HuffmanTree l, HuffmanTree r) {
super(l.frequency + r.frequency);
left = l;
right = r;
}
}
public class Huffman {
// input is an array of frequencies, indexed by character code
public static HuffmanTree buildTree(int[] charFreqs, char[] test2) {
PriorityQueue<HuffmanTree> trees = new PriorityQueue<HuffmanTree>();
// initially, we have a forest of leaves
// one for each non-empty character
for (int i = 0; i < charFreqs.length; i++)
if (charFreqs[i] > 0)
trees.offer(new HuffmanLeaf(charFreqs[i], test2[i]));
assert trees.size() > 0;
// loop until there is only one tree left
while (trees.size() > 1) {
// two trees with least frequency
HuffmanTree a = trees.poll();
HuffmanTree b = trees.poll();
// put into new node and re-insert into queue
trees.offer(new HuffmanNode(a, b));
}
return trees.poll();
}
public static void printCodes(HuffmanTree tree, StringBuffer prefix) {
assert tree != null;
if (tree instanceof HuffmanLeaf) {
HuffmanLeaf leaf = (HuffmanLeaf)tree;
// print out character, frequency, and code for this leaf (which is just the prefix)
System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
} else if (tree instanceof HuffmanNode) {
HuffmanNode node = (HuffmanNode)tree;
// traverse left
prefix.append('0');
printCodes(node.left, prefix);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
printCodes(node.right, prefix);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void main(String[] args) {
//Symbols:
String str = "12345678";
char[] test2 = str.toCharArray();
//Frequency (of the symbols above):
int[] charFreqs = {36,18,12,9,7,6,5,4};
// build tree
HuffmanTree tree = buildTree(charFreqs,test2);
// print out results
System.out.println("SYMBOL\tFREQ\tHUFFMAN CODE");
printCodes(tree, new StringBuffer());
}
}
The output I get is:
SYMBOL FREQ HUFFMAN CODE
1 36 0
3 12 100
6 6 1010
5 7 1011
2 18 110
4 9 1110
8 4 11110
7 5 11111
Thats weird, for example Symbol 7 should be: 11110 and Symbol 8 should be: 11111
Can you help me please?
回答1:
The assignment of the bit patterns does not matter to the optimality of the code. The assignment you have will work just fine. There is nothing weird about it. You could have also expressed concern over 2:110, 3:100, or 4:1110, 5:1011, but those are fine too.
The only reason to impose an order on the codes is to reduce the number of bits needed to convey the code from the compressor to the decompressor. Instead of sending the codes, you can send the code lengths for each symbol, so long as the code is constructed identically on both sides from just the lengths.
In that case, the approach is usually to assign the code in numerical order to a sorted list of symbols. Then you would indeed have the symbol 7 with a lower code "value" than the symbol 8, if that's the order in which they are assigned.
For your example, such a canonical code would be:
1: 1 - 0
2: 3 - 100
3: 3 - 101
4: 4 - 1100
5: 4 - 1101
6: 4 - 1110
7: 5 - 11110
8: 5 - 11111
You simply takes the lengths and within the same length, sort the symbols. Then assign codes starting with 0 and incrementing, adding bits to the end as you step up lengths.
Note that this is an unusual example, where the symbol order is also the frequency order. Normally that's not the case.
回答2:
Just You add one more 0 to understand finish bit. (over 3 bit reading)
1 36 0 3 12 100 6 6 1010 5 7 1011'0 2 18 110 4 9 1110 8 4 11110 7 5 11111'0
回答3:
To answer the question from the comments of the answer:
Hey Mark, thanks for your help, but I dont really understand how you got those codes? Do I have to change a lot in the code?<
Mark simply refers to the goal of the huffman coding to find the most efficient depth (number of bits) for each symbol so that the overall encoding of all symbols (frequency[symbol] * codeLenght[symbol] for all symbols) is minimized.
So actually all you need to do is to determine the depth (level) of the leaf for each symbol. Now sort this by the depth of each symbol and start to count them up.
Now you only need to count the pattern up. Simple as that:
Example:
2x2: 00, 01 (next is 10)
4x3: 10 + (00, 01, 10) = 1000, 1001, 1010 (next is 1011)
5x3: 1011 + (0, 1, 0 + 10) = 10110, 10111, 10110 + 10 = 11000 (next would be 11001)...
The last part shows what happens if the number of elements is bigger than the available bit difference between both groups. It just gets added to the prefix.
This way a Huffman code is created that uses the least amount of space. Since this is only one tree you can also start with 11111 and remove 1 and get another code system that is equally efficient in terms of number of bits.
One thing that can be added is that there are modification that increase the number of 1 (or 0) in comparism to 0 (or 1) so you have another chance of compressing those bit pattern used to compress a message even further.
Summary: Sort the symbols by its depth within the frequence tree. Construct the code by adding (substracting) by one. The next free code is the start prefix for the next group. For each new member just add (substract) up. And you start with filling 0 (1) within the code.
In order to store such a tree remember by using the same algorithm for the same symbole groups you need to store only the following informations:
Number of groups n, n * { +bitDepth, number of symbols i, s1,s2...si}. Since the storing of n, +bitDepth, number of symbols itself is a subject of compression you can either use a variable bit format or even issue a huffman tree since you will find stastically unevenly distributions which are the main reason you can see compression taking place with huffman trees.
回答4:
Posting completed implementation of huffman tree in Java that I created based on Princeton.EDU version. Version from Princeton package is OK as an academical example but not really usabled, outside of the context.
My example below could be used by providing a string as an input and getting a byte array as output.
If you want to encode output into a human readable format then either use standard java Base64 encoder (very inefficient in terms of space) or check HumanByte class that I posted here
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.PriorityQueue;
/**************************************************************************************************************
* Used edu.princeton.cs version of huffman algorithm and made it easily usable outside of example context
* replaced proprietary priority queue by Java standard one
* created binary output/input that has minimal features needed
*
* If human readable encoding of huffman encoded string is desired then look at HumanByte class
* that was posted here
*
* https://stackoverflow.com/questions/4141317/how-to-convert-byte-array-into-human-readable-format/58332208#58332208
*
* @author Stan Sokolov
* 10/9/19
**************************************************************************************************************/
public class BetterHuffman {
private static final int R = 256;
private static final char PARENT = '\u0000';
private static final char EMPTY = '\u0001';
private static final int UNDEFINED = -1;
/**************************************************************************************************************
* Compress string in bytes
**************************************************************************************************************/
public static byte[] compress(final String s) {
final HuffmanOut binaryOut = new HuffmanOut();
final char[] input = s.toCharArray();
final int[] freq = new int[R];
for (char anInput1 : input) {
++freq[anInput1];
}
final BetterHuffman.Node root = buildTrie(freq);
final String[] st = new String[R];
buildCode(st, root, "");
writeTrie(root, binaryOut);
binaryOut.write(input.length);
for (char anInput : input) {
final String code = st[anInput];
for (char ch : code.toCharArray()) {
binaryOut.writeBit(ch == '1');
}
}
return binaryOut.value();
}
/**************************************************************************************************************
* build huffman tree
**************************************************************************************************************/
private static BetterHuffman.Node buildTrie(int[] freq) {
final PriorityQueue<BetterHuffman.Node> pq2 = new PriorityQueue<>();
for (char i = 0; i < R; ++i) {
if (freq[i] > 0) {
//pq.insert(new BetterHuffman.Node(i, freq[i], null, null));
pq2.add(new BetterHuffman.Node(i, freq[i], null, null));
}
}
if (pq2.size() == 1) {//if entire string is just one char repeated
if (freq[0] == 0) {//empty string
pq2.add(new BetterHuffman.Node(PARENT, 0, null, null));
} else {
pq2.add(new BetterHuffman.Node(EMPTY, 0, null, null));
}
} else
while (pq2.size() > 1) {
final BetterHuffman.Node left = pq2.poll();
final BetterHuffman.Node right = pq2.poll();
//aggregate two nodes into one by summing frequency
final BetterHuffman.Node parent = new BetterHuffman.Node(PARENT, left.freq + right.freq, left, right);
pq2.add(parent);
}
//this will be the root node that would have total length of input as frequency
return pq2.poll();
}
/**************************************************************************************************************
* write tree into byte output
**************************************************************************************************************/
private static void writeTrie(final BetterHuffman.Node x, final HuffmanOut binaryOut) {
if (x.isLeaf()) {//if this is a node representing symbol in alphabet
binaryOut.writeBit(true);
binaryOut.writeByte((int) x.ch);
} else {
binaryOut.writeBit(false); //this is an aggregate node used for branching
writeTrie(x.left, binaryOut);
writeTrie(x.right, binaryOut);
}
}
/**************************************************************************************************************
* make substitutes for incoming words
**************************************************************************************************************/
private static void buildCode(final String[] st, final BetterHuffman.Node x, final String s) {
if (!x.isLeaf()) {
buildCode(st, x.left, s + '0');
buildCode(st, x.right, s + '1');
} else {
st[x.ch] = s;
}
}
/**************************************************************************************************************
* Return uncompressed string
**************************************************************************************************************/
public static String expand(final byte[] value) {
final HuffmanIn binaryIn = new HuffmanIn(value);
final StringBuilder out = new StringBuilder();
final BetterHuffman.Node root = readTrie(binaryIn);
final int length = binaryIn.readInt();
for (int i = 0; i < length; ++i) {
BetterHuffman.Node x = root;
while (!x.isLeaf()) {
boolean bit = binaryIn.readBoolean();
if (bit) {
x = x.right;
} else {
x = x.left;
}
}
out.append(x.ch);
}
return out.toString();
}
/**************************************************************************************************************
* get tree from bytes
**************************************************************************************************************/
private static BetterHuffman.Node readTrie(final HuffmanIn binaryIn) {
boolean isLeaf = binaryIn.readBoolean();
if (isLeaf) {
char ch = binaryIn.readChar();
return new BetterHuffman.Node(ch, UNDEFINED, null, null);
} else {
return new BetterHuffman.Node(PARENT, UNDEFINED, readTrie(binaryIn), readTrie(binaryIn));
}
}
/**************************************************************************************************************
* Simple implementation of node
**************************************************************************************************************/
private static class Node implements Comparable<Node> {
private final char ch;
private final int freq;
private final Node left;
private final Node right;
Node(char ch, int freq, Node left, Node right) {
this.ch = ch;
this.freq = freq;
this.left = left;
this.right = right;
}
private boolean isLeaf() {
return left == null && right == null;
}
@Override
public int compareTo(Node that) {
return this.freq - that.freq;
}
}
/**************************************************************************************************************
* class to read bits from stream
**************************************************************************************************************/
private static class HuffmanIn {
private final ByteArrayInputStream in;
private int buffer;
private byte n;
HuffmanIn(final byte[] input) {
in = new ByteArrayInputStream(input);
fillBuffer();
}
private void fillBuffer() {
buffer = in.read();
n = 8;
}
boolean readBoolean() {
boolean bit = (buffer >> --n & 1) == 1;
if (n == 0) {
fillBuffer();
}
return bit;
}
char readChar() {
int x = buffer <<= 8 - n;
if (n == 8) {
fillBuffer();
} else {
byte oldN = n;
fillBuffer();
n = oldN;
x |= buffer >>> n;
}
return (char) (x & 255);
}
int readInt() {
int x = 0;
for (int i = 0; i < 4; ++i) {
char c = readChar();
x <<= 8;
x |= c;
}
return x;
}
}
/**************************************************************************************************************
* Output
**************************************************************************************************************/
private static class HuffmanOut {
private ByteArrayOutputStream out = new ByteArrayOutputStream();
private int buffer;
private byte n;
/**************************************************************************************************************
* @return what was compressed so far in a human readable (no funny characters) format
**************************************************************************************************************/
public byte[] value() {
clearBuffer();
return out.toByteArray();
}
void writeBit(final boolean bit) {
buffer = (buffer <<= 1) | (bit ? 1 : 0);
if (++n == 8) {
clearBuffer();
}
}
void writeByte(final int x) {
for (int i = 0; i < 8; ++i) {
writeBit((x >>> 8 - i - 1 & 1) == 1);
}
}
void clearBuffer() {
if (n != 0) {
out.write(buffer <<= 8 - n);
n = 0;
buffer = 0;
}
}
/**************************************************************************************************************
* write all 4 bytes of int
**************************************************************************************************************/
void write(final int x) {
for (int i = 3; i >= 0; i--)
writeByte(x >>> (i * 8) & 255);//write 4 bytes of int
}
}
}
来源:https://stackoverflow.com/questions/16658419/huffman-tree-in-java