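A Huffman tree is a special kind of tree. Going by the dynamic programming exercises from the book covered earlier, a Huffman tree is the optimal binary tree.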
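Before building a Huffman tree, the criterion it optimizes has to be pinned down. Take a dictionary tree whose nodes hold words: we want a binary tree that is faster to search given our lookup habits, so the criterion here is how often each word is searched. The more often a word is searched, the closer to the root it should sit, which makes lookups cheaper overall. From each word's search frequency (its weight) and the length of its path from the root we can compute the WPL (weighted path length). The tree shape for which the WPL is smallest is the optimal binary tree, which is the shape we are after. A Huffman tree is always a full binary tree: every node other than a leaf has exactly two children. (In the Huffman tree itself the weighted items are the leaves; an internal node only holds the sum of the weights below it.)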
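A dynamic programming algorithm can be used to check whether the tree described above matches the optimal binary tree node by node. The book also gives a more rigorous mathematical proof.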
Computing the WPL is simple: WPL = ∑ Li × Pi, where Li is the path length from the root to leaf i and Pi is the weight of that leaf.
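As a quick illustration of the formula, here is a minimal C++ sketch. The function name `weighted_path_length` and its two parameters are made up for this example; it assumes the depth of every leaf is already known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted path length: the sum over all leaves of depth * weight.
// depth[i] is the number of edges from the root to leaf i (Li in the formula),
// weight[i] is the weight of that leaf (Pi in the formula).
long long weighted_path_length(const std::vector<int>& depth,
                               const std::vector<long long>& weight) {
    assert(depth.size() == weight.size());
    long long wpl = 0;
    for (std::size_t i = 0; i < depth.size(); ++i)
        wpl += depth[i] * weight[i];
    return wpl;
}
```

For example, leaves with weights 45, 13, 12, 16, 9, 5 sitting at depths 1, 3, 3, 3, 4, 4 give 45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224.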
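Building a Huffman tree is also simple: initialize the nodes and put them all into a min-priority queue ordered by weight. Then repeatedly pop the two lightest nodes, make them the children of a new node whose weight is the sum of theirs, and push that node back, until a single node (the root) is left. The tree is built from the bottom up.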
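The algorithm in Python:
import heapq
from itertools import count

class node:
    """A tree node: f is the weight, var holds any other data."""
    def __init__(self, f, var=None, left=None, right=None):
        self.f = f
        self.var = var
        self.left = left
        self.right = right

def build_Huffman_tree(nodes):
    """
    nodes is a non-empty list of node objects (the leaves); returns the root.
    """
    tie = count()  # tie-breaker so the heap never has to compare two node objects
    que = [(n.f, next(tie), n) for n in nodes]
    heapq.heapify(que)  # min-priority queue keyed on weight
    while len(que) > 1:
        left = heapq.heappop(que)[2]   # lightest node
        right = heapq.heappop(que)[2]  # second lightest
        p = node(left.f + right.f, left=left, right=right)  # new parent
        heapq.heappush(que, (p.f, next(tie), p))
    return que[0][2]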
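For readers following along in C++, the same merge loop can be sketched with `std::priority_queue`. This is only an illustration under a few assumptions: `huffman_depths` is a hypothetical helper, nodes are stored by index instead of by pointer, and the function returns the depth of each leaf rather than the tree itself.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the depth of each leaf in a Huffman tree built over `weights`.
// Nodes are stored by index: 0..n-1 are the leaves, later indices are parents.
std::vector<int> huffman_depths(const std::vector<long long>& weights) {
    assert(!weights.empty());
    const int n = static_cast<int>(weights.size());
    std::vector<long long> w(weights);   // weight of every node
    std::vector<int> parent(n, -1);      // parent index, -1 while a node has none

    // Min-heap of (weight, node index); std::greater reverses the default max-heap.
    using Item = std::pair<long long, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> que;
    for (int i = 0; i < n; ++i) que.push({w[i], i});

    while (que.size() > 1) {
        Item a = que.top(); que.pop();   // lightest node
        Item b = que.top(); que.pop();   // second lightest
        const int p = static_cast<int>(w.size());
        w.push_back(a.first + b.first);  // new parent carries the summed weight
        parent.push_back(-1);
        parent[a.second] = p;
        parent[b.second] = p;
        que.push({w[p], p});
    }

    // Depth of a leaf = number of parent links followed to reach the root.
    std::vector<int> depth(n, 0);
    for (int i = 0; i < n; ++i)
        for (int v = i; parent[v] != -1; v = parent[v]) ++depth[i];
    return depth;
}
```

Calling `huffman_depths({45, 13, 12, 16, 9, 5})` yields the depths 1, 3, 3, 3, 4, 4: the heaviest leaf ends up right below the root and the two lightest ones end up deepest.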
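Huffman coding is a variable-length code. When symbol frequencies are uneven, a variable-length code generally compresses better than a fixed-length one, so fixed-length coding is not considered here. That said, fixed-length coding is simple too: define a code table yourself and encode by table lookup, which is efficient. Decoding is a table lookup as well.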
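The rule for Huffman codes: a left branch is encoded as 0 and a right branch as 1. With that rule, the code of every symbol can be produced by traversing the binary tree, as in the figure:
(Image from Baidu Images)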
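The traversal rule can be sketched in a few lines of C++. Everything named here (`Node`, `leaf`, `join`, `assign_codes`) is hypothetical and separate from the complete program further down; the sketch assumes the tree has already been built.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Node {
    char symbol = 0;                     // meaningful only for leaves
    std::unique_ptr<Node> left, right;   // both null for a leaf
};

// Helper: a leaf holding one symbol.
std::unique_ptr<Node> leaf(char c) {
    std::unique_ptr<Node> n(new Node());
    n->symbol = c;
    return n;
}

// Helper: an internal node with the two given children.
std::unique_ptr<Node> join(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    assert(l && r);
    std::unique_ptr<Node> n(new Node());
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Depth-first walk: append '0' when going left and '1' when going right.
void assign_codes(const Node* n, const std::string& prefix,
                  std::map<char, std::string>& table) {
    if (!n) return;
    if (!n->left && !n->right) {         // leaf: the path so far is its code
        table[n->symbol] = prefix;
        return;
    }
    assign_codes(n->left.get(), prefix + '0', table);
    assign_codes(n->right.get(), prefix + '1', table);
}
```

For the small tree `join(leaf('a'), join(leaf('b'), leaf('c')))`, calling `assign_codes(root.get(), "", table)` fills the table with a = 0, b = 10, c = 11. No code is a prefix of another, because symbols only ever sit at leaves.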
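Decoding is just as simple: apply the same rule while walking the tree from the root, one bit at a time. Whenever a leaf is reached, output its character and start again from the root.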
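A minimal sketch of that decoding walk, assuming the tree is stored in a flat array with the root at index 0. `FlatNode` and `decode_bits` are names invented for this example and do not appear in the complete program.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FlatNode {
    int left;      // index of the child on the 0 branch, -1 for a leaf
    int right;     // index of the child on the 1 branch, -1 for a leaf
    char symbol;   // meaningful only for leaves
};

// Decodes a string of '0'/'1' characters; tree[0] is the root.
std::string decode_bits(const std::vector<FlatNode>& tree, const std::string& bits) {
    assert(!tree.empty());
    std::string out;
    int cur = 0;
    for (char b : bits) {
        if (tree[cur].left == -1) break;   // the root itself is a leaf: nothing to follow
        cur = (b == '0') ? tree[cur].left : tree[cur].right;
        if (tree[cur].left == -1) {        // reached a leaf
            out += tree[cur].symbol;
            cur = 0;                       // restart at the root
        }
    }
    return out;
}
```

With a root whose left child is the leaf a and whose right child is an internal node over the leaves b and c (codes 0, 10 and 11), the bit string 01011 decodes to abc.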
Complete code:
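#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string.h>
#include <string>
#define MAXSIZE 0xffff
#define QUE_LEFT(i) (2*(i) + 1)
// A tree node: a leaf carries a character, an internal node has var == 0.
class BaseNode {
public:
    char var;
    size_t freq;
    BaseNode * left;
    BaseNode * right;
    BaseNode() : var(0), freq(0), left(nullptr), right(nullptr) {}
    // Leaf: the children must be null so that traversal and destory() stop here.
    BaseNode(char c, size_t f) : var(c), freq(f), left(nullptr), right(nullptr) {}
    BaseNode(BaseNode * l, BaseNode * r) : var(0), freq(l->freq + r->freq), left(l), right(r) {}
    virtual ~BaseNode() {}
};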
// Array-backed binary min-heap of node pointers, ordered by freq.
class queue : public BaseNode {
public:
    size_t size_s;
    BaseNode * priority_queue[MAXSIZE];
    queue() : size_s(0) {}
    virtual ~queue() {
        while (!empty())
            size_s--;
    }
    bool empty() const {
        return size_s == 0;
    }
    bool full() const {
        return size_s == MAXSIZE;
    }
    size_t size() const {
        return size_s;
    }
    void insert(BaseNode * n);
    BaseNode * pop();
};
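void queue::insert(BaseNode * n) {
    if (full())
        exit(1);
    int i = size_s++;
    // Sift up. The heap is 0-based with children at 2*i + 1 and 2*i + 2,
    // so the parent of slot i is (i - 1) / 2.
    for (; i > 0 && priority_queue[(i - 1) / 2]->freq >= n->freq; i = (i - 1) / 2)
        priority_queue[i] = priority_queue[(i - 1) / 2];
    priority_queue[i] = n;
}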
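BaseNode * queue::pop() {
    if (empty())
        exit(1);
    size_s--;
    BaseNode * root = priority_queue[0];
    BaseNode * last = priority_queue[size_s];   // element that has to be re-seated
    int i = 0;
    // Sift down: pull the smaller child up until `last` fits into the hole.
    for (int l; QUE_LEFT(i) < (int)size_s; i = l) {
        l = QUE_LEFT(i);
        if (l + 1 < (int)size_s && priority_queue[l + 1]->freq < priority_queue[l]->freq)
            l++;
        if (last->freq <= priority_queue[l]->freq)
            break;
        priority_queue[i] = priority_queue[l];
    }
    priority_queue[i] = last;
    return root;
}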
class HuffmanTree {
public:
    HuffmanTree() {}
    ~HuffmanTree() {}
    BaseNode * build_Huffman_tree(std::string str, int * & freq);
    void coding(BaseNode * root, char * write, char ** code, int len);
    std::string encode(BaseNode * root, std::string str, char ** code);
    void decode(BaseNode * root, std::string codes);
    void destory(BaseNode *& root);
};
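BaseNode * HuffmanTree::build_Huffman_tree(std::string str, int * & freq) {
    queue que;
    // Count occurrences; only 7-bit ASCII fits into the 128-entry tables.
    for (auto v : str) {
        unsigned char c = (unsigned char)v;
        if (c < 128)
            ++freq[c];
    }
    // One leaf per symbol that occurs. Index 0 is skipped because
    // var == 0 is what marks an internal node.
    for (int i = 1; i < 128 + 1; i++)
        if (freq[i]) {
            BaseNode * n = new BaseNode((char)i, freq[i]);
            que.insert(n);
        }
    // Merge the two lightest nodes until only the root is left.
    while (que.size() > 1) {
        BaseNode * left = que.pop();
        BaseNode * right = que.pop();
        BaseNode * parent = new BaseNode(left, right);
        que.insert(parent);
    }
    return que.pop();
}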
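void HuffmanTree::coding(BaseNode * pr, char * write, char ** code, int len) {
    // All code strings are stored back to back in one static buffer;
    // `out` points at the next free byte.
    static char buf[MAXSIZE >> 1], *out = buf;
    if (pr->var) {   // leaf: the path written so far is its code
        write[len] = 0;
        strcpy(out, write);
        code[(int)pr->var] = out;
        out += len + 1;
        return;
    }
    write[len] = '0'; coding(pr->left, write, code, len + 1);    // left branch: 0
    write[len] = '1'; coding(pr->right, write, code, len + 1);   // right branch: 1
}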
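std::string HuffmanTree::encode(BaseNode * root, std::string str, char ** code) {
    // Scratch buffer for the current root-to-leaf path. With at most 128
    // distinct symbols no code is longer than 127 bits, plus the terminator.
    char * write = new char[128 + 1];
    coding(root, write, code, 0);
    delete[] write;
    std::string read;
    for (auto v : str) {
        unsigned char c = (unsigned char)v;
        if (c > 0 && c < 128)   // bytes outside 7-bit ASCII have no code
            read += code[c];
    }
    return read;
}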
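void HuffmanTree::decode(BaseNode * root, std::string codes) {
    BaseNode * n = root;
    int i = 0;
    while (codes[i]) {
        // Follow one bit: '0' goes left, anything else goes right.
        if (codes[i++] == '0')
            n = n->left;
        else
            n = n->right;
        // At a leaf, print its character and restart from the root.
        if (n->var) putchar(n->var), n = root;
    }
}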
// Pre-order dump of the tree: frequency, plus the character for leaves.
void TravelTree(BaseNode * root) {
    if (root) {
        std::cout << root->freq;
        if (root->var)
            std::cout << ':' << root->var;
        std::cout << std::endl;
        TravelTree(root->left);
        TravelTree(root->right);
    }
}
void HuffmanTree::destory(BaseNode *& root) {
    if (root) {
        destory(root->left);
        destory(root->right);
        delete root;
        root = nullptr;   // root is a reference, so the caller's pointer is cleared too
    }
}
int main()
{
    int freq[128 + 1], *f = freq;
    char *code[128 + 1] = {0};   // one slot per ASCII symbol, null until coding() fills it
    memset(freq, 0, sizeof(freq));
    BaseNode * root;
    HuffmanTree tree;
    std::string str;
    std::string line;
    std::ifstream r;
    r.open("test.txt");
    if (r.is_open()) {
        while (std::getline(r, line))
            str += line;
        r.close();
    }
    if (str.empty()) {   // missing or empty file: nothing to encode
        std::cerr << "test.txt is missing or empty" << std::endl;
        return 1;
    }
    root = tree.build_Huffman_tree(str, f);
    str = tree.encode(root, str, code);
    for (int i = 0; i < 128 + 1; i++)
        if (code[i]) {
            std::cout << (char)i << ':' << freq[i] <<
                " --- " << code[i] << std::endl;
        }
    std::cout << "Encoded: " << str << std::endl;
    std::cout << "Decoded: ";
    tree.decode(root, str);
    std::cout << std::endl;
    tree.destory(root);
    return 0;
}
References:
1. Found via Wikipedia; implementations of Huffman coding in many languages: http://rosettacode.org/wiki/Huffman_coding
2. http://www.cnblogs.com/sench/p/7798064.html
Source: oschina
Link: https://my.oschina.net/u/4340253/blog/4043755