Huffman Codes

Anonymous (unverified), submitted 2019-12-03 00:21:02

In 1952, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}; both compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

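The first line gives an integer N (2 ≤ N ≤ 63), followed by a line that contains all N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i], an integer no more than 1000. The next line gives a positive integer M (≤ 1000), followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.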

Output Specification:

For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by the Huffman algorithm. Any prefix code of optimal total length is considered correct.
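To make the example in the statement concrete, here is a small C++ sketch that encodes "aaaxuaxz" with two of the code sets above. The helper `encode` is a hypothetical name, not part of the original solution; it simply concatenates codewords and assumes every character of the text has one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: concatenates the codeword of every character in `text`.
// Assumes `code` has an entry for every character that occurs in `text`.
std::string encode(const std::string& text, const std::map<char, std::string>& code) {
    std::string bits;
    for (char c : text) {
        auto it = code.find(c);
        assert(it != code.end());  // every symbol needs a codeword
        bits += it->second;
    }
    return bits;
}
```

With {'a'=0, 'x'=10, 'u'=110, 'z'=111}, "aaaxuaxz" encodes to 14 bits. With the rejected code {'a'=0, 'x'=01, 'u'=011, 'z'=001} the total is also 4·1 + 2·2 + 1·3 + 1·3 = 14 bits, yet `encode` maps both "aaaxuaxz" and "aazuaxax" to 00001011001001, so having the optimal length alone is not enough.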


Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

Submission result:

Submitted 2018/5/24 23:50:17 by user copper; problem 7-9; compiler C++ (g++); status Accepted; score 30.

All seven test points (0 to 6) were accepted.

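This is arguably the capstone problem of the tree chapter: both the amount of code and the reasoning involved are a level above the other problems. The task: given a set of characters with weights (frequencies) and M student submissions that encode the same characters, print Yes if a submission is an optimal code and No otherwise.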

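What the problem really tests through "optimal code length" is the Huffman tree and Huffman coding. In case you don't make the connection, the problem setter even opens the statement with David A. Huffman and his paper "A Method for the Construction of Minimum-Redundancy Codes", i.e. Huffman codes.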

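Every weighted character set can be built into a Huffman tree. The shape of the tree depends on how ties are broken, but for a given set of weights the optimal total code length is fixed: it is the total code length produced by any Huffman tree for those weights. (To build a Huffman tree, repeatedly merge the two nodes of smallest weight into a binary tree whose root weight is the sum of the two subtrees' weights, then treat that tree as a node in the next round. Repeat until all nodes are used and a single tree remains; N nodes take exactly N-1 merges.) Note, however, that while a Huffman code always has optimal length, a code of optimal length need not be a Huffman code (the last sentence of the problem points this out, which is kind of the setter). Verifying a submission therefore comes down to two checks: 1. its total length equals the optimal length; 2. it can be decoded without ambiguity.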
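The optimal length does not actually require an explicit tree: each merge of weights a and b pushes both subtrees one level deeper, adding a + b to the weighted path length, so the optimal length is the sum of all merged weights. The sketch below uses `std::priority_queue` with `std::greater` in place of the author's hand-written min-heap; `optimalLength` is a hypothetical helper name.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Optimal total code length (weighted path length) for the given frequencies.
// Every merge adds (a + b) to the total, so summing the merged weights over
// all N-1 merges equals sum(f[i] * depth[i]) in the Huffman tree.
long long optimalLength(const std::vector<int>& weights) {
    assert(weights.size() >= 2);  // the problem guarantees N >= 2
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>>
        heap(weights.begin(), weights.end());  // min-heap of current tree weights
    long long total = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();  // smallest
        long long b = heap.top(); heap.pop();  // second smallest
        total += a + b;
        heap.push(a + b);  // the merged tree goes back into the heap
    }
    return total;
}
```

For the sample weights {1, 1, 1, 3, 3, 6, 6} this returns 53, and for "aaaxuaxz" (weights 4, 2, 1, 1) it returns 14. The author's `getHFMLength` computes the same quantity by summing weight × depth over the leaves of the built tree.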

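That gives the approach. First build a Huffman tree from the input and get the optimal code length. Then check each submission: 1. whether its total length is optimal; 2. whether it can be decoded unambiguously, i.e. whether it is a prefix code (in the code tree, every codeword ends at a leaf).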
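The first check compares a submission's total length, the sum of f[i] × |code[i]|, with the optimal length. A minimal sketch, assuming the codes arrive in the same order as the input characters (the input format says c[i] is the i-th character); `submissionLength` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Total encoded length of one submission: sum of f[i] * |code[i]|.
// Assumes codes[i] belongs to the i-th input character.
long long submissionLength(const std::vector<int>& freq,
                           const std::vector<std::string>& codes) {
    assert(freq.size() == codes.size());
    long long total = 0;
    for (std::size_t i = 0; i < freq.size(); ++i) {
        total += static_cast<long long>(freq[i]) * static_cast<long long>(codes[i].size());
    }
    return total;
}
```

For the four sample submissions this gives 53, 53, 63 and 53. Against the optimal length of 53, only the third fails the length check; the fourth passes it and must be rejected by the prefix check instead.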

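Getting the optimal code length requires building a Huffman tree, which in turn requires first putting the weighted nodes into a min-heap. Each round pops the top of the min-heap twice, merges the two trees as the left and right subtrees of a new node whose weight is their sum, and inserts the merged tree back into the heap. This continues until the last pop returns one complete Huffman tree. (For building a min-heap, see the author's earlier post "#数据结构与算法学习笔记#PTA14:最小堆与最大堆(C/C++)" [min-heap and max-heap].) Then compute the total code length of each submission and compare it with that of the Huffman tree. Each insertion into or deletion from the min-heap adjusts the heap along a single path from the root to a leaf, so it costs O(log N); see the code comments for details.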
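The heap operations described above can be sketched on plain int keys. `MinHeap` is a hypothetical class, not the author's code, but it uses the same 1-based layout and the same sentinel trick as the author's `flag` node: a key at index 0 that is never larger than a real key, so sift-up stops at the root without a bounds check. Both `push` and `pop` touch only one root-to-leaf path.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// 1-based array min-heap of ints; a_[0] is a sentinel (INT_MIN).
class MinHeap {
public:
    MinHeap() : a_(1, INT_MIN) {}

    bool empty() const { return a_.size() == 1; }

    // Insert: append, then move larger parents down until x fits.
    void push(int x) {
        a_.push_back(x);
        std::size_t i = a_.size() - 1;
        while (a_[i / 2] > x) {  // the sentinel ends the loop at the root
            a_[i] = a_[i / 2];
            i /= 2;
        }
        a_[i] = x;
    }

    // Delete-min: take the root, then sift the last key down from the root,
    // always following the smaller child.
    int pop() {
        assert(!empty());
        int top = a_[1];
        int last = a_.back();
        a_.pop_back();
        std::size_t n = a_.size() - 1;  // keys left, at indices 1..n
        std::size_t parent = 1;
        while (2 * parent <= n) {
            std::size_t child = 2 * parent;
            if (child < n && a_[child + 1] < a_[child]) ++child;  // smaller child
            if (last <= a_[child]) break;
            a_[parent] = a_[child];
            parent = child;
        }
        if (n > 0) a_[parent] = last;  // nothing to place if the heap became empty
        return top;
    }

private:
    std::vector<int> a_;
};
```

Unlike the author's `DeleteMinHeap`, this version removes the tail slot before sifting, so the loop bound counts only the remaining keys.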

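To check the prefix property, simulate each codeword's path in a binary tree: every codeword creates one path, with 0 meaning the left subtree and 1 the right subtree. During the simulation, either of two situations shows that the code is not a prefix code: 1. a later path reaches or passes through a node already defined as a leaf; 2. a later path ends at a node that is not a leaf (it already has children). See the code comments for details.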
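The tree simulation for the prefix check can be written compactly as a binary trie. This sketch, under the hypothetical name `isPrefixFree`, keeps nodes in a `std::vector` and links children by index instead of raw pointers, so nothing has to be freed; it returns false in exactly the two situations listed above.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Inserts every codeword into a binary trie (0 = left, 1 = right).
// Returns false if a codeword (1) reaches or passes a node where another
// codeword ends, or (2) ends on a node that already has children.
bool isPrefixFree(const std::vector<std::string>& codes) {
    struct TrieNode {
        std::array<int, 2> child{{-1, -1}};  // indices into `trie`, -1 = none
        bool isLeaf = false;                 // a codeword ends here
    };
    std::vector<TrieNode> trie(1);  // trie[0] is the root
    for (const std::string& code : codes) {
        int cur = 0;
        for (char bit : code) {
            assert(bit == '0' || bit == '1');
            int b = bit - '0';
            if (trie[cur].child[b] == -1) {  // create the missing child
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.emplace_back();
            }
            cur = trie[cur].child[b];
            if (trie[cur].isLeaf) return false;  // case 1: earlier codeword is a prefix (or duplicate)
        }
        if (trie[cur].child[0] != -1 || trie[cur].child[1] != -1) return false;  // case 2
        trie[cur].isLeaf = true;
    }
    return true;
}
```

On the sample, the first three submissions are prefix-free and the fourth is not: E = 00 ends on a node that the codewords for A, B, C and D already pass through.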


```cpp
// HuffmanCodes.cpp : defines the entry point for the console application.
#include <cstring>
#include <iostream>
#include <vector>

using namespace std;

// Huffman tree node (also reused as a trie node in isPrefixCode)
class Node {
public:
	Node() {}
	Node(char element, int weight)
		:element(element), weight(weight), left(NULL), right(NULL) {}

	char element = '\0';
	int weight = 0;
	Node* left = NULL;
	Node* right = NULL;
	bool isleave = false;	// true if a codeword ends at this node
};
typedef Node* HFMTree;

// One line of a submission: a character and its codeword
class Case {
public:
	char element;
	char route[1000];
	int length;

	int getlength() {
		return strlen(this->route);
	}
};

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist);
void Insert(vector<HFMTree>& minHeap, HFMTree node);		// insert a node into the min-heap
HFMTree CreateHFMT(vector<HFMTree>& minHeap);			// build the Huffman tree from the min-heap
HFMTree DeleteMinHeap(vector<HFMTree>& minHeap);		// remove and return the minimum, then re-adjust the heap
int getHFMLength(HFMTree hfmtree, int depth);			// weighted code length of the tree

void Input(vector<Case>& testcase, int num);
bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight);	// check the total code length
bool isPrefixCode(vector<Case>& testcase);				// check the prefix property

int main() {
	/* Build the Huffman tree from the input and get the optimal code length */
	int num;
	cin >> num;

	vector<HFMTree> minHeap;		// min-heap stored in a vector, 1-based
	vector<HFMTree> inputlist;		// nodes in input order (keeps each weight)
	HFMTree flag = new Node('-', -1);	// sentinel at index 0, smaller than any weight
	minHeap.push_back(flag);
	Read(num, minHeap, inputlist);

	HFMTree hfmtree = CreateHFMT(minHeap);			// build the Huffman tree with the min-heap
	int optcodelength = getHFMLength(hfmtree, 0);	// optimal total code length

	/* Check each submission: 1. optimal total length, 2. prefix code (codewords only at leaves) */
	int count;
	cin >> count;

	for (int i = 0; i < count; i++) {
		vector<Case> testcase;
		Input(testcase, num);
		bool isoptimallen = isOptimalLen(testcase, inputlist, optcodelength);
		bool isprefixcode = isPrefixCode(testcase);
		if (isoptimallen && isprefixcode) {
			cout << "Yes" << endl;
		}
		else {
			cout << "No" << endl;
		}
	}

	return 0;
}

void Read(int num, vector<HFMTree>& minHeap, vector<HFMTree>& inputlist) {
	char element;
	int weight;
	for (int i = 0; i < num; i++) {
		cin >> element >> weight;
		HFMTree node = new Node(element, weight);
		inputlist.push_back(node);
		Insert(minHeap, node);
	}
}

void Insert(vector<HFMTree>& minHeap, HFMTree node) {
	int index = minHeap.size();
	minHeap.push_back(node);

	// Sift up after each insertion; the sentinel at index 0 stops the loop
	while (minHeap[index / 2]->weight > node->weight) {
		// Move the node pointers themselves, not just element/weight:
		// merged nodes carry their subtrees in left/right
		minHeap[index] = minHeap[index / 2];
		index /= 2;
	}
	minHeap[index] = node;
}

HFMTree CreateHFMT(vector<HFMTree>& minHeap) {
	int size = minHeap.size() - 1;	// number of nodes (index 0 is the sentinel)
	// size nodes need size-1 merges
	for (int i = 1; i < size; i++) {
		HFMTree node = new Node();
		// The two smallest trees in the heap become the left and right children
		node->left = DeleteMinHeap(minHeap);
		node->right = DeleteMinHeap(minHeap);
		node->weight = node->left->weight + node->right->weight;
		// Put the merged tree back into the heap
		Insert(minHeap, node);
	}

	// The only tree left in the heap is the Huffman tree
	return DeleteMinHeap(minHeap);
}

HFMTree DeleteMinHeap(vector<HFMTree>& minHeap) {
	// Empty heap: only the sentinel is left
	if (minHeap.size() == 1) {
		return NULL;
	}

	// The minimum sits at the root
	HFMTree node = minHeap[1];

	int size = minHeap.size();
	int parent, child;
	// Sift the last element down from the root
	HFMTree cmp = minHeap[size - 1];

	// parent starts at the root and child is its smaller child; each iteration moves parent to child.
	// The bound still counts the tail slot, which is harmless: if child reaches it,
	// cmp is compared with itself and the loop ends.
	for (parent = 1; 2 * parent < size; parent = child) {
		child = parent * 2;
		// If a right child exists, let child point to the smaller of the two
		if ((child != size - 1) && (minHeap[child]->weight > minHeap[child + 1]->weight)) {
			child++;
		}
		// Stop once the last element is not larger than the smaller child;
		// it then replaces parent, not child
		if (cmp->weight <= minHeap[child]->weight) {
			break;
		}
		else {
			minHeap[parent] = minHeap[child];
		}
	}
	// Put the last element into the hole
	minHeap[parent] = cmp;

	// Remove the old tail slot. minHeap.erase(minHeap.end()) would be wrong:
	// end() points one past the last element and is not a valid argument to erase
	minHeap.pop_back();

	return node;
}

int getHFMLength(HFMTree hfmtree, int depth) {
	// Leaf: weight times depth (its code length)
	if (!hfmtree->left && !hfmtree->right) {
		return hfmtree->weight * depth;
	}
	// Internal nodes of a Huffman tree always have two children
	else {
		return getHFMLength(hfmtree->left, depth + 1) + getHFMLength(hfmtree->right, depth + 1);
	}
}

void Input(vector<Case>& testcase, int num) {
	for (int i = 0; i < num; i++) {
		Case inputcase;
		cin >> inputcase.element >> inputcase.route;
		inputcase.length = inputcase.getlength();
		testcase.push_back(inputcase);
	}
}

bool isOptimalLen(vector<Case>& testcase, vector<HFMTree>& inputlist, int weight) {
	// Submission lines come in input order, so testcase[i] matches inputlist[i]
	int testweight = 0;
	for (size_t i = 0; i < testcase.size(); i++) {
		testweight += testcase[i].length * inputlist[i]->weight;
	}
	return testweight == weight;
}

bool isPrefixCode(vector<Case>& testcase) {
	HFMTree newtree = new Node();	// root of a trie built from the codewords

	// Two situations violate the prefix property:
	// 1. a later codeword reaches or passes through a node already marked as a leaf;
	// 2. a later codeword ends on a node that already has children.
	for (size_t i = 0; i < testcase.size(); i++) {
		HFMTree point = newtree;
		for (int j = 0; j < testcase[i].length; j++) {
			// '0' goes to the left child, '1' to the right child
			HFMTree& next = (testcase[i].route[j] == '0') ? point->left : point->right;
			bool last = (j == testcase[i].length - 1);
			if (!next) {
				// The child does not exist yet: create it, and mark it as a leaf
				// if the codeword ends here
				next = new Node();
				point = next;
				if (last) {
					point->isleave = true;
				}
			}
			else {
				point = next;
				// Case 1: an earlier codeword ends at this node
				if (point->isleave) {
					return false;
				}
				// Case 2: this codeword ends on a node that still has children
				if (last && (point->left || point->right)) {
					return false;
				}
			}
		}
	}

	return true;
}
```
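For comparison, here is a compact end-to-end sketch of the same two checks. It is not the author's solution: it swaps in `std::priority_queue` for the hand-written heap and a sort-based prefix test for the trie. The sort-based test relies on the fact that if a codeword is a prefix of another codeword, it is also a prefix of the codeword immediately after it in lexicographic order, so only adjacent pairs need checking. The names `solveHuffmanCodes` and `solveFromString` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

// Reads one test case in the problem's input format from `in` and writes
// one "Yes"/"No" line per submission to `out`.
void solveHuffmanCodes(std::istream& in, std::ostream& out) {
    int n = 0;
    in >> n;
    assert(n >= 2);
    std::vector<long long> freq(n);
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> heap;
    for (int i = 0; i < n; ++i) {
        char c;
        in >> c >> freq[i];
        heap.push(freq[i]);
    }
    // Optimal length = sum of the weights created by the N-1 Huffman merges.
    long long optimal = 0;
    while (heap.size() > 1) {
        long long a = heap.top(); heap.pop();
        long long b = heap.top(); heap.pop();
        optimal += a + b;
        heap.push(a + b);
    }

    int m = 0;
    in >> m;
    for (int k = 0; k < m; ++k) {
        std::vector<std::string> codes(n);
        long long length = 0;
        for (int i = 0; i < n; ++i) {
            char c;
            in >> c >> codes[i];  // lines come in input order
            length += freq[i] * static_cast<long long>(codes[i].size());
        }
        // Sort-based prefix test: only adjacent codewords need comparing.
        std::sort(codes.begin(), codes.end());
        bool prefixFree = true;
        for (int i = 0; i + 1 < n && prefixFree; ++i) {
            const std::string& s = codes[i];
            const std::string& t = codes[i + 1];
            if (t.compare(0, s.size(), s) == 0) prefixFree = false;  // s is a prefix of t
        }
        out << ((length == optimal && prefixFree) ? "Yes" : "No") << '\n';
    }
}

// Convenience wrapper for testing: runs the solver on an input string.
std::string solveFromString(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    solveHuffmanCodes(in, out);
    return out.str();
}
```

`solveHuffmanCodes(std::cin, std::cout)` handles judge input, while `solveFromString` is only a testing aid; on the sample input it produces Yes, Yes, No, No, one per line. Sorting costs O(N log N) string comparisons per submission, which is negligible for N ≤ 63.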


Source: 哈夫曼编码 (Huffman Codes)