How can I create a tree for Huffman encoding and decoding?

前端 未结 5 1897
轮回少年
轮回少年 2021-02-05 21:04

For my assignment, I am to do a encode and decode for huffman trees. I have a problem creating my tree, and I am stuck.

Don\'t mind the print statements - they are just

相关标签:
5条回答
  • 2021-02-05 21:15

    I was working out this problem today, to try and match results in above response. For most part, this solution works well but only thing i find non-intuitive is to add [0] and [1] in printing the non-node (leaf). But this answers miracles question - you can essentially print it using any traversal mechanism

    import queue
    
    class HuffmanNode(object):
        def __init__(self,left=None,right=None,root=None):
            self.left = left
            self.right = right
            self.root = root
        def children(self):
            return (self.left,self.right)
        def preorder(self,path=None):
            if path is None:
                path = []
            if self.left is not None:
                if isinstance(self.left[1], HuffmanNode):
                    self.left[1].preorder(path+[0])
                else:
                    print(self.left,path+[0])
            if self.right is not None:
                if isinstance(self.right[1], HuffmanNode):
                    self.right[1].preorder(path+[1])
                else:
                    print(self.right,path+[1])
    
    freq = [
        (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
        (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
        (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
        (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
        (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
        (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
        (1.974, 'y'), (0.074, 'z') ]
    
    def encode(frequencies):
        p = queue.PriorityQueue()
        for item in frequencies:
            p.put(item)
    
        #invariant that order is ascending in the priority queue
        #p.size() gives list of elements
        while p.qsize() > 1:
            left,right = p.get(),p.get()
            node = HuffmanNode(left,right)
            p.put((left[0]+right[0],node))
        return p.get()
    
    node = encode(freq)
    print(node[1].preorder())
    
    0 讨论(0)
  • 2021-02-05 21:18

    @Dave walk_tree is missing tree processing code

    # Recursively walk the tree down to the leaves,
    # assigning a code value to each symbol
    def walk_tree(node, prefix="", code={}):
        if isinstance(node[1].left[1], HuffmanNode):
            walk_tree(node[1].left,prefix+"0", code)
        else:
            code[node[1].left[1]]=prefix+"0"
        if isinstance(node[1].right[1],HuffmanNode):
            walk_tree(node[1].right,prefix+"1", code)
        else:
            code[node[1].right[1]]=prefix+"1"
        return(code)
    
    0 讨论(0)
  • 2021-02-05 21:21

    @Dave class HuffmanNode(object) has a subtle bug. When the two frequencies are equal an exception is thrown: For example let

        freq = [ (200/3101, 'd'), (100/3101, 'e'), (100/3101, 'f') ]
    

    Then you get TypeError: unorderable types: HuffmanNode() < str(). The problem has to do with the PriorityQueue implementation. I suspect when the first elements of the tuples compare equal, PriorityQueue wants to compare the second elements one of which is a python object. You add the lt method to your class and the problem is solved.

        def __lt__(self,other):
            return 0
    
    0 讨论(0)
  • 2021-02-05 21:24

    The Huffman algorithm in Wikipedia tells you exactly how to create the node tree, so your program can be based on that algorithm, or another like it. Here is a Python program with comments showing the corresponding wikipedia algorithm step. The test data is frequencies of the letters of the alphabet in English text.

    Once the node tree is created, you need to walk it down to assign Huffman codes to each symbol in your dataset. Since this is homework, that step is up to you, but a recursive algorithm is the simplest and most natural way to handle it. It's only six more lines of code.

    import queue
    
    class HuffmanNode(object):
        def __init__(self, left=None, right=None, root=None):
            self.left = left
            self.right = right
            self.root = root     # Why?  Not needed for anything.
        def children(self):
            return((self.left, self.right))
    
    freq = [
        (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
        (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
        (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
        (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
        (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
        (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
        (1.974, 'y'), (0.074, 'z') ]
    
    def create_tree(frequencies):
        p = queue.PriorityQueue()
        for value in frequencies:    # 1. Create a leaf node for each symbol
            p.put(value)             #    and add it to the priority queue
        while p.qsize() > 1:         # 2. While there is more than one node
            l, r = p.get(), p.get()  # 2a. remove two highest nodes
            node = HuffmanNode(l, r) # 2b. create internal node with children
            p.put((l[0]+r[0], node)) # 2c. add new node to queue      
        return p.get()               # 3. tree is complete - return root node
    
    node = create_tree(freq)
    print(node)
    
    # Recursively walk the tree down to the leaves,
    #   assigning a code value to each symbol
    def walk_tree(node, prefix="", code={}):
        return(code)
    
    code = walk_tree(node)
    for i in sorted(freq, reverse=True):
        print(i[1], '{:6.2f}'.format(i[0]), code[i[1]])
    

    When run on the alphabet data, the resulting Huffman codes are:

    e  12.70 100
    t   9.06 000
    a   8.17 1110
    o   7.51 1101
    i   6.97 1011
    n   6.75 1010
    s   6.33 0111
    h   6.09 0110
    r   5.99 0101
    d   4.25 11111
    l   4.03 11110
    c   2.78 01001
    u   2.76 01000
    m   2.41 00111
    w   2.37 00110
    f   2.23 00100
    g   2.02 110011
    y   1.97 110010
    p   1.93 110001
    b   1.49 110000
    v   1.04 001010
    k   0.75 0010111
    j   0.15 001011011
    x   0.15 001011010
    q   0.10 001011001
    z   0.07 001011000
    
    0 讨论(0)
  • 2021-02-05 21:27

    One more solution returning a dictionary {label:code} and a recursive dictionary tree containing the resulting graph. The input vals is in form of dictionary {label:freq}:

    def assign_code(nodes, label, result, prefix = ''):    
        childs = nodes[label]     
        tree = {}
        if len(childs) == 2:
            tree['0'] = assign_code(nodes, childs[0], result, prefix+'0')
            tree['1'] = assign_code(nodes, childs[1], result, prefix+'1')     
            return tree
        else:
            result[label] = prefix
            return label
    
    def Huffman_code(_vals):    
        vals = _vals.copy()
        nodes = {}
        for n in vals.keys(): # leafs initialization
            nodes[n] = []
    
        while len(vals) > 1: # binary tree creation
            s_vals = sorted(vals.items(), key=lambda x:x[1]) 
            a1 = s_vals[0][0]
            a2 = s_vals[1][0]
            vals[a1+a2] = vals.pop(a1) + vals.pop(a2)
            nodes[a1+a2] = [a1, a2]        
        code = {}
        root = a1+a2
        tree = {}
        tree = assign_code(nodes, root, code)   # assignment of the code for the given binary tree      
        return code, tree
    

    This can be used as:

    freq = [
    (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
    (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
    (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
    (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
    (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
    (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
    (1.974, 'y'), (0.074, 'z') ]    
    vals = {l:v for (v,l) in freq}
    code, tree = Huffman_code(vals)
    
    text = 'hello' # text to encode
    encoded = ''.join([code[t] for t in text])
    print('Encoded text:',encoded)
    
    decoded = []
    i = 0
    while i < len(encoded): # decoding using the binary graph
        ch = encoded[i]  
        act = tree[ch]
        while not isinstance(act, str):
            i += 1
            ch = encoded[i]  
            act = act[ch]        
        decoded.append(act)          
        i += 1
    
    print('Decoded text:',''.join(decoded))
    

    One can visualize the tree with Graphviz as:

    The figure was generated by the following script as (Graphviz is needed):

    def draw_tree(tree, prefix = ''):    
        if isinstance(tree, str):            
            descr = 'N%s [label="%s:%s", fontcolor=blue, fontsize=16, width=2, shape=box];\n'%(prefix, tree, prefix)
        else: # Node description
            descr = 'N%s [label="%s"];\n'%(prefix, prefix)
            for child in tree.keys():
                descr += draw_tree(tree[child], prefix = prefix+child)
                descr += 'N%s -> N%s;\n'%(prefix,prefix+child)
        return descr
    
    import subprocess
    with open('graph.dot','w') as f:
        f.write('digraph G {\n')
        f.write(draw_tree(tree))
        f.write('}') 
    subprocess.call('dot -Tpng graph.dot -o graph.png', shell=True)
    
    0 讨论(0)
提交回复
热议问题