Huffman Encoding: Assignment

In this assignment, you will implement Huffman coding, a lossless data compression algorithm that assigns variable-length codes to input characters based on their frequencies.

Learning Objectives

In this assignment you will:

Huffman Algorithm: Step-by-Step Description

See the detailed explanation here:

Huffman Coding: Algorithm Guide

Assignment Requirements

Part 1: Huffman Tree Construction

  1. Use the provided HuffmanNode class
  2. Complete the build_huffman_tree function that:
    • Creates a leaf node for each character
    • Uses a priority queue to build the tree
    • Returns the root of the Huffman tree

Part 2: Code Generation

  1. Complete the generate_codes function to traverse the Huffman tree and create codes
  2. Store the codes in a dictionary mapping characters to their binary codes

Part 4: Encoding and Decoding

  1. Implement the encode function to convert text to Huffman codes
  2. Implement the decode function to convert encoded bits back to text

Testing

The provided code will display the following to help you troubleshoot:

Sample Texts for Testing

Test your code with the following:

Simple test:

hello world

English text:

The quick brown fox jumps over the lazy dog. The five boxing wizards jump quickly. How vexingly quick daft zebras jump!

Repeated patterns:

aaaaaabbbbbccccddeeeeeeeeeeeee

Code snippet:

def hello_world():
    print("Hello, World!")
    return True

Hints and Tips

Using the heapq Module

The Python heapq module provides an efficient implementation of the min-heap data structure, perfect for building Huffman trees:

  1. Importing the module:

    import heapq
    
  2. Creating and using a heap:

    # Create nodes
    nodes = [HuffmanNode('a', 10), HuffmanNode('b', 5), HuffmanNode('c', 15)]
    
    # Convert list to a heap
    heapq.heapify(nodes)  # Optional step when starting with a list
    
    # Add an element to the heap
    heapq.heappush(nodes, HuffmanNode('d', 7))
    
    # Get and remove the smallest element
    smallest_node = heapq.heappop(nodes)
    
  3. Important note: The heapq module requires that elements can be compared. For your HuffmanNode class, you need a __lt__ method (less than) to define how nodes should be compared:

    def __lt__(self, other):
        # Compare nodes based on frequency
        return self.freq < other.freq
    

Other Helpful Tips

Bonus Challenges (Optional)