Lossless Compression: Huffman Coding Algorithm

Posted on March 4, 2021 by Administrator Posted in A Level Concepts, Algorithms, Computer Science, Computing Concepts, Python - Advanced, Python Challenges, Solved Challenges

The Huffman Coding algorithm is used to implement lossless compression. For the purpose of this blog post, we will investigate how this algorithm can be implemented to encode/compress textual information.

The principle of this algorithm is to replace each character (symbols) of a piece of text with a unique binary code. However the codes generated may have different lengths. In order to optimise the compression process, the idea behind the Huffman Coding approach is to associate shorter codes to the most frequently used symbols and longer codes to the less frequently used symbols.

So let’s consider the following message which consists of 20 characters:

Note that the Huffman coding algorithm is normally used to compress larger amount of data and would not really be used to compress such a small message!

Step 1: Frequency Analysis

The first step of the Huffman Coding algorithm is to complete a frequency analysis of this message by:

Identifying all the symbols used in the message,
Identifying their weights (or frequencies) by counting their occurrences (the number of times they appear) within the message,
Sorting this list of symbols in ascending order of their weights.

Step 2: Organising Symbols as a Binary Tree

To do so each symbol becomes a node storing the symbol itself and its weight.
Then the tree is constructed through the following iterative process:

Step 3: Generating the Huffman Codes

Using the above tree, we can now identify the Huffman code for each symbol (leaves of the tree. This is done by highlighting the path from the Root node of the tree to the required leave/symbol. The labels of each branch are concatenated to form the Huffman code for the symbol.

As you can see, most frequent symbols are closer to the root node of the tree, resulting in shorter Huffman codes.

The resulting Huffman Codes for our message are:

Encoding the message

We can now encode the message by replacing each symbol with its matching Huffman code.

Encoded/compressed message:

010100111101000011111011010111100001101111010000111100101000110

Python Implementation

Here is a Python implementation of the Huffman Coding algorithm:

Python Task

To complete the above code, we need an extra method to our HuffmanEncoder class called decode().

This method should take two parameters:

An encoded message (Binary string)
the Huffman codes used to encode the message (Using a python dictionary)

Using the above information, the decode() method would then decode the message one character at a time.

Solution...

The solution for this challenge is available to full members!
Find out how to become a member:
➤ Members' Area

Other challenges you may enjoy...

Tagged with: Binary Tree

Learning Zone ✕

Theory Concepts

Interactive Quizzes

GCSE Computer Science (UK)

A-Level Computer Science (UK)

Coding Challenges ✕

Python Challenges

HTML / CSS & JavaScript

LMC

SQL

Block Programming

Interactive Coding Puzzles

Online Interactive Tools ✕

Coding Tools

Design Tools

Logic Gates Circuits

Cryptography Corner

Member's Area ✕

Register/Sign Up

Get In Touch

Books Corner