In today’s data-driven world, efficient storage and transmission of information are more critical than ever. Lossless compression techniques play a pivotal role in achieving this efficiency by reducing data size without compromising its integrity. Unlike lossy compression, which sacrifices some data to achieve higher compression rates, lossless methods ensure that the original data can be perfectly reconstructed from its compressed form. This makes lossless compression ideal for applications where data accuracy is paramount, such as text files, computer programs, executables, and databases.
In this blog post, we will focus on two fundamental lossless compression techniques: dictionary encoding and run-length encoding. By implementing these methods in Python, we will demonstrate their effectiveness in compressing a list of colours representing the different Lego bricks needed to build a colourful Lego tower.
Dictionary Encoding
Dictionary encoding, also known as substitution encoding, replaces recurring elements with shorter codes or symbols. This technique is effective when the data contains many repetitions. For our LEGO tower, we can use dictionary encoding to replace frequently occurring colours with shorter identifiers as follows:
Run-Length Encoding
Run-length encoding (RLE) is a simple form of data compression where consecutive elements (runs) are stored as a single data value and count. This technique is particularly effective for data with many consecutive repetitions. In our Lego tower, RLE can compress sequences of the same colour into a more compact form:
Combing Dictionary Encoding and Run-Length Encoding
By combining both lossless compression methods it may be possible to reach a higher compression rate, still without losing any data (lossless compression)!
Lossless Compression Algorithms, using Python!
Let’s use our Python skills to implement the two lossless compression techniques described above. To fully understand the code provided below, you will need a good understanding of how lists, dictionaries (a.k.a. hash tables) and tuples work in Python. You may first notice the use of square brackets [] in Python for storing data in a list, the use of curly brackets {} for storing pairs of key:value in a dictionary and the use of standard brackets () to store data in a tuple.
Use the tabs below to investigate how we have implemented our compression algorithms to compress our list of Lego brick colours.
# Dictionary Encoding def dictionary_encoding(data): compressedData = [] dictionary = {} reversedDictionary = {} for value in data: if value not in reversedDictionary: key = value[0] # e.g. "Y" for "Yellow" dictionary[key] = value # "Y":"Yellow" reversedDictionary[value] = key # "Yellow":"Y" compressedData.append(key) # "Y" else: key = reversedDictionary[value] compressedData.append(key) return compressedData, dictionary data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"] print(">>> Original Data (Before compression)") print(data) print("") compressedData, dictionary = dictionary_encoding(data) print(compressedData) print(dictionary)
# Run-Length Encoding def run_length_encoding(data): compressedData = [] currentKey = data[0] counter = 1 for i in range(1,len(data)): if data[i]==currentKey: counter+=1 else: compressedData.append((currentKey,counter)) counter=1 currentKey = data[i] compressedData.append((currentKey,counter)) return compressedData data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"] print(">>> Original Data (Before compression)") print(data) print("") compressedData = run_length_encoding(data) print(compressedData)
# Lossless Compression Algorithms using Python - www.101computing.net/lossless-compression-algorithms-using-python # Dictionary Encoding def dictionary_encoding(data): compressedData = [] dictionary = {} reversedDictionary = {} for value in data: if value not in reversedDictionary: key = value[0] # e.g. "Y" for "Yellow" dictionary[key] = value # "Y":"Yellow" reversedDictionary[value] = key # "Yellow":"Y" compressedData.append(key) # "Y" else: key = reversedDictionary[value] compressedData.append(key) return compressedData, dictionary # Run-Length Encoding def run_length_encoding(data): compressedData = [] currentKey = data[0] counter = 1 for i in range(1,len(data)): if data[i]==currentKey: counter+=1 else: compressedData.append((currentKey,counter)) counter=1 currentKey = data[i] compressedData.append((currentKey,counter)) return compressedData # Combining Both Lossless Compression Techniques def lossless_compression(data): print(" >>> Step 1: Dictionary Encoding") compressedData, dictionary = dictionary_encoding(data) print(" > Compressed Data:") print(compressedData) print(" > Dictionary:") print(dictionary) print("") print(" >>> Step 2: Run-Length Encoding") fullyCompressedData = run_length_encoding(compressedData) print(" > Fully Compressed Data:") print(fullyCompressedData) data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"] print(">>> Original Data (Before compression)") print(data) print("") lossless_compression(data)
Try it online…
You can now test these functions using the code provided below…
Python Challenge
Your Python challenge is to write an algorithm to decompress data using a list of compressed data and a given dictionary, your python code will need to reconstruct the original data:

Solution...
The solution for this challenge is available to full members!Find out how to become a member:
➤ Members' Area