Lossless Compression Algorithms using Python

In today’s data-driven world, efficient storage and transmission of information are more critical than ever. Lossless compression techniques play a pivotal role in achieving this efficiency by reducing data size without compromising its integrity. Unlike lossy compression, which sacrifices some data to achieve higher compression rates, lossless methods ensure that the original data can be perfectly reconstructed from its compressed form. This makes lossless compression ideal for applications where data accuracy is paramount, such as text files, computer programs, executables, and databases.

In this blog post, we will focus on two fundamental lossless compression techniques: dictionary encoding and run-length encoding. By implementing these methods in Python, we will demonstrate their effectiveness in compressing a list of colours representing the different Lego bricks needed to build a colourful Lego tower.

Dictionary Encoding

Dictionary encoding, also known as substitution encoding, replaces recurring elements with shorter codes or symbols. This technique is effective when the data contains many repetitions. For our LEGO tower, we can use dictionary encoding to replace frequently occurring colours with shorter identifiers as follows:

Run-Length Encoding

Run-length encoding (RLE) is a simple form of data compression where consecutive elements (runs) are stored as a single data value and count. This technique is particularly effective for data with many consecutive repetitions. In our Lego tower, RLE can compress sequences of the same colour into a more compact form:

Combing Dictionary Encoding and Run-Length Encoding

By combining both lossless compression methods it may be possible to reach a higher compression rate, still without losing any data (lossless compression)!

Lossless Compression Algorithms, using Python!

Let’s use our Python skills to implement the two lossless compression techniques described above. To fully understand the code provided below, you will need a good understanding of how lists, dictionaries (a.k.a. hash tables) and tuples work in Python. You may first notice the use of square brackets [] in Python for storing data in a list, the use of curly brackets {} for storing pairs of key:value in a dictionary and the use of standard brackets () to store data in a tuple.

Use the tabs below to investigate how we have implemented our compression algorithms to compress our list of Lego brick colours.

Dictionary EncodingRun-Length EncodingDictionary Encoding + Run-Length Encoding
Our dictionary_compress() function will take one parameter, the data (list of values) to be compressed and return both a compressed list of encoded values as well as a dictionary that would be needed to decompress the data.

# Dictionary Encoding
def dictionary_encoding(data):
   compressedData = []
   dictionary = {}
   reversedDictionary = {}

   for value in data:
      if value not in reversedDictionary:
         key = value[0]     # e.g. "Y" for "Yellow"
         dictionary[key] = value     # "Y":"Yellow"
         reversedDictionary[value] = key  # "Yellow":"Y"
         compressedData.append(key)    # "Y"
      else:
         key = reversedDictionary[value]
         compressedData.append(key)
   return compressedData, dictionary

data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"]
print(">>> Original Data (Before compression)")
print(data)      
print("")
compressedData, dictionary = dictionary_encoding(data)
print(compressedData)
print(dictionary)
Our run_length_compress() function will take one parameter, the data (list of values) to be compressed and return a list of tuples to indicate the number of consecutive occurrences of each value.

# Run-Length Encoding
def run_length_encoding(data):
   compressedData = []
   currentKey = data[0]
   counter = 1
   for i in range(1,len(data)):
      if data[i]==currentKey:
         counter+=1
      else:
         compressedData.append((currentKey,counter))
         counter=1
         currentKey = data[i]
   compressedData.append((currentKey,counter))  
   return compressedData

data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"]
print(">>> Original Data (Before compression)")
print(data)      
print("")
compressedData = run_length_encoding(data)
print(compressedData)
We can now create a lossless_compress() function that will reuse both our lossless compression functions: dictionary_compress() and run_length_compress()

# Lossless Compression Algorithms using Python - www.101computing.net/lossless-compression-algorithms-using-python

# Dictionary Encoding
def dictionary_encoding(data):
   compressedData = []
   dictionary = {}
   reversedDictionary = {}

   for value in data:
      if value not in reversedDictionary:
         key = value[0]     # e.g. "Y" for "Yellow"
         dictionary[key] = value     # "Y":"Yellow"
         reversedDictionary[value] = key  # "Yellow":"Y"
         compressedData.append(key)    # "Y"
      else:
         key = reversedDictionary[value]
         compressedData.append(key)
   return compressedData, dictionary

# Run-Length Encoding
def run_length_encoding(data):
   compressedData = []
   currentKey = data[0]
   counter = 1
   for i in range(1,len(data)):
      if data[i]==currentKey:
         counter+=1
      else:
         compressedData.append((currentKey,counter))
         counter=1
         currentKey = data[i]
   compressedData.append((currentKey,counter))  
   return compressedData

# Combining Both Lossless Compression Techniques
def lossless_compression(data):
   print(" >>> Step 1: Dictionary Encoding")
   compressedData, dictionary = dictionary_encoding(data)
   print(" > Compressed Data:")
   print(compressedData)  

   print(" > Dictionary:")
   print(dictionary)
   print("")
   print(" >>> Step 2: Run-Length Encoding")
   fullyCompressedData = run_length_encoding(compressedData)
   print(" > Fully Compressed Data:")
   print(fullyCompressedData)        
   
  
data = ["Yellow","Yellow","Yellow","Red","Red","Blue","Blue","Blue","Green","Red"]
print(">>> Original Data (Before compression)")
print(data)      
print("")
lossless_compression(data)

Try it online…

You can now test these functions using the code provided below…

Python Challenge

Your Python challenge is to write an algorithm to decompress data using a list of compressed data and a given dictionary, your python code will need to reconstruct the original data:

unlock-access

Solution...

The solution for this challenge is available to full members!
Find out how to become a member:
➤ Members' Area

Did you like this challenge?

Click on a star to rate it!

Average rating 3.4 / 5. Vote count: 21

No votes so far! Be the first to rate this post.

As you found this challenge interesting...

Follow us on social media!