Digital data consists of binary information and is stored as a collection of 0’s and 1’s. On a computer system, numbers, text, pictures, sound files, video clips and computer programs are all stored using binary code.
Storing text files in binary
Text files are stored using a character set such as ASCII or Unicode. The number of bits used to encode one character determines how many different characters the character set can include: n bits give 2^n possible codes.
For instance:
- ASCII code uses 7 bits per character and contains 128 codes/characters.
- Extended ASCII code uses 8 bits per character and contains 256 codes/characters.
- Unicode can be encoded in several ways: UTF-16 uses 2 or 4 Bytes per character and UTF-32 always uses 4 Bytes per character. A 2-Byte code allows 65,536 different values and a 4-Byte code allows 4,294,967,296 values. The Unicode standard itself defines 1,114,112 code points, enough to include the characters and symbols used in every language worldwide.
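The figures in the list above all come from the same rule: each extra bit doubles the number of codes available. Here is a minimal Python sketch of that rule (the function name character_set_size is only chosen for illustration):

```python
def character_set_size(bits_per_character):
    """Return how many different codes fit in the given number of bits."""
    return 2 ** bits_per_character


# 7 bits (ASCII), 8 bits (Extended ASCII), 16 bits (2 Bytes), 32 bits (4 Bytes)
for bits in (7, 8, 16, 32):
    print(bits, "bits per character:", character_set_size(bits), "possible codes")
```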
Based on this information, we can easily work out the formula used to estimate the size of a text file as follows:
Text File Size = number of bits per character x number of characters
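As a quick worked example of the text file formula, the short Python sketch below uses example values (8 bits per character and 3,000 characters). The variable names are only chosen for illustration.

```python
# Example values: an Extended ASCII file (8 bits per character) of 3,000 characters.
bits_per_character = 8
number_of_characters = 3000

# Text File Size = number of bits per character x number of characters
size_in_bits = bits_per_character * number_of_characters
size_in_bytes = size_in_bits / 8  # there are 8 bits in one Byte

print(size_in_bits, "bits")
print(size_in_bytes, "Bytes")
```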
Storing bitmap pictures in binary
A bitmap picture is a 2D grid of pixels of different colours. You can read more about how bitmap pictures are stored in binary on this post.
Two criteria will impact the file size of a bitmap picture:
- The resolution: The number of pixels the picture contains, calculated as width in pixels x height in pixels. For instance, a picture of 640 by 480 pixels would contain 640 x 480 = 307,200 pixels.
- The colour depth: The number of bits used to encode the colour of one pixel. For instance, a 1-bit colour depth means that the graphic can only include 2 colours (e.g. 1 = black, 0 = white), an 8-bit colour depth means that the graphic can include up to 256 colours, and a 3-Byte colour depth (RGB code) would include 16,777,216 colours.
Based on this information, we can easily work out the formula used to estimate the size of a bitmap picture as follows:
Picture File Size = colour depth x width in pixels x height in pixels
Note that a bitmap picture would also include a few more Bytes of metadata: additional information used by the computer to render the graphic, such as its width in pixels, its height in pixels and its colour depth. We will however ignore this in our file size estimation, as for large graphics it would only make a small difference.
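To see the picture formula in action, here is a small Python sketch using example values (a 640 by 480 picture with an 8-bit colour depth). The variable names are only chosen for illustration, and the metadata is ignored as explained above.

```python
# Example values: a 640 x 480 bitmap picture using an 8-bit colour depth.
width = 640
height = 480
colour_depth = 8  # bits per pixel

number_of_pixels = width * height      # the resolution
number_of_colours = 2 ** colour_depth  # colours available with this colour depth

# Picture File Size = colour depth x width in pixels x height in pixels
size_in_bits = colour_depth * width * height
size_in_bytes = size_in_bits / 8

print(number_of_pixels, "pixels,", number_of_colours, "colours")
print(size_in_bits, "bits =", size_in_bytes, "Bytes")
```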
Storing sound files in binary
An analogue sound wave can be digitised using a process called sound sampling. You can find out more about sound sampling on this post.
Three criteria will impact the file size of a sound file:
- The sample rate: The sample rate corresponds to the number of samples recorded per second. For instance, a phone call would have a sample rate of 8kHz (8,000 samples per second) whereas an audio CD records music with a sample rate of 44.1kHz (44,100 samples per second), resulting in a higher quality sound.
- The bit depth: The bit depth corresponds to the number of bits used to record one sample. For instance, retro-arcade games used to use 8-bit music. Old mobile phones used to use 16-bit ringtones. Higher quality sound files may use a 32-bit bit depth or higher.
- The duration: The duration of the sound file in seconds determines the number of samples needed to record it, and hence has an impact on the file size.
Based on this information, we can easily work out the formula used to estimate the size of a sound file as follows:
Sound File Size = sample rate x duration x bit depth
Note that the above formula is used to estimate the file size of a mono sound file. Some sound files use multiple channels, such as stereo files (2 channels) or Dolby surround sound files (6 channels). To estimate their file size, you need to multiply the above formula by the number of channels.
Sound File Size = sample rate x duration x bit depth x number of channels
As with picture files, a sound file also includes some metadata (sample rate, bit depth, number of channels) needed for the computer to interpret the data. We will once again ignore this data in our file size estimation.
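The sound formula can be checked with a short Python sketch. The example values below describe an uncompressed stereo track of 210 seconds recorded at 44.1kHz with a 16-bit bit depth. The variable names are only chosen for illustration, and the metadata is ignored.

```python
# Example values: a 210-second stereo track sampled at 44.1kHz with 16 bits per sample.
sample_rate = 44100  # samples per second (Hz)
bit_depth = 16       # bits per sample
duration = 210       # seconds
channels = 2         # 1 = mono, 2 = stereo

# Mono file size = sample rate x duration x bit depth
mono_size_in_bits = sample_rate * duration * bit_depth

# Multiply by the number of channels for multi-channel sound
size_in_bits = mono_size_in_bits * channels
size_in_bytes = size_in_bits / 8

print("One channel:", mono_size_in_bits, "bits")
print("All channels:", size_in_bits, "bits =", size_in_bytes, "Bytes")
```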
Programming Task
Your task is to write three procedures used to estimate the file size of text files, bitmap pictures and sound files as follows:
- estimateTextFileSize() will take two parameters, the number of bits per character and the number of characters in the file. It will output the estimated file size using the formula provided earlier in this post.
- estimatePictureFileSize() will take three parameters, the width and height of the picture in pixels and its colour depth. It will output the estimated file size using the formula provided earlier in this post.
- estimateSoundFileSize() will take four parameters, the sample rate (in Hz), the bit depth, the duration (in seconds) and the number of channels. It will output the estimated file size using the formula provided earlier in this post.
Note that for all three procedures, the output information should be displayed using the most suitable unit (bits, Bytes, KB, MB or GB).
Python Code
Complete your code below:
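If you get stuck, here is one possible approach, given as a sketch rather than the definitive solution. It assumes a helper called formatSize() with a bytesPerKB parameter (both names are made up for this example) that converts a number of bits into the most suitable unit. The three procedures use the names and parameters given in the task; they display the result and also return it so that it can be checked against a test plan.

```python
def formatSize(sizeInBits, bytesPerKB=1000):
    """Return a size given in bits as text, using the most suitable unit.

    formatSize and bytesPerKB are illustrative names, not part of the task.
    Pass bytesPerKB=1024 to work with 1KB = 1,024 Bytes instead.
    """
    if sizeInBits < 8:
        return f"{sizeInBits} bits"
    size = sizeInBits / 8  # convert bits to Bytes
    unit = "Bytes"
    for largerUnit in ("KB", "MB", "GB"):
        if size < bytesPerKB:
            break  # the current unit is already the most suitable one
        size = size / bytesPerKB
        unit = largerUnit
    return f"{size:.2f} {unit}"


def estimateTextFileSize(bitsPerCharacter, numberOfCharacters):
    text = formatSize(bitsPerCharacter * numberOfCharacters)
    print("File Size:", text)
    return text


def estimatePictureFileSize(width, height, colourDepth):
    text = formatSize(colourDepth * width * height)
    print("File Size:", text)
    return text


def estimateSoundFileSize(sampleRate, bitDepth, duration, channels):
    text = formatSize(sampleRate * duration * bitDepth * channels)
    print("File Size:", text)
    return text


estimateTextFileSize(8, 3000)
estimatePictureFileSize(1920, 1080, 24)
estimateSoundFileSize(44100, 16, 210, 2)
```

The loop in formatSize() keeps dividing the size until it drops below one unit of the next size up, which is why the same helper works for every type of file.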
Test Plan
All done? It’s now time to test your code to see if it works as expected.
Test # | Type of file | Input Values | Expected Output | Actual Output |
#1 | Text File | Number of bits per character: 8 bits (Extended ASCII); Number of characters: 3,000 | File Size: 3KB (or 2.93KB) | |
#2 | Text File | Number of bits per character: 16 bits (Unicode UTF-16); Number of characters: 12,000 | File Size: 24KB (or 23.44KB) | |
#3 | Picture File | Width: 640 pixels; Height: 480 pixels; Colour depth: 8 bits | File Size: 307.2KB (or 300KB) | |
#4 | Picture File | Width: 1920 pixels; Height: 1080 pixels; Colour depth: 24 bits | File Size: 6.22MB (or 5.93MB) | |
#5 | Sound File (Mobile phone ring tone) | Sample Rate: 8kHz (= 8,000 Hz); Bit Depth: 16 bits per sample; Duration: 30 seconds; Channels: 1 (mono) | File Size: 480KB (or 468.75KB) | |
#6 | Sound File (uncompressed audio CD track) | Sample Rate: 44.1kHz (= 44,100 Hz); Bit Depth: 16 bits per sample; Duration: 210 seconds; Channels: 2 (stereo) | File Size: 37.04MB (or 35.33MB) | |
Note that this test plan gives you two possible outputs for each test depending on whether your calculations are based on 1KB = 1,000 Bytes or 1KB=1,024 Bytes. Both approaches are acceptable.
Extension Task 1: Animated Gif File
Animated Gif files consist of a collection of bitmap pictures that are displayed one at a time over a few seconds. Most animated gif files loop back to the first picture (frame) after reaching the last frame. The frame rate of a gif file defines the number of frames displayed per second.
We can calculate the size of an animated gif file as follows:
Animated Gif File Size = width x height x colour depth x frame rate x duration
Your task is to create an extra function called estimateAnimatedGifFileSize() that will take five parameters: the width and height of the pictures in pixels, their colour depth, the frame rate in fps (frames per second) and the duration of the animation in seconds. It will output the estimated file size using the above formula.
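Before writing the function, it can help to work through the formula by hand. The Python sketch below uses example values (150 by 150 pixels, 4-bit colour depth, 4 fps for 6 seconds) and variable names chosen only for illustration.

```python
# Example values: a small animation of 150 x 150 pixels with a 4-bit colour depth.
width = 150
height = 150
colour_depth = 4  # bits per pixel
frame_rate = 4    # frames per second
duration = 6      # seconds

number_of_frames = frame_rate * duration
bits_per_frame = width * height * colour_depth

# Animated Gif File Size = width x height x colour depth x frame rate x duration
size_in_bits = bits_per_frame * number_of_frames
size_in_bytes = size_in_bits / 8

print(number_of_frames, "frames of", bits_per_frame, "bits each")
print(size_in_bits, "bits =", size_in_bytes, "Bytes")
```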
You can then test your subroutine using the following input data:
Test # | Type of file | Input Values | Expected Output | Actual Output |
#1 | Animated Gif File | Width: 150 pixels; Height: 150 pixels; Colour depth: 4 bits; Frame Rate: 4 fps; Duration: 6 seconds | File Size: 270KB (or 263.67KB) | |
Extension Task 2: Movie Files
Movie files are similar to animated gif files: a movie clip consists of a collection of still pictures displayed at a high frame rate, e.g. 24 fps (frames per second). A movie clip also includes a soundtrack, so its estimated file size is the size of all its frames (calculated as for an animated gif file) plus the size of its soundtrack (calculated as for a sound file).
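As an illustration, the Python sketch below estimates the size of a short uncompressed clip by adding the size of its frames to the size of its soundtrack. All the values used here (a 10-second clip of 640 by 480 pixels with a 44.1kHz, 16-bit stereo soundtrack) and the variable names are assumptions made for this example only.

```python
# Assumed picture settings for the example clip
width = 640
height = 480
colour_depth = 24  # bits per pixel
frame_rate = 24    # frames per second
duration = 10      # seconds

# Assumed soundtrack settings for the example clip
sample_rate = 44100  # samples per second (Hz)
bit_depth = 16       # bits per sample
channels = 2         # stereo

# Frames: same calculation as for an animated gif file
frames_size_in_bits = width * height * colour_depth * frame_rate * duration

# Soundtrack: same calculation as for a sound file
soundtrack_size_in_bits = sample_rate * duration * bit_depth * channels

movie_size_in_bits = frames_size_in_bits + soundtrack_size_in_bits
movie_size_in_bytes = movie_size_in_bits / 8

print("Frames:", frames_size_in_bits, "bits")
print("Soundtrack:", soundtrack_size_in_bits, "bits")
print("Movie:", movie_size_in_bytes, "Bytes")
```

Notice how small the soundtrack is compared to the frames: in an uncompressed movie, the pictures account for almost all of the file size.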
You can then test your subroutine using the following input data:
Test # | Type of file | Input Values | Expected Output | Actual Output |
#1 | Uncompressed Movie File | Width: 1920 pixels; Height: 1080 pixels; Colour depth: 24 bits; Frame Rate: 24 fps; Duration: 1 hour 15 minutes; Soundtrack: | File Size: 672GB (or 640GB) | File Size: 6.22MB (or 5.93MB) |
Compression Algorithms
Note that these calculations estimate the file size of uncompressed files. Compression algorithms are often applied to picture files, sound files and movie files to reduce their overall file size.
For instance .png or .jpg picture files, .mp3 sound files or .mp4 movie files are all compressed files so their file size would be smaller than the file size given by the above calculations.