Lesson

The purpose of data compression is to make the files smaller which means that:

  • Less time / less bandwidth to transfer data
  • Take up less space on the disk

Given that there are 7 bits per ASCII character, the uncompressed size of an ASCII phrase is:


size = number of characters (including spaces) x 7


Run Length Encoding (RLE) is a compression method where sequences of the same values are stored in pairs of the value and the number of those values. For instance, the sequence:

0 0 0 1 1 0 1 1 1 1 0 1 1 1 1

would be represented as:

3 0 2 1 1 0 4 1 1 0 4 1


Huffman coding is a form of compression that allows us to use fewer bits for higher frequency data. More common letters are represented using fewer bits than less common letters. For instance, “a” and “e”, which occur in many words would be represented with fewer bit than “z” which occurs rarely. This allows for much more effective compression than RLE.


The steps involved in Huffman encoding as are follows:

  1. Do frequency table
  2. Order table
  3. Create the tree
  4. Add 1, 0 to the branches
  5. Encode letters
  6. Encode message

Worked Example: How much smaller is the phrase henry horse encoded using Huffman encoding compared with its uncompressed size.


Calculate the uncompressed size

In the phrase henry horse there are 11 characters (including the space). Therefore the uncompressed size is 11 x 7 = 77 bits


Generate ordered frequency table (steps 1 and 2)

letter frequency
e 2
h 2
r 2
space 1
o 1
s 1
y 1
n 1

Create the tree and add 1 and 0 to branches (steps 3 and 4)
huffman tree example

letter encoding
e 01
h 00
r 111
space 100
o 1011
s 1010
y 1101
n 1100

Encode message

00 01 1100 111 1101 100 00 1011 111 1000 01 = 33 bits


Therefore by using compression we have reduced the size from 77 bits to 33 bits a saving of 44 bits.



Learning Videos

For more information click on the tab below to watch a video about the lesson.


Click for video - Compression introduction


Click for video - Compression - Huffman coding


Click for video - Compression - Run length encoding

Questions

  1. Give two reasons why you might compress photograph files before emailing them.


    Reason 1: The smaller files will take much less time to download.

    Reason 2: The file size may exceed the limit of what can be attached to an email.

    Or another reason such as reducing the storage size on a friend's computer / email storage quota.

  2. A music production company needs to save music at the highest quality possible. They would like to make use of compression if possible.

    Explain how compression can help the music production company.


    The file size of the compressed file will be smaller, so take up less storage space or be faster to upload / transmit / email.

  3. The sentence ADAM MADE A MEAL is to be compressed using Huffman code.

    Complete the table of frequencies for characters in the sentence.

    A D - - - -
    5 2 - - - -

    A D M Space E L
    5 2 3 3 2 1
  4. Complete the Huffman tree below from your table of frequencies above.

    incomplete huffman tree
    complete huffman tree
  5. Use the tree to encode the word LEAD.


    00010011001

  6. An algorithm called Run Length Encoding is used to compress an image. It converts the file by recording the colour code, represented here by a letter, followed by the number of pixels of the same colour, e.g. R4.

    Give the result of applying the algorithm to the following data for an image:

    G G G R R Y Y Y Y R R

    G3R2Y42R



Go Back