Variable-length code

From WikiCoder

Variable-length code or variable-length encoding is encoding that maps source symbols to variable number of bits. One does not need to be a genius, to figure out, variable-lenght encoding is a milestone of data compression.

The basic idea behind many lossless compression algorithms is simple, the more frequent the symbol is, the shorter code it gets.

From compression point of view, there are two interesting groups of encodings:

  • uniquely decodable codes
  • prefix codes

Uniquely decodable code

Intuitively uniquely decodable encoding is such, that if you stick any two code words, they can be unambigously decodes.

Following table presents uniquely decodable encoding (that is not a prefix code).

Symbol Encoding
A 0
B 01
C 011

The following encoding is not uniquely decodable

Symbol Encoding
A 0
B 01
C 10

Encoded bit string 010, can be encoded either as

0 10 = AC


01 0 = BA

Prefix code

Prefix code are especially valuable, as they make decoding a breeze.

Prefix encoding are context-free, that means that the prefix of a code word defines how many more bits need to be read.

Consider following example:

Symbol Encoding
C 0
A 10
G 110
T 111

Kraft value

Now consider the following bit string:


First bit is 1, so the only single-bit code-word C can be skipped. Second bit is 0, so because this is a prefix code, we've identified A and decoding of next code word can proceed.

By going left to right, this gives following code words:


Formally: prefix code is a code where there is no code word in the encoding, which is also a prefix of some other word. Compare with first example in previous subsection, where 0 is prefix of 01, and 01, is prefix of 011