Elias Gamma Encoding

Coding system developed by Peter Elias to encode positive integers. Commonly used to encode integers whose upper bound cannot be determined beforehand or to compress data in which small values are more frequent than large values.

Gamma coding minimizes the space needed to store numeric data on a file by minimizing the “wasted” space and adapt the length of the code on the finer grained bit level. When numeric data is stored in a predefined format (e.g. byte, integer, etc.) each number takes a predefined amount of space. Space is wasted simply because most of the numbers being stored require much less space.

The coding allows one to store an integer \(n\) using exactly \(1 + 2\log_2(n)\) bits, thus saving a large amount of space when numbers being saved are small.

Since everything is stored as a sequence of 0’s and 1’s, therefore one cannot simply shift the numbers in the sequence to fill up the “wasted” space; Because it would lead to loosing the track of starting and ending points of the numbers.

Two values are stored consecutively for each number:

  • length - bit size of the number
  • offset - difference between the number and the largest power of 2 which is smaller than the number.

More precisely consider the integer \(k\),

  • the length value of \(k\) is \(\lfloor\log_2(k)\rfloor\), and
  • the offset value is \(k - 2^{\lfloor\log_2(k)\rfloor}\)

Encoding

To keep track of the starting and ending point of the numbers, length is stored in unary followed by a 0 and offset is stored in binary. E.g. length value of 13 is 3 and stored in unary followed by a 0 (\(1110\)) and offset for 13 is 5 (\(13-8=5\)) and is represented in binary as \(101\).

in C we can encode with

((1 << l) - 1 << (l+1)) | o
// e.g. 13 -> 1110 101
// or
(o << (l+1) | ((1 << l) - 1 ))
// e.g. 13 -> 101 1110

Decoding

Decoding is much simpler. After length and offset are read, the number \(n\) is reconstructed by computing \(2^{\text{length}} + \text{offset}\).

number unary code length offset Gamma code
0 0 0
1 10 0 0 10,0
2 110 10 0 10,0
3 1110 10 1 10,1
4 11110 110 00 110,00
9 111111110 1110 001 1110,001
13 11110 110 101 1110,101
24 11110 1000 1000 11110,1000
511 111111110 11111111 11111111 11111110,11111111
1025 1111111110 00000001 0000000001 1111111110,0000000001

No notes link to this note