Elias Gamma Encoding
Coding system developed by Peter Elias to encode positive integers. Commonly used to encode integers whose upper bound cannot be determined beforehand or to compress data in which small values are more frequent than large values.
Gamma coding minimizes the space needed to store numeric data on a file by minimizing the “wasted” space and adapt the length of the code on the finer grained bit level. When numeric data is stored in a predefined format (e.g. byte, integer, etc.) each number takes a predefined amount of space. Space is wasted simply because most of the numbers being stored require much less space.
The coding allows one to store an integer \(n\) using exactly \(1 + 2\log_2(n)\) bits, thus saving a large amount of space when numbers being saved are small.
Since everything is stored as a sequence of 0’s and 1’s, therefore one cannot simply shift the numbers in the sequence to fill up the “wasted” space; Because it would lead to loosing the track of starting and ending points of the numbers.
Two values are stored consecutively for each number:
- length - bit size of the number
- offset - difference between the number and the largest power of 2 which is smaller than the number.
More precisely consider the integer \(k\),
- the length value of \(k\) is \(\lfloor\log_2(k)\rfloor\), and
- the offset value is \(k - 2^{\lfloor\log_2(k)\rfloor}\)
Encoding
To keep track of the starting and ending point of the numbers, length is stored in unary followed by a 0 and offset is stored in binary. E.g. length value of 13 is 3 and stored in unary followed by a 0 (\(1110\)) and offset for 13 is 5 (\(13-8=5\)) and is represented in binary as \(101\).
in C we can encode with
((1 << l) - 1 << (l+1)) | o
// e.g. 13 -> 1110 101
// or
(o << (l+1) | ((1 << l) - 1 ))
// e.g. 13 -> 101 1110
Decoding
Decoding is much simpler. After length and offset are read, the number \(n\) is reconstructed by computing \(2^{\text{length}} + \text{offset}\).
number | unary code | length | offset | Gamma code |
---|---|---|---|---|
0 | 0 | 0 | ||
1 | 10 | 0 | 0 | 10,0 |
2 | 110 | 10 | 0 | 10,0 |
3 | 1110 | 10 | 1 | 10,1 |
4 | 11110 | 110 | 00 | 110,00 |
9 | 111111110 | 1110 | 001 | 1110,001 |
13 | 11110 | 110 | 101 | 1110,101 |
24 | 11110 | 1000 | 1000 | 11110,1000 |
511 | 111111110 | 11111111 | 11111111 | 11111110,11111111 |
1025 | 1111111110 | 00000001 | 0000000001 | 1111111110,0000000001 |