Compact Data Structures

What are they?

CDS’s combine in a unique storage structure a compressed representation of a dataset and the mechanisms that allow accessing any given datum without the need of decompressing the data from the beginning.

The whole point of compact data structures: to represent combinatorial objects within their entropy space (or close), not for the sole purpose of compressing them, but also with the aim of navigating and querying them efficiently in compressed form.

In addition to classical space and bandwidth savings, with CDS’s we can process larger datasets within the same memory, make better usage of memory hierarchy (reducing costly disk accesses), and improve the performance when using parallel processing.

Due to the ability to process the compressed data, compression/decompression at either side of data exchange is not needed.

Additionally CDS’s have the characteristic of self-indexation where the the indexation of the data is not provided in an auxiliary structure but rather the index + the data is kept in the same storage structure while occupying less space than the original data. This allows for answering queries faster than a query over a plain representation in the same compressed space.

Basic Structures

Basic structures which are used as building blocks for other CDS

References

[1] Navarro, Gonzalo, Compact Data Structures: A Practical Approach, Cambridge University Press, 2016.


Links to this note