Wikipedia

Incremental encoding

Incremental encoding, also known as front compression, back compression, or front coding, is a type of delta encoding in which the prefix or suffix that an entry shares with the preceding entry is replaced by its length, so that the shared characters need not be stored again. It is particularly well suited to compressing sorted data, e.g., a list of words from a dictionary.

For example:

Input        Common prefix        Compressed output
myxa         no preceding word    0 myxa
myxophyta    'myx'                3 ophyta
myxopod      'myxop'              5 od
nab          no common prefix     0 nab
nabbed       'nab'                3 bed
nabbing      'nabb'               4 ing
nabit        'nab'                3 it
nabk         'nab'                3 k
nabob        'nab'                3 ob
nacarat      'na'                 2 carat
nacelle      'nac'                3 elle
64 bytes                          46 bytes
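
The scheme in the example above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `front_encode` and `front_decode` are made up for this sketch, and the size comparison assumes one byte per character and one byte per prefix length, which matches the 64-byte and 46-byte totals in the example.

```python
from os.path import commonprefix  # character-wise common prefix of strings


def front_encode(words):
    """Return (shared_prefix_length, suffix) pairs for a sequence of words.

    Each word is compared only with the word immediately before it.
    """
    encoded = []
    previous = ""
    for word in words:
        shared = len(commonprefix([previous, word]))
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(pairs):
    """Rebuild the original words from (shared_prefix_length, suffix) pairs."""
    words = []
    previous = ""
    for shared, suffix in pairs:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


words = ("myxa myxophyta myxopod nab nabbed nabbing "
         "nabit nabk nabob nacarat nacelle").split()
pairs = front_encode(words)
# pairs starts with (0, 'myxa'), (3, 'ophyta'), (5, 'od'), (0, 'nab'), ...

raw_size = sum(len(w) for w in words)                   # 64
coded_size = sum(1 + len(suffix) for _, suffix in pairs)  # 46
```

Decoding has to proceed sequentially from the first entry, because every word is reconstructed from the one before it.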

The encoding used to store the common prefix length itself varies from application to application. Typical techniques are storing the value as a single byte; delta encoding, which stores only the change in the common prefix length; and various universal codes. Incremental encoding may also be combined with other general lossless data compression techniques, such as entropy encoding and dictionary coders, to compress the remaining suffixes.
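
The three ways of storing the prefix length mentioned above can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, Elias gamma is chosen here as one example of a universal code, and the shift by one before gamma coding is an assumption made because gamma codes cannot represent zero.

```python
def single_byte(length):
    """Store a prefix length (0..255) as one byte."""
    return bytes([length])


def delta_lengths(lengths):
    """Replace each prefix length with its change from the previous one."""
    deltas = []
    previous = 0
    for length in lengths:
        deltas.append(length - previous)
        previous = length
    return deltas


def undelta_lengths(deltas):
    """Invert delta_lengths by accumulating the changes."""
    lengths = []
    current = 0
    for delta in deltas:
        current += delta
        lengths.append(current)
    return lengths


def elias_gamma(n):
    """Elias gamma code of a positive integer, as a string of '0' and '1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    # (len - 1) zeros announce how many bits follow the leading 1.
    return "0" * (len(binary) - 1) + binary


# Prefix lengths from the worked example in this article.
lengths = [0, 3, 5, 0, 3, 4, 3, 3, 3, 2, 3]
deltas = delta_lengths(lengths)   # [0, 3, 2, -5, 3, 1, -1, 0, 0, -1, 1]

# Shift by one so that a prefix length of 0 becomes the codable value 1.
gamma_bits = "".join(elias_gamma(n + 1) for n in lengths)
```

For the eleven prefix lengths of the example, the gamma-coded lengths take 45 bits in this sketch, compared with 88 bits at one byte each. The deltas can be negative, so a real format would also need a signed representation for them.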

Applications

Incremental encoding is widely used in information retrieval to compress the lexicons used in search indexes; a lexicon lists every word found in the documents, together with a pointer for each word to a list of its locations. Typically, it compresses these indexes by about 40%.[1]

As one example, incremental encoding is used as a starting point by the GNU locate utility, in an index of filenames and directories. GNU locate additionally uses bigram encoding to shorten popular filepath prefixes further.

References

  1. Ian H. Witten, Alistair Moffat, Timothy C. Bell. Managing Gigabytes. Second edition. Academic Press. ISBN 1-55860-570-3. Section 4.1: Accessing the lexicon, subsection Front coding, pp. 159–161.
This article is copied from an article on Wikipedia and is distributed under the terms of the GNU Free Documentation License.
