Compressed Compact Suffix Array
The Compressed Compact Suffix Array (CCSA) is a simple self-index using essentially the structures of the Compact Suffix Array (CSA). The CSA was aimed at explicitly exploiting the self-repetitions of the suffix array in order to store it in a more compact form.
The size of the index built on a string is bounded by bits, where is the -th order empirical entropy of .
The above bound holds simultaneously for all with . And is an arbitrary constant.
Using the CCSA, the counting of the occurrences of an arbitrary pattern as a substring of takes time. Locating each pattern occurrence takes time. Reporting a text substring of length takes time.
ImplementationVeli Mäkinen, Rodrigo González
free_text: The text will be freed immediately after using it.
samplerate=<number>: The index marks one text position every <number> entries. The default value is 64.
V. Mäkinen and G. Navarro. Compressed compact suffix arrays. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS v. 3109 (2004), pp. 420-433.