Pizza&Chili Corpus
Compressed Indexes and their Testbeds

The Italian mirror | The Chilean mirror

Compressed Compact Suffix Array


The Compressed Compact Suffix Array (CCSA) is a simple self-index using essentially the structures of the Compact Suffix Array (CSA). The CSA was aimed at explicitly exploiting the self-repetitions of the suffix array $ A$ in order to store it in a more compact form.

The size of the index built on a string $ T[1,n]$ is bounded by $ n(H_k(T)(1+\log n)+2+1/\epsilon)+O(n \log\log n)/ \log_{\vert\Sigma\vert} n)$ bits, where $ H_k(T)$ is the $ k$ -th order empirical entropy of $ T$ .

The above bound holds simultaneously for all $ k \leq \alpha \log_{\vert\Sigma\vert} n$ with $ 0 < \alpha < 1$ . And $ \epsilon>0$ is an arbitrary constant.

Using the CCSA, the counting of the occurrences of an arbitrary pattern $ P[1,p]$ as a substring of $ T$ takes $ O(p \log_{\vert\Sigma\vert} n)$ time. Locating each pattern occurrence takes $ O(\epsilon \log_{\vert\Sigma \vert} n)$ time. Reporting a text substring of length $ l$ takes $ O((l + \epsilon \log n)$ time.


Veli Mäkinen, Rodrigo González


free_text: The text will be freed immediately after using it.

samplerate=<number>: The index marks one text position every <number> entries. The default value is 64.


V. Mäkinen and G. Navarro. Compressed compact suffix arrays. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS v. 3109 (2004), pp. 420-433.


Send Mail to Us | © P. Ferragina and G. Navarro, Last update: September, 2005.