Pizza&Chili Corpus
Compressed Indexes and their Testbeds

The Italian mirror | The Chilean mirror

Run-Length FM-Index

Description

The Run-Length FM-Index (RLFM) is an improvement over the Wavelet Tree FM-index, which exploits the equal-letter runs of the BWT to achieve $ O(nH_k \log \sigma)$ bits of space, where $ H_k(T)$ is the $ k$ -th order empirical entropy of $ T$ . The above bound holds simultaneously for all $ k \leq \alpha \log_{\vert\Sigma\vert} n$ with $ 0 < \alpha < 1$ .

Using the RLFM, the counting of the occurrences of an arbitrary pattern $ P[1,p]$ as a substring of $ T$ takes $ O(p \log \sigma)$ time. Locating each pattern occurrence takes $ O(\log^{1+\epsilon} n \log \sigma)$ time. Reporting a text substring of length $ l$ takes $ O(l + \log^{1+\epsilon} n )$ time.

Implementation

Veli Mäkinen, Rodrigo González

Build_options

free_text: The text will be freed immediately after using it.

samplerate=<number>: The index marks one text position every <number> entries. The default value is 64.

Paper

V. Mäkinen and G. Navarro. New search algorithms and time/space tradeoffs for succinct suffix arrays. Technical Report C-2004-20 (April), University of Helsinki, Finland.

V. Mäkinen and G. Navarro. Run-length FM-index. In Proc. DIMACS Workshop: The Burrows-Wheeler Transform: Ten Years Later (Aug. 2004), pp. 17-19.

V. Mäkinen and G. Navarro. Succinct suffix arrays based on run-length encoding. In Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS v. 3537 (2005), pp. 45-56.

V. Mäkinen and G. Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1):40-66, 2005.

Downloads



Send Mail to Us | © P. Ferragina and G. Navarro, Last update: September, 2005.