Alphabet-Friendly FM-index |
Description An Alphabet-Friendly FM-index combines an existing compression boosting technique with the wavelet tree data structure. This index scales well with the size of the input alphabet . The size of the new index built on a string is bounded by bits, where is the -th order empirical entropy of . The above bound holds simultaneously for all and . Moreover, the index design does not depend on the parameter , which plays a role only in analysis of the space occupancy. Using the AF-index, the counting of the occurrences of an arbitrary pattern as a substring of takes time. Locating each pattern occurrence takes time. Reporting a text substring of length takes time. On the other hand, if we also have polylog , the counting of the occurrences of an arbitrary pattern as a substring of takes time. Locating each pattern occurrence takes time. Reporting a text substring of length takes time. Implementation Rodrigo González Build_options free_text: The text will be freed immediately after using it. samplerate=<number>: The index marks one text position every <number> entries. The default value is 64. Papers P. Ferragina, G. Manzini, V. Mäkinen and G. Navarro. An Alphabet-Friendly FM-index. In Proc. SPIRE'04, pages 150-160. LNCS 3246. P. Ferragina, G. Manzini, V. Mäkinen and G. Navarro. Compressed Representation of Sequences and Full-Text Indexes. Technical Report 2004-05, Technische Fakultät, Universität Bielefeld, Germany, 2004. P. Ferragina, G. Manzini, V. Mäkinen and G. Navarro. Compressed Representations of Sequences and Full-Text Indexes. To appear in ACM Transactions on Algorithms (TALG). Downloads |