| Description 
The compressed suffix array was originally a data structure achieving
 bits of space [1,2] and no
self-indexing. Sadakane's version [3,4,5]
is a self-index that achieves at least zero-order compression. We refer to the
latter with the name CSA in this paper. 
The CSA represents the suffix array ![$ A[1,n]$](img2.gif) by a sequence of numbers  , such that ![$ A[\psi(i)]=A[i]+1$](img4.gif) . It is
shown [3] that  is piecewise monotone increasing.
More precisely,  increases in the areas of  where the suffixes start
with the same symbol  . In addition, there are long runs where  (those runs can be mapped one-to-one to the runs in  [6]). 
These properties permit a compact representation of  .
Essentially,  is differentially encoded,  , and long runs of 1's in
the differences are run-length encoded. Absolute samples are stored at
regular intervals to permit the differential decoding start from the
last sample preceding position  . The sampling rate gives a space/time
tradeoff for accesing and storing  . In [3] it is
shown that the index requires  bits of space,
which is improved in [6] to  . Implementation Kunihiko Sadakane and refactory by Rodrigo Cánovas. Interface by Rodrigo González. Build_options free_text: The text will be freed immediately after using it. samplerate=<number>: samplerate will be
interval between two indices of the suffix array stored explicitly.
That is, SA[i*samplerate] is stored explicitly. The default value is 16. samplepsi=<number>: samplepsi will be
interval between two indices of the psi function stored explicitly.
That is, Psi[i*samplepsi] is stored explicitly. The default value is 128.   Papers  [1]
R. Grossi and J. Vitter.
Compressed suffix arrays and suffix trees with applications to text
  indexing and string matching.
In Proceedings 32nd ACM Symposium on Theory of Computing
  (STOC), pages 397-406, 2000.  [2]
R. Grossi and J. Vitter.
Compressed suffix arrays and suffix trees with applications to text
  indexing and string matching.
SIAM Journal on Computing, 35(2):378-407, 2006.  [3]
K. Sadakane.
Compressed text databases with efficient query algorithms based on
  the compressed suffix array.
In Proceedings 11th Annual International Symposium on Algorithms
  and Computation (ISAAC), LNCS v. 1969, pages 410-421, 2000. 
[4]
K. Sadakane.
Succinct representations of lcp information and improvements in the 
  compressed suffix arrays.
In Proceedings 13th Annual ACM-SIAM Symposium on Discrete
  Algorithms (SODA), 2002.  [5]
K. Sadakane.
New text indexing functionalities of the compressed suffix arrays.
Journal of Algorithms, 48(2):294-313, 2003.  [6]
G. Navarro and V. Mäkinen.
Compressed full-text indexes.
ACM Computing Surveys, 39(1):article 2, 2007. Downloads |