Compressed Suffix Array (CSA)
The compressed suffix array was originally a data structure achieving bits of space [1,2] and no self-indexing. Sadakane's version [3,4,5] is a self-index that achieves at least zero-order compression. We refer to the latter with the name CSA in this paper.
The CSA represents the suffix array by a sequence of numbers , such that . It is shown  that is piecewise monotone increasing. More precisely, increases in the areas of where the suffixes start with the same symbol . In addition, there are long runs where (those runs can be mapped one-to-one to the runs in ).
These properties permit a compact representation of . Essentially, is differentially encoded, , and long runs of 1's in the differences are run-length encoded. Absolute samples are stored at regular intervals to permit the differential decoding start from the last sample preceding position . The sampling rate gives a space/time tradeoff for accesing and storing . In  it is shown that the index requires bits of space, which is improved in  to .
Kunihiko Sadakane and refactory by Rodrigo Cánovas. Interface by Rodrigo González.
free_text: The text will be freed immediately after using it.
samplerate=<number>: samplerate will be interval between two indices of the suffix array stored explicitly. That is, SA[i*samplerate] is stored explicitly. The default value is 16.
samplepsi=<number>: samplepsi will be interval between two indices of the psi function stored explicitly. That is, Psi[i*samplepsi] is stored explicitly. The default value is 128.