API
We are particularly interested in self-indexes, namely compressed indexes that encapsulate sufficient information to reproduce any substring of the indexed text, and thus possibly the text itself. If a compressed index is not a self-index, then one must keep the text together with the index and report the text size plus the index size.

To use a compressed index over a text, we first have to build it. We can then query it to count or locate the occurrences of a pattern, or access snippets of the indexed text, either to display the context of a pattern occurrence or to retrieve text substrings (possibly the whole text).

Indexes are used through the following API, written in C/C++. We use uchar to denote unsigned char and ulong to denote unsigned long. The interface assumes that each text symbol is represented in one byte. The integer e returned by any procedure is an error code if it is different from zero; the corresponding error message can be obtained by calling the procedure char *error_index(e). Recall also that positions in the text and in the pattern are numbered from zero. Below you will find a schematic summary of the API offered by all the compressed indexes available for download. Please read carefully the COPYRIGHT information that comes with each of them.
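The calling convention described above (results returned through output parameters, an integer error code as the return value, positions numbered from zero) can be illustrated with a small stand-in. This is only a sketch: toy_count, toy_error_index and the TOY_* codes are hypothetical names invented for this example, and the function scans a plain text buffer naively instead of querying a compressed index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char uchar;   /* shorthand used by the interface */
typedef unsigned long ulong;

/* Hypothetical error codes; each real index defines its own. */
enum { TOY_OK = 0, TOY_EMPTY_PATTERN = 1, TOY_NULL_ARGUMENT = 2 };

/* Toy counterpart of error_index(e): maps an error code to a message. */
const char *toy_error_index(int e) {
    switch (e) {
    case TOY_OK:            return "no error";
    case TOY_EMPTY_PATTERN: return "empty pattern";
    case TOY_NULL_ARGUMENT: return "null argument";
    default:                return "unknown error";
    }
}

/* Toy query following the same convention: the result is written through
   an output parameter and the return value is an error code (zero means
   success). Occurrences are counted by a naive scan of the plain text. */
int toy_count(const uchar *text, ulong n,
              const uchar *pattern, ulong m, ulong *numocc) {
    if (text == NULL || pattern == NULL || numocc == NULL)
        return TOY_NULL_ARGUMENT;
    if (m == 0)
        return TOY_EMPTY_PATTERN;
    *numocc = 0;
    for (ulong i = 0; i + m <= n; i++)      /* positions start at zero */
        if (memcmp(text + i, pattern, m) == 0)
            (*numocc)++;
    assert(*numocc <= n);                   /* at most one match per position */
    return TOY_OK;
}
```

A caller would test the returned code against zero and, on failure, pass it to the message function, mirroring the use of error_index(e) with the real indexes.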
Building the index
Querying the index
Function | Parameters | Comment |
int extract | void *index, | Allocates snippet (which must be freed by the caller) and writes the substring text[from..to] into it. Returns in snippet_length the length of the text snippet actually extracted (which may be less than to-from+1 if to is larger than the text size). |
int display | void *index, | Displays the text (snippet) surrounding each occurrence of the substring pattern[0..length-1] within the text indexed by index. Each snippet must include numc characters before and after the pattern occurrence, for a total of length+2*numc characters, or fewer if the text boundaries are reached. Writes in numocc the number of occurrences, and allocates the arrays snippet_text and snippet_lengths (which must be freed by the caller). The first is a character array of numocc*(length+2*numc) characters, with a new snippet starting at every multiple of length+2*numc. The second gives the real length of each of the numocc snippets. |
int length | void *index, | Obtains the length of the text indexed by index. |
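The two behaviours described in the table, the clamping of to in extract and the fixed-stride layout of snippet_text in display, can be sketched as follows. The functions toy_extract and nth_snippet are hypothetical helpers written for this illustration, as are their parameter lists and error codes; toy_extract works on a plain text buffer and is not the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char uchar;
typedef unsigned long ulong;

/* Toy extract over a plain text buffer of n bytes (not a compressed index):
   copies text[from..to] into a freshly allocated snippet, clamping `to` to
   the last text position. The caller must free *snippet.
   The nonzero return values are hypothetical error codes. */
int toy_extract(const uchar *text, ulong n, ulong from, ulong to,
                uchar **snippet, ulong *snippet_length) {
    if (n == 0 || from >= n || from > to)
        return 1;                           /* invalid range */
    if (to >= n)
        to = n - 1;                         /* result shorter than to-from+1 */
    *snippet_length = to - from + 1;
    *snippet = (uchar *)malloc(*snippet_length);
    if (*snippet == NULL)
        return 2;                           /* out of memory */
    memcpy(*snippet, text + from, *snippet_length);
    return 0;
}

/* Returns a pointer to the i-th snippet inside the flat array filled by a
   display-style call and stores its real length in *len. Snippet i starts
   at offset i*(length+2*numc); only the first snippet_lengths[i] bytes of
   that slot are meaningful. */
const uchar *nth_snippet(const uchar *snippet_text,
                         const ulong *snippet_lengths,
                         ulong length, ulong numc, ulong i, ulong *len) {
    ulong stride = length + 2 * numc;       /* slot size per occurrence */
    assert(snippet_lengths[i] <= stride);
    *len = snippet_lengths[i];
    return snippet_text + i * stride;
}
```

Since the snippets are not described as zero-terminated, a caller should always use the length reported in snippet_lengths rather than relying on a terminating byte.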