Pizza&Chili Corpus
Compressed Indexes and their Testbeds


Repetitive Corpus

Here we present a corpus of repetitive texts. The texts are grouped by origin into three categories: Artificial Texts, Pseudo-Real Texts, and Real Texts. The main goal of this collection is to serve as a standard testbed for benchmarking algorithms designed for repetitive texts.

Download

The files are compressed with p7zip and gzip to save bandwidth.
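
For the gzip-compressed files, decompression can be scripted with Python's standard library. The following is a minimal sketch and is not part of the corpus distribution: the function name `decompress_gzip` and the file names in the usage note are hypothetical. The p7zip archives need an external tool and are not covered here.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(src: Path, dst: Path) -> int:
    """Stream-decompress the gzip file `src` into `dst`.

    Returns the size of the decompressed file in bytes. The data is copied
    in 1 MiB chunks, so large texts are never held fully in memory.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1 << 20)
    return dst.stat().st_size
```

Assuming a downloaded file saved locally as `text.gz` (a placeholder name), a call such as `decompress_gzip(Path("text.gz"), Path("text"))` would write the plain text next to it.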

Statistics

The compression statistics for all texts, along with information about their origin, are available in the following PDF file.

Indexes

The following indexes are designed specifically for repetitive texts.

© P. Ferragina and G. Navarro. Last update: October 2010.