Repetitive Corpus

Here we present a corpus of repetitive texts. The texts are categorized by source into three groups: Artificial Texts, Pseudo-Real Texts and Real Texts. The main goal of this collection is to serve as a standard testbed for benchmarking algorithms designed for repetitive texts.

Download
The files are compressed using p7zip and gzip to save bandwidth.

Statistics
The compression statistics of all texts, as well as information about their origin, can be viewed in the following PDF file.

Indexes
The following indexes are specifically oriented to repetitive texts.

© P. Ferragina and G. Navarro. Last update: October 2010.