Computing matching statistics and maximal exact matches on compressed full-text indexes. (English)
Chavez, Edgar (ed.) et al., String processing and information retrieval. 17th international symposium, SPIRE 2010, Los Cabos, Mexico, October 11‒13, 2010. Proceedings. Berlin: Springer (ISBN 978-3-642-16320-3/pbk). Lecture Notes in Computer Science 6393, 347-358 (2010).
Summary: Exact string matching is a problem that computer programmers face on a regular basis, and full-text indexes like the suffix tree or the suffix array provide fast string search over large texts. In the last decade, research on compressed indexes has flourished because the main problem in large-scale applications is the space consumption of the index. Nowadays, the most successful compressed indexes are able to obtain almost optimal space and search time simultaneously. It is known that a myriad of sequence analysis and comparison problems can be solved efficiently with established data structures like the suffix tree or the suffix array, but algorithms on compressed indexes that solve these problem are still lacking at present. Here, we show that matching statistics and maximal exact matches between two strings $S ^{1}$ and $S ^{2}$ can be computed efficiently by matching $S ^{2}$ backwards against a compressed index of $S ^{1}$.