The trie is helpful when processing multiple strings at once, but when we want to analyze
a single string, we need something different.
In this problem, we will use a new data structure, the suffix tree, to find
long repeats in the genome. Recall from “Finding a Motif in DNA” that cataloguing these repeats
is a problem of the utmost interest to molecular biologists, as a natural correlation exists
between the frequency of a repeat and its influence on RNA transcription.
Our aim is therefore to identify long repeats that occur more than some predetermined number of times.
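To make the goal concrete, here is a minimal Python sketch that finds a longest substring occurring at least k times. It deliberately does not build a suffix tree: it sorts all suffixes and compares suffixes that lie k - 1 places apart, a simpler but slower substitute. The names `longest_repeat` and `common_prefix_len` and the parameter `k` are illustrative assumptions, and the sketch assumes that overlapping occurrences count separately.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_repeat(s: str, k: int) -> str:
    """Return a longest substring of s that occurs at least k times.

    Assumption: overlapping occurrences are counted separately.
    Naive approach: sorting the suffixes takes far more time and memory
    than a suffix tree would, so this is only suitable for short strings.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    # Any k suffixes sharing a prefix are contiguous in sorted order, so it
    # is enough to compare the first and last suffix of each window of size k.
    for i in range(len(suffixes) - k + 1):
        n = common_prefix_len(suffixes[i], suffixes[i + k - 1])
        if n > len(best):
            best = suffixes[i][:n]
    return best


print(longest_repeat("GTCCGAAGCTCCGG", 2))  # TCCG
```

For the string used in Figure 1, the longest substring occurring at least twice is TCCG. If several substrings tie for the longest, this sketch returns only one of them.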
Figure 1. The suffix tree for s = GTCCGAAGCTCCGG. Note that a dollar sign has been appended to s to mark its end. Every path from the root to a leaf corresponds to a unique suffix of GTCCGAAGCTCCGG, and each leaf is labeled with the location in s of the suffix ending at that leaf.
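The root-to-leaf property described in the caption can be illustrated with a small Python sketch. It builds an uncompressed suffix trie (one character per edge) rather than the compressed suffix tree drawn in the figure, so it takes quadratic time and space. The names `build_suffix_trie` and `suffixes_from_trie`, the `"leaf"` key, and the choice of 1-based starting positions as leaf labels are assumptions made for illustration.

```python
def build_suffix_trie(s: str) -> dict:
    """Build an uncompressed suffix trie of s + "$" as nested dicts.

    Assumption: each leaf stores the 1-based starting position of its
    suffix under the key "leaf" (a multi-character key, so it cannot
    collide with a single-character edge label).
    """
    text = s + "$"  # the terminator ensures no suffix is a prefix of another
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node["leaf"] = start + 1
    return root


def suffixes_from_trie(node: dict, prefix: str = ""):
    """Yield (suffix, position) for every root-to-leaf path in the trie."""
    if "leaf" in node:
        yield prefix, node["leaf"]
    for ch, child in node.items():
        if ch != "leaf":
            yield from suffixes_from_trie(child, prefix + ch)


trie = build_suffix_trie("GTCCGAAGCTCCGG")
for suffix, position in sorted(suffixes_from_trie(trie), key=lambda p: p[1]):
    print(position, suffix)
```

Without the dollar sign, a suffix such as G would end at an internal node (it is a prefix of GG), so not every suffix would correspond to its own leaf.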