Aug. 23, 2012, midnight by btjaden

**Topics**:
Graph Algorithms,
String Algorithms

## Creating a Suffix Tree

In “Finding the Longest Multiple Repeat”, we introduced the suffix tree. This data structure has a wide array of applications, one of which was to help us identify long repeats in a genome. In that problem, we provided the tree as part of the dataset, but a vital computational exercise is to create the suffix tree solely from a string.

Given a string

$T(s)$ is a rooted tree having exactly$n$ leaves.- Every edge of
$T(s)$ is labeled with a substring of$s^*$ , where$s^*$ is the string formed by adding a placeholder symbol`$`

to the end of$s$ . - Every internal node of
$T(s)$ other than the root has at least two children; i.e., it has degree at least 3. - The substring labels for the edges leading down from a node to its children must begin with different symbols.
- By concatenating the substrings along edges, each path from the root to a leaf corresponds to a unique suffix of
$s^*$ .

Figure 1 contains an example of a suffix tree.

Given: A DNA string

Return: The substrings of

ATAAATG$

AAATG$ G$ T ATG$ TG$ A A AAATG$ G$ T G$ $