Processing math: 100%

Encoding Suffix Trees solved by 427

Aug. 23, 2012, midnight by btjaden

Topics: Graph Algorithms, String Algorithms

Creating a Suffix Treeclick to expand

In “Finding the Longest Multiple Repeat”, we introduced the suffix tree. This data structure has a wide array of applications, one of which was to help us identify long repeats in a genome. In that problem, we provided the tree as part of the dataset, but a vital computational exercise is to create the suffix tree solely from a string.

Problem

Figure 1. The suffix tree for s = GTCCGAAGCTCCGG. Note that the dollar sign has been appended to a substring of the tree to mark the end of s. Every path from the root to a leaf corresponds to a unique suffix of GTCCGAAGCTCCGG, and each leaf is labeled with the location in s of the suffix ending at that leaf.

Given a string s having length n, recall that its suffix tree T(s) is defined by the following properties:

Figure 1 contains an example of a suffix tree.

Given: A DNA string s of length at most 1kbp.

Return: The substrings of s encoding the edges of the suffix tree for s. You may list these substrings in any order.

Sample Dataset

ATAAATG$

Sample Output

AAATG$
G$
T
ATG$
TG$
A
A
AAATG$
G$
T
G$
$

Please login to solve this problem.

Welcome to Rosalind!

Rosalind is a platform for learning bioinformatics through problem solving.
Please login with Google/Twitter/Facebook or register a new account.