July 2, 2012, midnight by Rosalind Team
Topics: Dynamic Programming, String Algorithms
Locating Motifs Despite Introns
In “Finding a Shared Motif”, we discussed searching through a database containing multiple genetic strings to find a longest common substring of these strings, which served as a motif shared by all of them. However, as we saw in “RNA Splicing”, coding regions of DNA are often interrupted by introns that do not code for proteins.
We therefore need to locate shared motifs that are separated across exons, which means that the motifs are not required to be contiguous. To model this situation, we need to enlist subsequences.
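To make the distinction from substrings concrete, here is a minimal sketch of a subsequence check. The function name `is_subsequence` is an illustrative choice, not part of the problem statement. It assumes the usual definition: the symbols of the candidate must appear in the same order in the string, but need not be adjacent.

```python
def is_subsequence(u: str, s: str) -> bool:
    """Return True if the symbols of u appear in s in order (gaps allowed)."""
    it = iter(s)
    # `ch in it` advances the iterator until it finds ch, so each symbol of u
    # must be found strictly after the previous match.
    return all(ch in it for ch in u)


# "ACTG" is not a contiguous substring of "AACCTTGG", but it is a subsequence.
print(is_subsequence("ACTG", "AACCTTGG"))
print("ACTG" in "AACCTTGG")
```

A common subsequence of two strings is then any string for which this check succeeds against both of them.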
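Analogously to the definition of longest common substring, a longest common subsequence of two strings is a common subsequence of maximal length: no longer common subsequence of the two strings exists.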
Given: Two DNA strings
Return: A longest common subsequence of the two strings.
>Rosalind_23
AACCTTGG
>Rosalind_64
ACACTGTGA
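One possible approach, in keeping with the Dynamic Programming topic listed above, is the standard table-filling method sketched below. It is an illustration and not necessarily the intended reference solution. The function name `longest_common_subsequence` is hypothetical, and the two sequences are passed in directly as plain strings rather than parsed from FASTA input.

```python
def longest_common_subsequence(s: str, t: str) -> str:
    """Return one longest common subsequence of s and t (ties broken arbitrarily)."""
    m, n = len(s), len(t)
    # dp[i][j] holds the length of a longest common subsequence of s[:i] and t[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


result = longest_common_subsequence("AACCTTGG", "ACACTGTGA")
print(len(result), result)
```

The table takes time and memory proportional to the product of the two string lengths. Because several longest common subsequences may exist, which one is returned depends on how ties are broken during the traceback.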