Jan. 25, 2013, 7:58 p.m. by mariel.21
Finding Mutated Motifs
We have discussed at length the importance of motif finding in biology for genetic strings. However, searching for exact substring matches is of little use in applications because a motif can vary under the effect of mutation. Fortunately, we already possess functions like edit distance for quantifying the similarity of two strings.
Furthermore, recall that each chromosome is made up of a large number of genes (on average, each human chromosome contains over 1,000 genes). Therefore, to determine whether a newly sequenced chromosome contains a given gene, neither local nor global alignment applies.
One possible alignment variant for finding genes is semiglobal alignment, which we discuss in “Semiglobal Alignment”; yet semiglobal alignment only allows us to disregard gaps at the end of the alignment. To find a known gene in a new chromosome, we need to instead align the gene against intervals of the chromosome, a problem that calls for an entirely new algorithmic variation of alignment.
Given a string
Note that more than one such substring of
Return: An optimal fitting alignment score with respect to the mismatch score defined above,
followed by an optimal fitting alignment of a substring of
>Rosalind_54 GCAAACCATAAGCCCTACGTGCCGCCTGTTTAAACTCGCGAACTGAATCTTCTGCTTCACGGTGAAAGTACCACAATGGTATCACACCCCAAGGAAAC >Rosalind_46 GCCGTCAGGCTGGTGTCCG
5 ACCATAAGCCCTACGTG-CCG GCCGTCAGGC-TG-GTGTCCG
This problem follows Jones & Pevzner, An Introduction to Bioinformatics Algorithms, Problem 6.23.