Find a Highest-Scoring Overlap Alignment of Two Strings solved by 241

Feb. 16, 2014, 8:43 p.m. by Rosalind Team

When we assembled genomes, we discussed how to use overlapping reads to assemble a genome, a problem that was complicated by errors in reads. We would like to find overlaps between error-prone reads as well.

An overlap alignment of strings v = v1 ... vn and w = w1 ... wm is a global alignment of a suffix of v with a prefix of w. An optimal overlap alignment of strings v and w maximizes the global alignment score between an i-suffix of v and a j-prefix of w (i.e., between vi ... vn and w1 ... wj) among all i and j.

Overlap Alignment Problem

Construct a highest-scoring overlap alignment between two strings.

Given: Two protein strings v and w, each of length at most 1000.

Return: The score of an optimal overlap alignment of v and w, followed by an alignment of a suffix v’ of v and a prefix w’ of w achieving this maximum score. Use an alignment score in which matches count +1 and both the mismatch and indel penalties are 2. (If multiple overlap alignments achieving the maximum score exist, you may return any one.)

Sample Dataset

PAWHEAE
HEAGAWGHEE


Sample Output

1
HEAE
HEAG