July 13, 2012, midnight by Rosalind Team

**Topics**:
String Algorithms

## Motifs Are Rarely Contiguous

In “Finding a Motif in DNA”, we searched for occurrences of a motif as a substring of a larger database genetic string. However, because a DNA strand coding for a protein is often interspersed with introns (see “RNA Splicing”), we need a way to recognize a motif that has been chopped up into pieces along a chromosome.

A subsequence of a string is a collection of symbols contained in order (though not necessarily
contiguously) in the string (e.g., ACG is a subsequence of T*A*TG*C*TAA*G*ATC).
The indices of a subsequence are the positions in the string at which the symbols
of the subsequence appear; thus, the indices of ACG in TATGCTAAGATC can be represented by (2, 5, 9).

As a substring can have multiple locations, a subsequence can have multiple collections of indices, and the same index can be reused in more than one appearance of the subsequence; for example, ACG is a subsequence of AACCGGTT in 8 different ways.

Given: Two DNA strings

Return: One collection of indices of

>Rosalind_14 ACGTACGTGACG >Rosalind_18 GTA

3 8 10

## Extra Information

For the mathematically inclined, we may equivalently say that

$t = t_1 t_2 \cdots t_m$ is a subsequence of$s = s_1 s_2 \cdots s_n$ if the characters of$t$ appear in the same order within$s$ . Even more formally, a subsequence of$s$ is a string$s_{i_1} s_{i_2} \ldots s_{i_k}$ , where$1 \leq i_1 < i_2 \cdots < i_k \leq n$ .