Recall that although chromosomes taken from eukaryotes have a linear structure,
many bacterial chromosomes are actually circular. We represented a linear chromosome with
a DNA string, so we only need to modify the definition of string to model circular
chromosomes.
Perfect coverage is the phenomenon in fragment assembly of having a read (or k-mer)
begin at every possible location in the genome. Unfortunately, perfect coverage
is still difficult to achieve, but fragment assembly technology continues to improve by leaps
and bounds, and perfect coverage is perhaps not the fantasy it once was.
Problem
A circular string is a string that does not have an initial or terminal element;
instead, the string is viewed as a necklace of symbols. We can represent a circular
string as a string enclosed in parentheses. For example, consider the circular DNA string (ACGTAC), and
note that because the string "wraps around" at the end, this circular string can equally be represented by
(CGTACA), (GTACAC), (TACACG), (ACACGT), and (CACGTA). The definitions of substrings and
superstrings are easy to generalize to the case of circular strings (keeping in mind that substrings are
allowed to wrap around).
Given: A collection of (error-free) DNAk-mers (k≤50)
taken from the same strand of a circular chromosome. In this dataset, all k-mers
from this strand of the chromosome are present, and their de Bruijn graph consists of exactly one simple cycle.
Return: A cyclic superstring of minimal length containing the reads (thus corresponding
to a candidate cyclic chromosome).
Sample Dataset
ATTAC
TACAG
GATTA
ACAGA
CAGAT
TTACA
AGATT
Sample Output
GATTACA
Noteclick to expand
The assumption made above that all reads derive from
the same strand is practically unrealistic; in reality, researchers will not
know the strand of DNA from which a given read has been sequenced.