July 29, 2015, 1:09 a.m. by Rosalind Team
Since increasing read length presents a difficult experimental problem, biologists have suggested an indirect way of increasing read length by generating read-pairs, which are pairs of reads separated by a fixed distance d in the genome.
You can think about a read-pair as a long "gapped" read of length k + d + k whose first and last k-mers are known but whose middle segment of length d is unknown. Nevertheless, read-pairs contain more information than k-mers alone, and so we should be able to use them to improve our assemblies. If only you could infer the nucleotides in the middle segment of a read-pair, you would immediately increase the read length from k to 2 · k + d.
Given a string Text, a (k,d)-mer is a pair of k-mers in Text separated by distance d. We use the notation (Pattern1|Pattern2) to refer to a a (k,d)-mer whose k-mers are Pattern1 and Pattern2. The (k,d)-mer composition of Text, denoted PairedCompositionk,d(Text), is the collection of all (k,d)- mers in Text (including repeated (k,d)-mers).
Reconstruct a string from its paired composition.
Given: Integers k and d followed by a collection of paired k-mers PairedReads.
Return: A string Text with (k, d)-mer composition equal to PairedReads. (If multiple answers exist, you may return any one.)
4 2 GAGA|TTGA TCGT|GATG CGTG|ATGT TGGT|TGAG GTGA|TGTT GTGG|GTGA TGAG|GTTG GGTC|GAGA GTCG|AGAT
GTGGTCGTGAGATGTTGA