ROSALIND | Error Correction in Reads

Genome Sequencing Isn't Perfectclick to expand

In “Genome Assembly as Shortest Superstring”, we introduce the problem of assembling a genome from a collection of reads. Even though genome sequencing is a multi-billion dollar enterprise, sequencing machines that identify reads still produce errors a substantial percentage of the time. To make matters worse, these errors are unpredictable; it is difficult to determine if the machine has made an error, let alone where in the read the error has occurred. For this reason, error correction in reads is typically a vital first step in genome assembly.

Problem

As is the case with point mutations, the most common type of sequencing error occurs when a single nucleotide from a read is interpreted incorrectly.

Given: A collection of up to 1000 reads of equal length (at most 50 bp) in FASTA format. Some of these reads were generated with a single-nucleotide error. For each read $s$ in the dataset, one of the following applies:

$s$ was correctly sequenced and appears in the dataset at least twice (possibly as a reverse complement);
$s$ is incorrect, it appears in the dataset exactly once, and its Hamming distance is 1 with respect to exactly one correct read in the dataset (or its reverse complement).

Return: A list of all corrections in the form "[old read]->[new read]". (Each correction must be a single symbol substitution, and you may return the corrections in any order.)

Sample Dataset

>Rosalind_52
TCATC
>Rosalind_44
TTCAT
>Rosalind_68
TCATC
>Rosalind_28
TGAAA
>Rosalind_95
GAGGA
>Rosalind_66
TTTCA
>Rosalind_33
ATCAA
>Rosalind_21
TTGAT
>Rosalind_18
TTTCC

Sample Output

TTCAT->TTGAT
GAGGA->GATGA
TTTCC->TTTCA

Error Correction in Reads solved by 2351

Genome Sequencing Isn't Perfectclick to expand

Problem

Sample Dataset

Sample Output

Welcome to Rosalind!

Error Correction in Reads solved by 2351

Genome Sequencing Isn't Perfectclick to expand

Problem

Sample Dataset

Sample Output

Report a typo

Flag as inappropriate

Welcome to Rosalind!