Base Filtration by Quality solved by 654

Feb. 21, 2014, 5:41 p.m. by Rosalind Team

Topics: Bioinformatics Tools, NGS

Cannot milk it?

Once you detect the bad quality letters in your reads, a good strategy is not to filter out entire reads (as we do in “Read Filtration by Quality”) but trim low-quality bases from both ends of your reads. it helps to save a large amount of information.

Problem

Bad quality bases can be easily trimmed out using certain threshold (defined by quality plot similar to what we did in “Base Quality Distribution”) There is a lot of trimming tools, you can try one of following:

Given: FASTQ file, quality cut-off value $q$, Phred33 quality score assumed.

Return: FASTQ file trimmed from the both ends (removed leading and trailing bases with quality lower than $q$)

Sample Dataset

20
@Rosalind_0049
GCAGAGACCAGTAGATGTGTTTGCGGACGGTCGGGCTCCATGTGACACAG
+
FD@@;C<AI?4BA:=>C<G=:AE=><A??>764A8B797@A:58:527+,
@Rosalind_0049
AATGGGGGGGGGAGACAAAATACGGCTAAGGCAGGGGTCCTTGATGTCAT
+
1<<65:793967<4:92568-34:.>1;2752)24')*15;1,.3*3+*!
@Rosalind_0049
ACCCCATACGGCGAGCGTCAGCATCTGATATCCTCTTTCAATCCTAGCTA
+
B:EI>JDB5=>DA?E6B@@CA?C;=;@@C:6D:3=@49;@87;::;;?8+

Sample Output

@Rosalind_0049
GCAGAGACCAGTAGATGTGTTTGCGGACGGTCGGGCTCCATGTGACAC
+
FD@@;C<AI?4BA:=>C<G=:AE=><A??>764A8B797@A:58:527
@Rosalind_0049
ATGGGGGGGGGAGACAAAATACGGCTAAGGCAGGGGTCCT
+
<<65:793967<4:92568-34:.>1;2752)24')*15;
@Rosalind_0049
ACCCCATACGGCGAGCGTCAGCATCTGATATCCTCTTTCAATCCTAGCT
+
B:EI>JDB5=>DA?E6B@@CA?C;=;@@C:6D:3=@49;@87;::;;?8

Please login to solve this problem.