Sept. 9, 2012, midnight by Aleksey Kladov
Reads obtained from the sequencing process are always error-prone. Error correction is an important step in the assembly pipeline for both the de Bruijn graph and overlap graph approaches. One simple way to judge whether a k-mer is correct is to count its occurrences across the set of all reads. K-mers with a high count tend to be more reliable.
You are given an integer k and a set of reads. Your task is to output the set of all k-mers that can be obtained from the reads, together with the count of each k-mer. Reads come from both strands and can be of various lengths. For each k-mer that occurs in the reads, you may output either the k-mer itself or its reverse complement, but each k-mer must appear in the output at most once.
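One possible way to approach the task is sketched below in Python. It is not a reference solution. It assumes that reads contain only the uppercase letters A, C, G and T, that a k-mer and its reverse complement share a single combined count, and that the lexicographically smaller of the two is the one reported. The names reverse_complement and count_canonical_kmers are made up for this illustration, and the input and output formats are left out because the statement does not specify them.

```python
from collections import Counter

# Translation table for complementing bases (assumes uppercase A, C, G, T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(reads, k):
    """Count k-mers over all reads, merging each k-mer with its reverse complement.

    The key stored for each pair is the lexicographically smaller string
    (an assumption of this sketch; the task allows either version).
    """
    counts = Counter()
    for read in reads:
        # Reads shorter than k give an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            canonical = min(kmer, reverse_complement(kmer))
            counts[canonical] += 1
    return counts


if __name__ == "__main__":
    example_reads = ["ACGT", "TTA"]
    for kmer, count in sorted(count_canonical_kmers(example_reads, 2).items()):
        print(kmer, count)
```

Because every k-mer is mapped to one representative before counting, each k-mer appears in the result at most once regardless of which strand the read came from. A k-mer that equals its own reverse complement, such as CG, is counted once per occurrence.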