ROSALIND | Glossary | k-mer composition

The $k$-mer composition of a string $s$ encodes the number of times that each possible $k$-mer occurs in $s$. To represent the $k$-mer composition of a string concisely, all possible $k$-mers (in the case of DNA strings, there will be $4^k$ total $k$-mers) are ordered lexicographically, and then an array $A$ is created in which $A[i]$ represents the number of times that the $i$th of these ordered $k$-mers appears in $s$.
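The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `kmer_composition` and its `alphabet` parameter are hypothetical, indexing is 0-based (so `A[0]` corresponds to the first $k$-mer in lexicographic order), overlapping occurrences are counted, and the input is assumed to contain only symbols from the alphabet.

```python
from itertools import product


def kmer_composition(s, k, alphabet="ACGT"):
    """Return the k-mer composition of s as a list of counts.

    Hypothetical helper for illustration; assumes every symbol of s
    belongs to `alphabet` (a KeyError is raised otherwise).
    """
    # product() over a sorted alphabet yields k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(sorted(alphabet), repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    # Slide a window of width k across s; overlapping occurrences all count.
    for start in range(len(s) - k + 1):
        counts[index[s[start:start + k]]] += 1
    return counts


# Small example: the 2-mers of "ACGTAC" are AC, CG, GT, TA, AC.
print(kmer_composition("ACGTAC", 2))
```

For a string of length $n \geq k$, the entries of the returned array sum to $n - k + 1$, since each starting position contributes exactly one $k$-mer.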

The $k$-mer composition is a generalization of GC-content to the case of substrings. In the figure below, we show the array giving the 2-mer composition of "TTGATTACCTTATTTGATCATTACACATTGTACGCTTGTGTCAAAATATCACATGTGCCT".

2-mer Composition
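To illustrate the link to GC-content mentioned above, the following sketch recovers GC-content from the 1-mer composition of a DNA string. It assumes GC-content is expressed as the fraction of symbols that are 'C' or 'G' and that the string is non-empty; the function names `one_mer_composition` and `gc_content` are hypothetical.

```python
from collections import Counter


def one_mer_composition(s):
    """Counts of A, C, G, T in s, in lexicographic order (the case k = 1)."""
    counts = Counter(s)
    return [counts[base] for base in "ACGT"]


def gc_content(s):
    """Fraction of symbols in s that are C or G (assumes s is non-empty)."""
    a, c, g, t = one_mer_composition(s)
    return (c + g) / (a + c + g + t)


print(one_mer_composition("GGCA"))  # [1, 1, 2, 0]
print(gc_content("GGCA"))           # 0.75
```

In this sense GC-content is a summary of the $k = 1$ array, while larger values of $k$ record counts for longer substrings.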
