
Glossary

k-mer composition

The k-mer composition of a string s encodes the number of times that each possible k-mer occurs in s. To represent the k-mer composition of a string concisely, all possible k-mers (in the case of DNA strings, there will be 4^k total k-mers) are ordered lexicographically, and then an array A is created in which A[i] represents the number of times that the ith of these ordered k-mers appears in s.
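The construction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name kmer_composition and its alphabet parameter are made up for this example, the array is 0-indexed (so A[0] corresponds to the first k-mer in lexicographic order), and overlapping occurrences are assumed to count separately.

```python
from itertools import product


def kmer_composition(s, k, alphabet="ACGT"):
    """Return the k-mer composition of s as a list of counts.

    Hypothetical helper for illustration. Entry i of the result is the
    number of occurrences in s of the i-th k-mer (0-indexed) in
    lexicographic order. Overlapping occurrences are counted separately
    (an assumption of this sketch).
    """
    # product() over a sorted alphabet yields the k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(sorted(alphabet), repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)  # one slot per possible k-mer
    # Slide a window of width k across s; a symbol outside the
    # alphabet raises KeyError in this sketch.
    for start in range(len(s) - k + 1):
        counts[index[s[start:start + k]]] += 1
    return counts


print(kmer_composition("ACGTA", 1))  # [2, 1, 1, 1] for A, C, G, T
```

For a DNA string the returned list has 4^k entries, and its entries sum to len(s) - k + 1, the number of length-k windows in s.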

The k-mer composition is a generalization of GC-content to the case of substrings. In the figure below, we show the array giving the 2-mer composition of "TTGATTACCTTATTTGATCATTACACATTGTACGCTTGTGTCAAAATATCACATGTGCCT".

Figure: 2-mer composition