ROSALIND | Enumerating k-mers Lexicographically

Organizing Strings

When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that "APPLE" precedes "APRON", which in turn comes before "ARMOR".

Problem

Assume that an alphabet $\mathscr{A}$ has a predetermined order; that is, we write the alphabet as a permutation $\mathscr{A} = (a_1, a_2, \ldots, a_k)$, where $a_1 < a_2 < \cdots < a_k$. For instance, the English alphabet is organized as $(\textrm{A}, \textrm{B}, \ldots, \textrm{Z})$.

Given two strings $s$ and $t$ having the same length $n$, we say that $s$ precedes $t$ in the lexicographic order (and write $s <_{\textrm{Lex}} t$) if the first symbol $s[j]$ that doesn't match $t[j]$ satisfies $s_j < t_j$ in $\mathscr{A}$.

Given: A collection of at most 10 symbols defining an ordered alphabet, and a positive integer $n$ ($n \leq 10$).

Return: All strings of length $n$ that can be formed from the alphabet, ordered lexicographically (use the standard order of symbols in the English alphabet).

Sample Dataset

A C G T
2

Sample Output

AA
AC
AG
AT
CA
CC
CG
CT
GA
GC
GG
GT
TA
TC
TG
TT

Enumerating k-mers Lexicographically solved by 7152

Organizing Strings

Problem

Sample Dataset

Sample Output

Report a typo