ROSALIND | Glossary | Lexicographic order

Lexicographic order is a way of ordering all of the strings constructed from an alphabet. To order strings in this way, we first assume that the alphabet $\mathscr{A}$ has an underlying order of its symbols, so that it may be encoded as a permutation. For example, the English alphabet may be represented by the permutation $(\textrm{A}, \textrm{B}, \textrm{C} \ldots, \textrm{Z})$.

If two strings $s$ and $t$ have the same length $n$, then we say that $s$ precedes $t$ lexicographically (and write $s <_{\textrm{Lex}} t$) if the first nonmatching symbols $s[j]$ and $t[j]$ satisfy $s_j < t_j$ in $\mathscr{A}$.

For example, consider the strings $s = \textrm{"APPLE"}$ and $t = \textrm{"APRON"}$. We first find a mismatch at the third symbol of these strings, where the 'P' of $s$ mismatches the 'R' of $t$. Because 'P' precedes 'R' in the English alphabet, we say that $s <_{\textrm{Lex}} t$.

We can also compare two strings if they do not have the same length. For example, say that $s$ and $t$ have lengths $m$ and $n$, respectively, and $m < n$. Simply "chop off" the final $m - n$ symbols from $t$ to form the string $t' = t[1:m]$, and then compare $s$ to $t'$. We have three possibilities:

If $s = t'$, then we immediately conclude that $t$ should come after $s$ (example: $\textrm{"APPLE"} <_{\textrm{Lex}} \textrm{"APPLET"}$).
If $s <_{\textrm{Lex}} t'$, then $t$ should come after $s$ (example: $\textrm{"APPLE"} <_{\textrm{Lex}} \textrm{"ARTIFACTS"}$ because $\textrm{"APPLE"} <_{\textrm{Lex}} \textrm{"ARTIF"}$).
If $s >_{\textrm{Lex}} t'$, then $s$ should come after $t$ (example: $\textrm{"APPLE"} >_{\textrm{Lex}} \textrm{"APHIDS"}$ because $\textrm{"APPLE"} >_{\textrm{Lex}} \textrm{"APHID"}$).

Glossary

Lexicographic order

Report a typo