Edit distance

The edit distance between two strings $s$ and $t$, denoted $d_{\textrm{E}}(s, t)$, is the minimum number of edit operations required to transform $s$ into $t$. Here, an edit operation is the insertion, deletion, or substitution of a single symbol.

Edit distance is excellent for quantifying the similarity of two relatively similar genetic strings that have only been separated by minor point mutations and the insertions/deletions of short intervals. It thus offers a more powerful generalization of Hamming distance, as it can consider two strings that do not have the same length.

The calculation of edit distance is an important algorithmic problem (see “Edit Distance”), and it requires us to somehow sift through all transformations of one string into the other by edit operations in order to find the minimum number of such operations possible. Using an alignment can be helpful.

Note that we can reverse any collection of edit operations transforming $s$ into $t$ to yield a transformation of $t$ into $s$ via edit operations (insertions become deletions and vice-versa), so that $d_{\textrm{E}}(s, t) = d_{\textrm{E}}(t, s)$.

Edit distance is also known as Levenshtein distance, named after Vladimir Levenshtein, who introduced the distance function in 1965.