A character is some feature, either physical or genetic, that divides a collection of
taxa into two groups. The ultimate goal is to apply characters to the
construction of a phylogeny, where taxa are represented as the leaves of a tree.
There are two common ways of encoding a given character C dividing a collection of n taxa.
First, C can be written in split notation as S | S^c, where S is a subset
of our taxa and S^c is the set complement of S.
Removing an edge from a tree divides its leaves into two disjoint sets S and
S^c, so that we can establish a correspondence between characters and edges
of the phylogeny: specifically, we may assign each character to the edge that
its split notation implies.
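
The edge-to-split correspondence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree, its node labels ("x", "y") and taxon names are invented for the example, and the helper name split_from_edge is hypothetical. The tree is stored as an undirected edge list, and the split is read off by collecting the leaves that remain reachable from one endpoint of the removed edge.

```python
from collections import defaultdict

def split_from_edge(edges, leaves, removed):
    """Return the split (S, S^c) of the leaves induced by deleting one edge."""
    # Build an adjacency list that omits the removed edge.
    adj = defaultdict(set)
    for u, v in edges:
        if {u, v} != set(removed):
            adj[u].add(v)
            adj[v].add(u)
    # Collect every node still reachable from one endpoint of the removed edge.
    start = removed[0]
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    # Leaves on the reachable side form S; the rest form S^c.
    side = {leaf for leaf in leaves if leaf in seen}
    return side, set(leaves) - side

# Hypothetical unrooted tree on four taxa; "x" and "y" are internal nodes.
edges = [("cat", "x"), ("dog", "x"), ("x", "y"), ("fern", "y"), ("moss", "y")]
leaves = ["cat", "dog", "fern", "moss"]

S, Sc = split_from_edge(edges, leaves, ("x", "y"))
print(sorted(S), "|", sorted(Sc))
```

Removing an edge that touches a leaf gives a split with a single taxon on one side, so in this sketch only the internal edge ("x", "y") separates the taxa into two groups of more than one.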
The second notation for C assumes that we have ordered our n taxa,
after which C may be written in array notation as an array A in which A[i] is
1 if the i-th taxon belongs to S and 0 if it belongs to S^c.
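
As a small illustration of how the two notations relate, the following Python sketch converts a split into an array and back. The function names and the taxon names are assumptions made for the example, and indices are 0-based here, as is usual in Python.

```python
def split_to_array(taxa, S):
    """Array notation: entry i is 1 if the i-th taxon is in S, else 0."""
    return [1 if taxon in S else 0 for taxon in taxa]

def array_to_split(taxa, A):
    """Split notation: recover (S, S^c) from an array over the same ordering."""
    S = {taxon for taxon, bit in zip(taxa, A) if bit == 1}
    return S, set(taxa) - S

# Hypothetical ordered taxa and a character separating animals from plants.
taxa = ["cat", "dog", "fern", "moss"]
A = split_to_array(taxa, {"cat", "dog"})
print(A)  # [1, 1, 0, 0]
print(array_to_split(taxa, A))
```

Since S | S^c and S^c | S describe the same division of the taxa, the arrays [1, 1, 0, 0] and [0, 0, 1, 1] encode the same character under this ordering.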
Given a collection of arrays from a number of different characters, we may
combine the arrays into a matrix called a character table. The creation of
a phylogeny from a character table is an important algorithmic problem.
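
A character table can be assembled as in the sketch below, which assumes that each row holds one character and each column one taxon (the text does not fix an orientation, so this layout is an assumption). The taxa, the characters and the function name character_table are invented for illustration.

```python
def character_table(taxa, splits):
    """Stack the array notation of each character into a 0/1 matrix.

    Row j is the array for the j-th character; column i is the i-th taxon.
    """
    return [[1 if taxon in S else 0 for taxon in taxa] for S in splits]

# Hypothetical ordered taxa and two characters, each given by its subset S.
taxa = ["cat", "dog", "fern", "moss", "yeast"]
splits = [
    {"cat", "dog"},            # first character: cat, dog | fern, moss, yeast
    {"cat", "dog", "yeast"},   # second character: cat, dog, yeast | fern, moss
]

table = character_table(taxa, splits)
for row in table:
    print("".join(str(bit) for bit in row))
# 11000
# 11001
```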