Consistent characters

A consistent collection of characters is one for which a binary tree $T$ can be constructed whose splits match those taken from the split notations of the characters.

The split notation is a way for a character to be encoded into a split $S_1 \mid S_1^{\textrm{c}}$, where $S_1$ and $S_1^{\textrm{c}}$ are the two disjoint sets of taxa that the character separates. On the other hand, removing an edge $e$ from a binary tree results in the creation of two subtrees, each containing a subset of the taxa, which also induces a split $S_2 \mid S_2^{\textrm{c}}$. Thus, we can label $e$ with the split $S_2 \mid S_2^{\textrm{c}}$, and so we have two ways of generating a split: from a character and from an edge of a binary tree.

A collection of characters is called consistent if a binary tree $T$ can be found whose edge splits do not conflict with the splits inferred from characters. Such a conflict will occur precisely when we have one split $S_1 \mid S_1^{\textrm{c}}$ inferred from a character, another split $S_2 \mid S_2^{\textrm{c}}$ inferred from an edge of $T$, and all four intersections $S_1 \cap S_2$, $S_1 \cap S_2^{\textrm{c}}$, $S_1^{\textrm{c}} \cap S_2$, $S_1^{\textrm{c}} \cap S_2^{\textrm{c}}$ are nonempty.

For a simple example, consider the conflicting quartets $\{a, b\} \mid \{c, d\}$ and $\{a, c\} \mid \{b, d\}$, which must correspond to two distinct trees on the four taxa $a$, $b$, $c$, and $d$.