Character-Based Phylogeny solved by 122

July 2, 2012, midnight by Rosalind Team

Topics: Phylogeny

Introduction to Character-Based Phylogeny

In “Creating a Character Table”, we discussed the construction of a character table from a collection of characters represented by subsets of our taxa. However, the ultimate goal is to be able to construct a phylogeny from this character table.

The issues at hand are that we want to ensure that we have enough characters to actually construct a phylogeny, and that our characters do not conflict with each other.

Problem

Because a tree having $n$ nodes has $n-1$ edges (see “Completing a Tree”), removing a single edge from a tree will produce two smaller, disjoint trees. Recall from “Creating a Character Table” that for this reason, each edge of an unrooted binary tree corresponds to a split $S \mid S^{\textrm{c}}$, where $S$ is a subset of the taxa.

A consistent character table is one whose characters' splits do not conflict with the edge splits of some unrooted binary tree $T$ on the $n$ taxa. More precisely, $S_1 \mid S_1^{\textrm{c}}$ conflicts with $S_2 \mid S_2^{\textrm{c}}$ if all four intersections $S_1 \cap S_2$, $S_1 \cap S_2^{\textrm{c}}$, $S_1^{\textrm{c}} \cap S_2$, and $S_1^{\textrm{c}} \cap S_2^{\textrm{c}}$ are nonempty. As a simple example, consider the conflicting splits $\{a, b\} \mid \{c, d\}$ and $\{a, c\} \mid \{b, d\}$.

More generally, given a consistent character table $C$, an unrooted binary tree $T$ "models" $C$ if the edge splits of $T$ agree with the splits induced from the characters of $C$.

Given: A list of $n$ species ($n \leq 80$) and an $n$-column character table $C$ in which the $j$th column denotes the $j$th species.

Return: An unrooted binary tree in Newick format that models $C$.

Sample Dataset

cat dog elephant mouse rabbit rat
011101
001101
001100


Sample Output

(dog,(cat,rabbit),(rat,(elephant,mouse)));