Completing a Tree solved by 2028

July 11, 2012, midnight by Rosalind Team

Topics: Graph Algorithms, Phylogeny

The Tree of Life

Figure 1. A phylogeny illustrating proposed evolutionary relationships among the three domains of life: Bacteria, Archaea, and Eukaryota.

"As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications." Charles Darwin, The Origin of Species

A century and a half has passed since the publication of Darwin's magnum opus, and yet the construction of a single Tree of Life uniting life on Earth still has not been completed, with perhaps as many as 90% of all living species not yet catalogued (although a beautiful interactive animation has been produced by OneZoom).

To learn about state-of-the-art attempts to build the Tree of Life, you may like to check out the Tree of Life Web Project, a collaborative effort by biologists from around the world. This peer-reviewed project started in 1995, and today it has grown to contain more than 10,000 pages with characteristics of different groups of organisms and their evolutionary history.

Instead of trying to construct the entire Tree of Life all at once, we often wish to form a simpler tree in which a collection of species have been clumped together for the sake of simplicity; such a group is called a taxon (pl. taxa). For a given collection of taxa, a phylogeny is a treelike diagram that best represents the evolutionary connections between the taxa: the construction of a particular phylogeny depends on our specific assumptions regarding how these evolutionary relationships should be interpreted. See Figure 1.


Figure 2. A labeled tree with 6 vertices and 5 edges.

An undirected graph is connected if there is a path connecting any two nodes. A tree is a connected (undirected) graph containing no cycles; this definition forces the tree to have a branching structure organized around a central core of nodes, just like its living counterpart. See Figure 2.

We have already grown familiar with trees in “Mendel's First Law”, where we introduced the probability tree diagram to visualize the outcomes of a random variable.

In the creation of a phylogeny, taxa are encoded by the tree's leaves, or nodes having degree 1. A node of a tree having degree larger than 1 is called an internal node.

Given: A positive integer $n$ ($n \leq 1000$) and an adjacency list corresponding to a graph on $n$ nodes that contains no cycles.

Return: The minimum number of edges that can be added to the graph to produce a tree.

Sample Dataset

1 2
2 8
4 10
5 9
6 10
7 9

Sample Output


Extra Information

After solving this problem, a standard mathematical exercise for the technically minded is to verify that every tree having 2 or more nodes must contain at least two leaves.

Please login to solve this problem.