ROSALIND | Using the Spectrum Graph to Infer Peptides

Getting Real with Spectra

In “Inferring Peptide from Full Spectrum”, we considered an idealized version of the simplified spectrum in which every cut through a given peptide was produced, so that the spectrum possessed all possible b-ions and y-ions cutting the peptide. In reality, not every cut will be produced in a spectrum, which may also contain errors. As a result, it is difficult or impossible to recover an entire peptide from a single spectrum.

In the more practical case of a mass spectrum, where intensity is plotted against ions' mass-charge ratios, inferring the protein is also greatly complicated by the presence of erratic peaks in the spectrum.

Problem

For a weighted alphabet $\mathscr{A}$ and a collection $L$ of positive real numbers, the spectrum graph of $L$ is a digraph constructed in the following way. First, create a node for every real number in $L$. Then, connect a pair of nodes with a directed edge $(u, v)$ if $v > u$ and $v - u$ is equal to the weight of a single symbol in $\mathscr{A}$. We may then label the edge with this symbol.

In this problem, we say that a weighted string $s = s_1 s_2 \cdots s_n$ matches $L$ if there is some increasing sequence of positive real numbers $(w_1, w_2, \ldots, w_{n+1})$ in $L$ such that $w(s_1) = w_2 - w_1$, $w(s_2) = w_3 - w_2$, ..., and $w(s_n) = w_{n+1} - w_{n}$.

Given: A list $L$ (of length at most 100) containing positive real numbers.

Return: The longest protein string that matches the spectrum graph of $L$ (if multiple solutions exist, you may output any one of them). Consult the monoisotopic mass table.

Sample Dataset

Sample Output

WMSPG

Hint

How can our question be rephrased in terms of the spectrum graph?

Using the Spectrum Graph to Infer Peptides solved by 510

Getting Real with Spectra

Problem

Sample Dataset

Sample Output

Hint

Report a typo