Using the Spectrum Graph to Infer Peptides solved by 535

July 27, 2012, midnight by Sonya Alexandrova

Topics: Computational Mass Spectrometry, Graph Algorithms

Getting Real with Spectra

In “Inferring Peptide from Full Spectrum”, we considered an idealized version of the simplified spectrum in which every cut through a given peptide was produced, so that the spectrum possessed all possible b-ions and y-ions cutting the peptide. In reality, not every cut will be produced in a spectrum, which may also contain errors. As a result, it is difficult or impossible to recover an entire peptide from a single spectrum.

In the more practical case of a mass spectrum, where intensity is plotted against ions' mass-charge ratios, inferring the protein is also greatly complicated by the presence of erratic peaks in the spectrum.

Problem

For a weighted alphabet $\mathscr{A}$ and a collection $L$ of positive real numbers, the spectrum graph of $L$ is a digraph constructed in the following way. First, create a node for every real number in $L$. Then, connect a pair of nodes with a directed edge $(u, v)$ if $v > u$ and $v - u$ is equal to the weight of a single symbol in $\mathscr{A}$. We may then label the edge with this symbol.

In this problem, we say that a weighted string $s = s_1 s_2 \cdots s_n$ matches $L$ if there is some increasing sequence of positive real numbers $(w_1, w_2, \ldots, w_{n+1})$ in $L$ such that $w(s_1) = w_2 - w_1$, $w(s_2) = w_3 - w_2$, ..., and $w(s_n) = w_{n+1} - w_{n}$.

Given: A list $L$ (of length at most 100) containing positive real numbers.

Return: The longest protein string that matches the spectrum graph of $L$ (if multiple solutions exist, you may output any one of them). Consult the monoisotopic mass table.

Sample Dataset

3524.8542
3623.5245
3710.9335
3841.974
3929.00603
3970.0326
4026.05879
4057.0646
4083.08025

Sample Output

WMSPG

Hint

How can our question be rephrased in terms of the spectrum graph?

Please login to solve this problem.