Inferring Peptide from Full Spectrum solved by 758

July 25, 2012, midnight by Sonya Alexandrova

Topics: Computational Mass Spectrometry

Ions Galore

In “Inferring Protein from Spectrum”, we inferred a protein string from a list of b-ions. In practice, biologists have no way of distinguishing between b-ions and y-ions in the simplified spectrum of a peptide. However, we will often possess a pair of masses in the spectrum corresponding to a single cut. The two corresponding ions complement each other: for example, mass("PR") + mass("TEIN") = mass("PRTEIN"). As a result, we can easily infer the mass of a b-ion from its complementary y-ion and vice versa, as long as we already know the parent mass, i.e., the mass of the entire peptide.

The theoretical simplified spectrum for a protein $P$ of length $n$ is constructed as follows: form all possible cuts, then compute the mass of the b-ion and the y-ion at each cut. Duplicate masses are allowed. You might guess how we could modify “Inferring Protein from Spectrum” to infer a peptide from its theoretical simplified spectrum; here we consider a slightly modified form of this problem in which we attempt to identify the interior region of a peptide given only b-ions and y-ions that are cut within this region. As a result, we will have constant masses at the beginning and end of the peptide that will be present in the mass of every b-ion and y-ion, respectively.


Say that we have a string $s$ containing $t$ as an internal substring, so that there exist nonempty substrings $s_1$ and $s_2$ of $s$ such that $s$ can be written as $s_1 t s_2$. A t-prefix contains all of $s_1$ and none of $s_2$; likewise, a t-suffix contains all of $s_2$ and none of $s_1$.

Given: A list $L$ containing $2n + 3$ positive real numbers ($n \leq 100$). The first number in $L$ is the parent mass of a peptide $P$, and all other numbers represent the masses of some b-ions and y-ions of $P$ (in no particular order). You may assume that if the mass of a b-ion is present, then so is that of its complementary y-ion, and vice-versa.

Return: A protein string $t$ of length $n$ for which there exist two positive real numbers $w_1$ and $w_2$ such that for every prefix $p$ and suffix $s$ of $t$, each of $w(p) + w_1$ and $w(s) + w_2$ is equal to an element of $L$. (In other words, there exists a protein string whose $t$-prefix and $t$-suffix weights correspond to the non-parent mass values of $L$.) If multiple solutions exist, you may output any one.

Sample Dataset


Sample Output


Please login to solve this problem.