Sept. 18, 2015, 2:49 a.m. by Rosalind Team

Imagine that we have an algorithm generating the set of all peptides scoring at least *threshold* against a spectral vector *Spectrum'*. We will henceforth call this set of high-scoring peptides a **spectral dictionary**, denoted *Dictionary*_{threshold}(*Spectrum'*). For a PSM (*Peptide*, *Spectrum'*), we will use the term **PSM dictionary**, denoted *Dictionary*(*Peptide*, *Spectrum'*), to refer to the spectral dictionary *Dictionary*_{Score(Peptide, Spectrum')}(*Spectrum*).

Say we have a spectral dictionary consisting of a single amino acid string *Peptide*, which we will attempt to match against a randomly generated string *DecoyProteome* of length *n*.* *Because *DecoyProteome* was randomly generated, the probability that *Peptide* matches the string beginning at a given position of *DecoyProteome* is 1/20^{|Peptide|}. We call this expression the **probability** of *Peptide*.

If the spectral dictionary consists of more than one peptide, then we define the **probability **of the dictionary as

We will first compute the number of peptides in a spectral dictionary, since this simpler problem will provide insights on how to compute the probability of a spectral dictionary.

We will use dynamic programming to find the size of a spectral Dictionary problem. Given a spectral vector *Spectrum'* = (*s*_{1}, . . . , *s*_{m}), we define its **i****-prefix** (for *i* between 1 and *m*) as *Spectrum'*_{i} = (*s*_{1}, . . . , *s*_{i }) and introduce a variable *Size*(*i*, *t*) as the number of peptides *Peptide* of mass *i* such that *Score*(*Peptide*, *Spectrum'*_{i}) is equal to *t*.

The key to establishing a recurrence relation for computing *Size*(*i*, *t*) is to realize that the set of peptides contributing to *Size*(*i*, *t*) can be split into 20 subsets depending on their final amino acid *a*. Each peptide ending in a specific amino acid *a* results in a shorter peptide with mass *i* − |*a*| and score *t *− *s*_{i }if we remove a from the peptide (here, |a| denotes the mass of a). Thus,

Since there is a single “empty” peptide of length zero, we initialize *Size*(0, 0) = 1. We also define *Size*(0, *t*) = 0 for all possible scores *t*, and set *Size*(*i*, *t*) = 0 for negative values of i. Using the above recurrence, we can compute the size of a spectral dictionary of *Spectrum'* = (*s*_{1}, . . . , *s*_{m}) as

*Find the size of the spectral dictionary for a given spectrum and score threshold.*

Given: A spectral vector *Spectrum'*, an integer *threshold*, and an integer *max_score*.

Return: The size of the dictionary *Dictionary*_{threshold}(*Spectrum'*).

**Note:** Use the provided *max_score* for the height of your table. Your answer should be the number of peptides whose score is at least *threshold* and at most *max_score*.

**Note:** In this chapter, all dataset problems implicitly use the standard integer-valued mass table for the regular twenty amino acids. Examples sometimes use imaginary amino acids X and Z having respective integer masses 4 and 5.

4 -3 -2 3 3 -4 5 -3 -1 -1 3 4 1 3 1 8

3