# Glossary

## PIR

PIR stands for the "Protein Information Resource", which is offered by Georgetown University.

A sequence in PIR format consists of the following, in order:

• One line beginning with a ">" (greater-than) sign, followed by a two-letter code describing the sequence type (P1, F1, DL, DC, RL, RC, or XX), followed by a semicolon, followed by the sequence identification code (the database ID-code).
• One line containing a text description of the sequence.
• Lines containing the sequence itself. The end of the sequence is marked by a "*" (asterisk) symbol.
• (Optional) Lines describing the sequence. Software that is supposed to read only the sequence should ignore these supplementary lines.

A file in PIR format may comprise more than one sequence. PIR format is also commonly known as NBRF format.
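The layout described above can be read with a short parser. The following Python sketch is illustrative only and is not an official PIR tool: the names `PirRecord` and `parse_pir` are hypothetical, and it assumes that every record starts on a line beginning with ">", that the description occupies exactly one line, and that any lines after the terminating "*" are supplementary text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class PirRecord:
    """One sequence entry from a PIR (NBRF) file. Hypothetical helper type."""
    seq_type: str      # two-letter code such as "P1" or "XX"
    identifier: str    # sequence identification code
    description: str   # the single text description line
    sequence: str      # residues, with whitespace and the "*" removed
    extra: List[str] = field(default_factory=list)  # optional lines after "*"


def parse_pir(text: str) -> Iterator[PirRecord]:
    """Yield every record found in `text`, in file order."""
    lines = iter(text.splitlines())
    line = next(lines, None)
    while line is not None:
        if not line.startswith(">"):
            # Skip anything that precedes the first header line.
            line = next(lines, None)
            continue

        # Header: ">" + type code + ";" + identification code.
        seq_type, _, identifier = line[1:].partition(";")
        description = next(lines, "").strip()

        chunks: List[str] = []
        extra: List[str] = []
        terminated = False
        line = next(lines, None)
        while line is not None and not line.startswith(">"):
            if not terminated:
                if "*" in line:
                    # The asterisk marks the end of the sequence.
                    chunks.append(line.partition("*")[0])
                    terminated = True
                else:
                    chunks.append(line)
            elif line.strip():
                # Supplementary lines; sequence-only readers may ignore them.
                extra.append(line.strip())
            line = next(lines, None)

        # Remove the spaces and line breaks used to group residues.
        sequence = "".join("".join(chunks).split())
        yield PirRecord(seq_type.strip(), identifier.strip(),
                        description, sequence, extra)
```

Calling `list(parse_pir(text))` on the contents of a PIR file returns one `PirRecord` per sequence. Because the generator simply continues until the input is exhausted, files holding several sequences need no special handling.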

Below is an example of a file in PIR format containing two sequences:



>P1;CRAB_ANAPL
ALPHA CRYSTALLIN B CHAIN (ALPHA(B)-CRYSTALLIN).
MDITIHNPLI RRPLFSWLAP SRIFDQIFGE HLQESELLPA SPSLSPFLMR
SPIFRMPSWL ETGLSEMRLE KDKFSVNLDV KHFSPEELKV KVLGDMVEIH