March 18, 2014, 8:26 p.m. by Rosalind Team

Our goal is to further improve on the memory requirements of the suffix array introduced in “Construct the Suffix Array of a String” for multiple pattern matching.

Given a string *Text*, form all possible **cyclic rotations** of *Text*; a cyclic rotation is defined by chopping off a suffix from the end of *Text* and appending this suffix to the beginning of *Text*. Next — similarly to suffix arrays — order all the cyclic rotations of *Text* lexicographically to form a |*Text*| × |*Text*| matrix of symbols that we call the **Burrows-Wheeler matrix** and denote by *M*(*Text*).

Note that the first column of *M*(*Text*) contains the symbols of *Text* ordered lexicographically. In turn, the second column of *M*(*Text*) contains the second symbols of all cyclic rotations of *Text*, and so it too represents a (different) rearrangement of symbols from *Text*. The same reasoning applies to show that any column of *M*(*Text*) is some rearrangement of the symbols of *Text*. We are interested in the last column of *M*(*Text*), called the **Burrows-Wheeler transform** of *Text*, or *BWT*(*Text*).

*Construct the Burrows-Wheeler transform of a string.*

Given: A string *Text*.

Return: *BWT*(*Text*).

GCGTGCCTGGTCA$

ACTGGCT$TGCGGC

## Note

Although it is possible to construct the Burrows-Wheeler transform in O(|

Text|) time and space, we do not expect you to implement such a fast algorithm. In other words, it is perfectly fine to produce BWT(Text) by first producing the complete Burrows-Wheeler matrix M(Text).