At the end of the last lecture, we faced the challenge of finding positions of the pattern in text when they tried to develop pattern matching with Burrows Wheeler Transform. Now I will explain how we will use suffix arrays for solving this problem. Suffix array simply holds starting positions of each suffix. For example, the first suffix start at position 13, the second suffix start at position 5, the next suffix starts at position 3 and we can fill up the end of the suffix array. Here it is. So, when suffix array is constructed, we can very quickly answer the question where the occurrences of the part are not. In this case, and the case of ana, our pattern appear at position 1, 7 and 9. The challenge, however, is how to construct the suffix array quickly. Because the naive algorithm for constructing suffix array that is based on sorting all suffixes of text requires all of length of text logarithm length of text comparison. And each comparison also may take time. There is a way to construct a suffix array if you're already constructing a suffix tree. Because, as you can see from this example, a suffix array is simply a depth-first reversal of the suffix tree. Indeed, you start from lead 5, continue to lead 3. One, seven, nine, continue further and by simply traversing all leaves in the suffix tree in order to reconstruct the suffix array. To summarize, if the construct suffix array by the depth-first traversal of the suffix tree, then it takes all of text time and roughly 20 times text space. And Misha will also explain later how to quickly construct suffix array without relying on suffix tree. In fact, Manber-Myers, in 1990, constructed the first algorithm for the first linear timeout for suffix array that require O of text time four time text space. However, for genomics application, even this reduced space four times length of the text, is still large. What can we do to reduce it? Here is a trick for reducing the memory for suffix array. I will first ask a question. Can we store? Only a fraction of the suffix array, but still do fast pattern matching. For example, can we store only elements of the suffix array that are multiples of some integer K. Shown here, if integer K is equal to five, then we only store elements five 10 and 0 of the suffix array. We have no access to other members of suffix array. How it can be useful? Let me show you how to use the partial suffix array to find the position of matches. When we have complete suffix array it is trivial, we simply look up at the highest point elements in the suffix array. But what do we do when we only have partial suffix array? Where are these ana occurrences appear in the text? Well, we do not know, because there is no corresponding k element, let's say, for a2na. How do we find where it appears? Indeed, we don't know yet where it appears, but we can also ask a question where is b1 a n a appears. Once again, we don't know how to answer this question because there is corresponding [INAUDIBLE] in the partial suffix. But, using the first last part of it we can ask different question. Where is the string a 1 b ana appears? And now we can answer this question. Because in this case, the element of the partial suffix array is Present. It is fast. So, we know where there is a1bana appears, now the thing left is to figure out where ana appears. And it's easy to do because if aabana appears at position 5, then b1ana appears at position 6. And ana appears at position 7, so we figure out how to use suffix array for fast pattern matching. Of course, the time to search for pattern will be multiplied by a factor of k, because if you store, we potentially can search for up to k position before you find the fill element of the partial suffix array, but K is a constant in this algorithm.