Hi. In this class,

we are going to introduce you to

the algorithm we have chosen to demonstrate

how it is possible to create

another implementation of a system based on

FPGA technologies by using

the exciting sexy excel design framework.

The algorithm we are going to work with has been chosen

among the one used in the computation biology field.

In such a context,

the development of computational biology increased

the complexity of the algorithms used by biologists.

The huge amount of data they need to

process and the complexity of these algorithms

raise the problem of increasing the amount of

computational power needed to perform the computation.

The main reason for this is that during

the years thanks to an increase in genome technologies,

a huge amount of genomic data has been collected.

This massive increase in available data pair with

the complexity of the algorithms that

needs to process these huge amount of data,

is the reason why also the computational power that is

necessary to efficiently perform

the computation is increased.

In such a scenario,

hardware accelerators revealed to

be efficient in achieving speedup in

the computation while at

the same time saving power consumption.

FPGA based system have proved

to be effective in optimizing the duration among

the performance and the power consumption

as these devices allow to

obtain a huge amount of

parallelism while keeping a relative low power profile.

Among the multiple tasks performed

in bioinformatics to analyze DNAs,

one of the most compute-intensive one is the task of

identifying regions of similarity

between sequences of DNA,

RNA or protein called sequence alignment.

A very famous algorithm that performs

sequence alignment is

called the Smith-Waterman algorithm.

The Smith-Waterman algorithm is

a dynamic programming algorithm guaranteed to find

the optimal local alignment between

two strings that could be nucleotides or proteins.

This algorithm has been firstly introduced in

1981 by Mr. Smith and Mr. Waterman.

It is based on a previously published an algorithm

called Needleman-Wunsch. It's quite interesting

to notice that one of the reasons

why we choose the Smith-Waterman algorithm is

also because this algorithm is not only widely used in

bioinformatics to identify similarity

of regions among genomes,

but it is also quite widely

used in multiple different fields.

As an example, we can consider

the computer science field generally speaking,

where it is used for

similar document retrieval or

near duplicate webpages detection

or even fingerprint matching.

If this is not enough,

we can also consider the computer security field.

Within this context, this algorithm can be

used when performing Deep Packet Inspection,

trying to find known threats to our systems.

The algorithm performs

local sequence alignment between two strings

usually referred as query

or read in the database sequences.

As previously mentioned, the Smith-Waterman is

a dynamic programming algorithm as it

decomposes a big problem into smaller ones,

stirring the intermediate results and using

them again when solving the next sub problem.

In our example, we can organize it into four macrophases;

read, compute, traceback and store or write.

Even a query N and the database M

is similar to system and in in a fine gap function,

the algorithm calculates the scoring matrix S and

the traceback matrix T whose dimension is N times Mh.

Finally, it traces back

the path of elements starting from the index,

identifying the maximum element in the similarity matrix

and terminating whenever a value

with value zero is found.

It is worth it to remember that

one of the most important features of

this algorithm is that it is guaranteed to

find the optimal local alignment between the two strings.

With respect to the scoring system,

it is provided in the input.

Because of the high computational needs

required by the algorithm,

it has been modified during the years and

multiple heuristic methodologies has been introduced.

These heuristics from the one hand

allow to increase the overall performance of the system,

but from the other hand

decrease the precision of the algorithm.

That does not guarantee anymore

the optimality of the alignment found.

Furthermore, they may not be able to find

the similarities for sequences that are very distant.

This is important and we do have

always to keep these in mind and by doing this,

try to find the right tradeoff

in-between algorithm precision and

system performance as it can be easily understood.

To avoid the loss in precision,

several solutions have been tried

over the years to achieve better performance,

but without losing in precision.

Because of this, in the state

of the art it is possible to find

multiple implementation of this algorithm using

general purpose CPU and

different hardware accelerators such as

graphical processing unit and FPGAs.

The performance of these algorithms are calculated by

means of giga cups or giga cell update per seconds.

This measure of performance can be

obtained multiplying the size of the query by

the size of the database and finally

the result can be divided for the total execution time.

At the end, what we are going to

present in this module is an analysis and

the successive FPGA based hardware acceleration of

the Smith-Waterman algorithm used to

perform pairwise alignment of DNA sequences.

Now, after having understood why which is

the Smith-Waterman algorithm from

a general contexts perspective,

let us now enter more into

the details of the specific characteristics of

this algorithm and on how it is going to be

efficiently implemented on an FPGA because of them.