Emerging Grid technology is becoming an important tool for solving both compute-intensive and data-intensive problems. Computational grids let a large number of different machines act as a single one by sharing both storage capacity and computing power. The importance of sharing data and resources securely is reflected in the growing interest of scientists in this technology, especially in the biomedical community, which includes bioinformatics. In proteomics in particular, a relevant area deals with protein structure comparison. Comparing protein structures is important for protein classification and for understanding protein function. We have developed a method, PROuST, that allows efficient retrieval of similarity information from a database containing all protein structures of the Protein Data Bank (PDB).
In this talk I will first present the general Grid architecture and the PROuST method for protein structure comparison. I will then focus on the optimization strategies adopted to port PROuST to a real grid, and on PROuST's performance there. In more detail, PROuST consists of two components: an index-based search that produces a list of proteins that are good candidates for similarity, and a dynamic programming algorithm that aligns the target protein with each candidate protein. Since both components use the same geometric data, stored in large hash tables, an important issue arises when porting the application to a Grid: the tradeoff between data transfer and data recomputation. Replica optimization is also a crucial aspect of any gridification strategy. Using the pool of services provided by the European DataGrid, we experimented with two main policies for replica management: On-line Replica and Off-line Replica. In the last part of the talk I will present the results of the efficiency and reliability measurements.
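The two-stage structure described above (an index lookup that filters candidates, followed by dynamic-programming alignment, both fed by the same geometric table) can be illustrated with a small sketch. This is not PROuST's actual implementation. The abstract does not specify the geometric features or the alignment scoring. This sketch assumes a hypothetical feature, the quantized side lengths of the triangle formed by each run of three consecutive residues, and a plain Needleman-Wunsch global alignment score. All function names (`triplet_keys`, `build_index`, `candidates`, `align_score`, `search`) are illustrative. The point it shows is that the per-protein feature table is computed once and reused by both stages, which is exactly the data that must be either transferred or recomputed on a grid node.

```python
import math
from collections import Counter, defaultdict

Coord = tuple[float, float, float]
Key = tuple[int, int, int]


def triplet_keys(coords: list[Coord], bin_width: float = 1.0) -> list[Key]:
    """Hypothetical feature: quantized side lengths of each consecutive-residue triangle."""
    keys = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        sides = (math.dist(a, b), math.dist(b, c), math.dist(a, c))
        keys.append(tuple(int(s // bin_width) for s in sides))
    return keys


def build_index(db: dict[str, list[Coord]], bin_width: float = 1.0):
    """Compute the shared geometric table once, plus an inverted index key -> protein ids."""
    features = {pid: triplet_keys(c, bin_width) for pid, c in db.items()}
    index: dict[Key, set[str]] = defaultdict(set)
    for pid, keys in features.items():
        for k in keys:
            index[k].add(pid)
    return features, index


def candidates(query_keys: list[Key], index, min_hits: int = 1) -> list[str]:
    """Stage 1: proteins sharing at least `min_hits` distinct keys with the query."""
    hits: Counter = Counter()
    for k in set(query_keys):
        for pid in index.get(k, ()):
            hits[pid] += 1
    return [pid for pid, n in hits.most_common() if n >= min_hits]


def align_score(q: list, t: list, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Stage 2: Needleman-Wunsch global alignment score over two key sequences."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(q) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def search(query: list[Coord], db: dict[str, list[Coord]],
           bin_width: float = 1.0, min_hits: int = 1) -> list[tuple[str, int]]:
    """Filter with the index, then align only the candidates, reusing the same feature table."""
    features, index = build_index(db, bin_width)
    q = triplet_keys(query, bin_width)
    scored = [(pid, align_score(q, features[pid])) for pid in candidates(q, index, min_hits)]
    return sorted(scored, key=lambda r: r[1], reverse=True)
```

In a grid setting, `features` is the structure whose placement matters: a worker can either fetch it from a replica or rebuild it from the raw coordinates with `build_index`, which is the transfer-versus-recomputation choice the talk discusses.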