We have often collaborated with the Edinburgh Parallel Computing Centre (EPCC) to deliver applications requiring both biological and HPC specialist knowledge.
EPCC, which is home to the largest supercomputing centre in Europe, employs 75 highly skilled staff and has an annual turnover of around £4 million.
EPCC was established in 1990 as a focus for University work in High Performance Computing (HPC). It is an integral part of the School of Physics, and has a grade-5 Research Assessment Exercise (RAE) rating (International Excellence).
EPCC is home to some of Europe's most advanced computing facilities. It manages an exceptional range of computers, unmatched by any other European university, including an IBM eServer Blue Gene system and a QCDOC system.
Part of EPCC's remit has always been to provide access to leading-edge supercomputing resources for grand-challenge research projects throughout the UK.
Today EPCC continues this tradition through the HECToR and HPCx consortiums that provide the UK's leading national HPC services.
EPCC's team of experienced consultants and software engineers has a wealth of expertise in the latest technologies, including in-depth knowledge of database, network and parallel programming.
Here are a few of our recent high-profile projects.
In our latest joint project, DPM and EPCC aim to tackle the problem of efficiently processing large amounts of high-throughput post-genomic data by using HPC platforms.
The Simple Parallel R INTerface (SPRINT) framework allows biostatisticians to easily access and exploit the power of networked clusters to analyse genomic data with the statistical language R.
SPRINT consists of a user-friendly GUI, an intelligent HPC harness and a library of parallel R functions. A prototype of SPRINT, including a parallel implementation of the R function for Pearson correlation, was developed with support from the e-Science Data, Information and Knowledge Transformation (edikt2) programme.
A two-year project started on 1 April 2009 to develop the framework and add a number of commonly used functions to SPRINT. This work is supported by the Wellcome Trust Technology Development Grant [086696/Z/08/Z].
Read an article on our work published in Bioinformatics.
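The idea behind a parallel Pearson correlation can be illustrated with a short sketch. SPRINT itself is a library of parallel R functions, so the Python below is only a hypothetical illustration of one common decomposition, not SPRINT's code or API; the function names are our own. Each row (for example, a gene's expression profile) is first centred and scaled to unit length, which reduces each correlation coefficient to a dot product. The rows are then split into contiguous blocks, and each worker correlates its block against every row. Threads stand in here for the processes that would run on separate cluster nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Matrix = List[List[float]]


def standardise(row: Sequence[float]) -> List[float]:
    """Centre a row and scale it to unit norm, so a dot product gives Pearson's r."""
    mean = sum(row) / len(row)
    centred = [x - mean for x in row]
    norm = math.sqrt(sum(c * c for c in centred))
    if norm == 0.0:
        raise ValueError("constant row: correlation is undefined")
    return [c / norm for c in centred]


def correlate_block(block: Matrix, all_rows: Matrix) -> Matrix:
    """Worker task: correlations of one block of rows against every row."""
    return [[sum(a * b for a, b in zip(r, s)) for s in all_rows] for r in block]


def parallel_pearson(data: Matrix, n_workers: int = 4) -> Matrix:
    """All-against-all Pearson correlation matrix, with rows split across workers."""
    if not data:
        return []
    z = [standardise(row) for row in data]
    # Split the rows into roughly equal contiguous blocks, one per worker.
    size = math.ceil(len(z) / n_workers)
    blocks = [z[i:i + size] for i in range(0, len(z), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map preserves input order, so the blocks come back in sequence.
        results = pool.map(correlate_block, blocks, [z] * len(blocks))
        # Reassemble the per-worker row blocks into the full matrix.
        return [row for block in results for row in block]
```

For example, `parallel_pearson([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]], n_workers=2)` gives a 3×3 matrix with a correlation of about 1 between the first two rows and about -1 between the first and last. On a real cluster each block would go to a separate process, and only the reassembly step needs communication.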
We are also currently leading another activity within the edikt2 programme, which is managed by EPCC.
The Minimum Information About an RNAi Experiment (MIARE) activity aims to develop bioinformatics tools and scientific standards to support high-throughput RNA interference (RNAi) screening. In particular, this activity is central to establishing the MIARE standard and to creating ontologies, data models and standards for the exchange of RNAi experimental data.
Visit the MIARE website for more information.
EPCC recently provided HPC expertise for a simulation study run by DPM to determine the optimal number of biomarkers for patient classification, with the aim of improving the detection, diagnosis and therapeutic monitoring of diseases.
This study investigated how the characteristics of microarray-based data affect the predictive performance of various machine learning methods by evaluating a range of classification rules on tens of millions of simulated data sets.
This huge computational challenge was tackled with massively parallel computing using R and MPI. The results provide general guidelines for determining the optimal number of biomarkers for various classification purposes.
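Simulation studies of this kind parallelise well because each simulated data set can be generated and scored independently. The study itself used R and MPI, and its data model and classifiers are not described here. The Python below is therefore a purely hypothetical sketch of the general pattern. The two-class Gaussian model, the 0.5 shift per marker, the nearest-centroid rule and all function names are our own assumptions. Each task seeds its own random stream from its task id, which makes results reproducible. Each "rank" takes a round-robin share of the task ids, and the per-rank error sums are then combined, as an MPI reduction would do.

```python
import random
from typing import List


def simulate_and_score(seed: int, n_markers: int, n_samples: int = 40) -> float:
    """One task: simulate a two-class data set and return the test error
    of a nearest-centroid rule using n_markers features (hypothetical model)."""
    rng = random.Random(seed)  # independent, reproducible stream per task

    def draw(label: int) -> List[float]:
        # Assumed model: class 1 is shifted by 0.5 on every marker.
        return [rng.gauss(0.5 * label, 1.0) for _ in range(n_markers)]

    half = n_samples // 2
    train = [(draw(y), y) for y in (0, 1) for _ in range(half)]
    test = [(draw(y), y) for y in (0, 1) for _ in range(half)]

    # Fit: the mean profile (centroid) of each class in the training set.
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: List[float]) -> int:
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in centroids.items()}
        return min(dist, key=dist.get)

    errors = sum(predict(x) != y for x, y in test)
    return errors / len(test)


def tasks_for_rank(n_tasks: int, rank: int, size: int) -> range:
    """Round-robin share of task ids for one of `size` ranks."""
    return range(rank, n_tasks, size)


def run(n_tasks: int, n_markers: int, size: int) -> float:
    """Emulate `size` ranks serially and return the mean error over all tasks."""
    partial_sums = [
        sum(simulate_and_score(t, n_markers) for t in tasks_for_rank(n_tasks, rank, size))
        for rank in range(size)
    ]
    # In a real MPI job, this final sum would be a reduction across ranks.
    return sum(partial_sums) / n_tasks
```

Calling `run(n_tasks, n_markers, size)` for several values of `n_markers` produces an error-versus-marker-count curve of the sort such a study would examine. Seeding by task id means the answer does not depend on how many ranks share the work.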
Previous joint efforts have produced PDQ-Wizard.
This application is an open-access, web-based tool that automates the process of interrogating biomedical references using the PubMed biomedical literature database, enabling fast classification and prioritisation of large lists of genes, proteins or free text (GenBank, RefSeq, UniGene, Entrez Gene, Gene Symbols, SwissProt).
Read an article about our work published in Bioinformatics.
ODD-Genes was the first collaboration between DPM and EPCC. This project also involved the MRC Human Genetics Unit (HGU) and produced a biomedical e-Science demonstrator using Grid technologies.
The demonstrator comprised a genetic data analysis application that showed how researchers at DPM could automate repetitive microarray analysis tasks securely and seamlessly using remote HPC resources at EPCC.
The application also performed tightly linked queries on gene identifiers against remote, independently managed databases, such as the HGU Mouse Atlas database, greatly enriching the information available on individual genes.
Importantly, the ODD-Genes project showed what becomes possible when microarray data analysis is enhanced to exploit HPC.
Read more about ODD-Genes on the EPCC website.
This article was published on Dec 1, 2011