New Cluster for Statistical Genetics Research at the Wellcome Trust Centre

New cluster propels Research Centre’s electron microscopy research into World league

The Wellcome Trust Centre for Human Genetics (WTCHG) at the University of Oxford is using a new Fujitsu high-performance BX900 blade-based cluster with Mellanox InfiniBand and DataDirect Networks storage systems integrated by OCF to support the genetics research of 25 groups and more than 100 researchers.

The Centre houses the second largest next-generation sequencing facility in England, currently producing more than 500 genomes per year. Each processed and compressed genome is about 30GB on disk and across the Centre roughly 15,000-20,000 human genomes occupy about 0.5PB. Numerous and wide-ranging research projects use this data to study the genetic basis of human diseases based on sophisticated statistical genetics analyses. Projects include national and international studies on various cancers, type-2 diabetes, obesity, malaria and analyses of bacterial genomes to trace the spread of infection. The Centre is one of the most highly ranked research institutes in the world and funds for the cluster were provided by a grant from the Wellcome Trust.
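As a rough check on the storage figures quoted above, the arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes decimal units (1PB = 1,000,000GB) and takes the 30GB per-genome figure at face value; the function name is hypothetical and not part of any WTCHG tooling.

```python
# Back-of-envelope check of the genome storage footprint quoted in the text.
# Assumption: decimal units, 1 PB = 1,000,000 GB.

GB_PER_GENOME = 30  # processed and compressed size, as stated in the article


def footprint_pb(genomes, gb_per_genome=GB_PER_GENOME):
    """Return the storage needed for `genomes` genomes, in decimal petabytes."""
    return genomes * gb_per_genome / 1_000_000


low = footprint_pb(15_000)   # lower end of the quoted range
high = footprint_pb(20_000)  # upper end of the quoted range

print(f"{low:.2f} PB to {high:.2f} PB")
```

Under these assumptions the range brackets the quoted figure of about 0.5PB.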

By understanding the characteristics of key genetics software applications and optimising how they map onto the new cluster’s architecture, the Centre has been able to improve dramatically the efficiency of these analyses. For example, analyses of data sets that took months using the Broad Institute’s Genome Analysis Tool Kit (GATK) can now be completed in weeks while using fewer cores.

The new cluster has also proved itself to be perfectly suited to supporting research by the Centre’s Division of Structural Biology (STRUBI) and it has already produced some of the world’s highest-resolution electron microscopy reconstructions – revealing structural details vital to understanding processes such as infection and immunity. The improvement in the performance of electron microscopy codes, particularly Relion, is also very impressive: movie-mode processing requiring more than 2 weeks on eight 16-core nodes of a typical cluster is now completed in 24 hours on just six of the new FDR-enabled, high-memory nodes.
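The Relion comparison above can be expressed as a simple calculation. The sketch below is a crude estimate, not a benchmark: it treats "more than 2 weeks" as exactly 14 days (so the results are lower bounds) and it counts node-hours even though the old and new nodes are not equivalent hardware.

```python
# Crude comparison of the Relion movie-mode runs described in the text.
# Assumption: "more than 2 weeks" is taken as exactly 14 days, so both
# ratios below are lower bounds. Node-hours ignore hardware differences.

old_nodes, old_hours = 8, 14 * 24  # eight 16-core nodes for two weeks
new_nodes, new_hours = 6, 24       # six FDR-enabled, high-memory nodes for 24 hours

old_node_hours = old_nodes * old_hours
new_node_hours = new_nodes * new_hours

wall_clock_speedup = old_hours / new_hours
node_hour_ratio = old_node_hours / new_node_hours

print(f"Wall-clock speed-up: at least {wall_clock_speedup:.0f}x")
print(f"Node-hours used: {old_node_hours} before, {new_node_hours} now "
      f"(ratio of at least {node_hour_ratio:.1f}x)")
```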

“Advances in detector design and processing algorithms over the past two years have revolutionised electron microscopy, making it the method of choice for studying the structure of complex biological processes such as infection. However, we thought we could not get sufficient access to the necessary compute to exploit these advances fully. The new genetics cluster provided such a fast and cost-effective solution to our problems that we invested in expanding it immediately,” says Professor David Stuart, Oxford University.

The new cluster’s use of Intel Ivy Bridge CPUs provides a 2.6x performance increase over its predecessor built in 2011. It boasts 1,728 cores of processing power, up from 912, with 16GB 1866MHz memory per core compared to a maximum of 8GB per core on the older cluster.

The new cluster is working alongside a second production cluster; both clusters share a common Mellanox FDR InfiniBand network that links the compute nodes to a DDN GRIDScaler SFA12K storage system whose controllers can read block data at 20GB/s. This speed is essential for keeping the cluster at maximum utilisation and consistently fed with genomic data.

The high-performance cluster and big data storage systems were designed by the WTCHG in partnership with OCF, a leading HPC, data management, big data storage and analytics provider. As the integrator, OCF also provided the WTCHG team with training on the new system.

Dr Robert Esnouf, Head of the Research Computing Core at the WTCHG, says:

  • “Processing data from sequencing machines isn’t that demanding in terms of processing power any more. What really stresses systems are ‘all-against-all’ analyses of hundreds of genomes, that is lining up multiple genomes against each other and using sophisticated statistics to compare them and spot differences which might explain the genetic origin of diseases or susceptibility to diseases. That is a large compute and data I/O problem and most of our users want to complete this type of research.
  • Each research group can use their own server to submit jobs to, and receive results from, the cluster. If it runs on the server it can easily be redirected to the cluster. Users don’t need to log on directly to the cluster or be aware of other research groups using it. We try to isolate groups so they don’t slow each other down and have as simple an experience as possible. Users have Linux skills, but they do not need to be HPC experts to use the system safely and effectively. It is a deliberate design goal.
  • We use DDN GRIDScaler SFA12K-20 for our main storage – that has 1.8PB raw storage in one IBM General Parallel File System [GPFS] file system. We have learnt a lot about GPFS and how to get codes running efficiently on it. The support team and technical documentation showed how we could exploit the local page pool (cache) to run our codes much more quickly. Our system serves files over the InfiniBand fabric into the cluster at up to 10GB/s (~800TB/day). The SFA12K is already >80% full, so we’re now offloading older data to a slower, less expensive disk tier to get maximum return on the SFA12K investment.
  • Along with the 0.5PB storage pool on the sequencing cluster and other storage we now squeeze ~5PB storage and 4000 compute cores into 10 equipment racks in a small converted freezer room. Despite its small footprint, it is one of the most powerful departmental compute facilities in a UK university.
  • OCF worked with us to design a high-specification cluster and storage system that met our needs; they then delivered it and integrated it on time and to budget. In fact, the OCF-Fujitsu-Mellanox-DDN package was the clear winner from almost all perspectives – being based on GPFS, to which we were already committed; winning price / performance combination, low power consumption, fast I/O and simplicity of installation. We even managed to afford a pair of additional cluster nodes with 2TB real memory each for really complex jobs – also through OCF-Fujitsu!”
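The throughput figure quoted by Dr Esnouf (up to 10GB/s, or roughly 800TB/day) can be sanity-checked with a short calculation. The sketch below assumes decimal units and a rate sustained for a full 24 hours, which is an upper bound rather than a measured daily volume; the time to stream 0.5PB is added purely as an illustration.

```python
# Sanity check of the quoted file-serving rate over InfiniBand.
# Assumptions: decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and a
# peak rate sustained all day, which real workloads are unlikely to achieve.

RATE_GB_PER_S = 10          # "up to 10GB/s", from the text
SECONDS_PER_DAY = 24 * 60 * 60

tb_per_day = RATE_GB_PER_S * SECONDS_PER_DAY / 1_000

# Illustrative extra: time to stream about 0.5 PB of genome data at that rate.
hours_for_half_pb = 0.5 * 1_000_000 / RATE_GB_PER_S / 3_600

print(f"Upper bound: {tb_per_day:.0f} TB/day")
print(f"Streaming 0.5 PB at peak rate: about {hours_for_half_pb:.1f} hours")
```

The computed upper bound is of the same order as the quoted ~800TB/day.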
