A high-performance server cluster is enabling researchers at the Institute for Computational Cosmology (ICC), based at Durham University, and the wider UK astrophysics community to better understand the universe by allowing them to model phenomena ranging from solar flares to the formation of galaxies.
The cluster is part of the DiRAC (Distributed Research using Advanced Computing) national facility. As such, members of the UKMHD consortium, ICC members and their national and international collaborators also use the cluster. In total, the cluster is used by researchers at UK universities including Leeds, Liverpool, Manchester, St Andrews, Sussex and Warwick, and by researchers abroad in Australia, China, Germany and the Netherlands.
The cluster is known as The Cosmology Machine (COSMA) and is a combination of COSMA5, a new IBM and DDN technology infrastructure, integrated with Durham University’s existing cluster, COSMA4 (originally installed in January 2011). Boosted by the new infrastructure, COSMA now has 9856 CPU cores and 4096 GPU cores. It includes 71,000 Gigabytes (GB) of RAM and the peak performance of the system is 182 TFlops. COSMA has 3.5 PetaBytes (PB) of storage for the data produced by cosmology applications.
The server cluster and storage have been designed, built and installed by OCF, which will also support them.
COSMA5 alone, the new, additional infrastructure, is now the fourth most powerful university-based cluster in the UK, according to the November 2012 Top500 list of supercomputers. COSMA5 has been specifically designed as a balanced server and storage solution and built using:
• 420 IBM iDataPlex dx360 M4 systems
• Intel Sandy Bridge CPU cores
• Mellanox FDR10 InfiniBand
• 3x IBM x3750 M4 systems serving as login and development nodes
• 2.4PB of DataDirect Networks (DDN) data storage with two SD12K controllers configured in fully redundant mode
• The storage is served via 6 GPFS servers connected into the controllers over full FDR and using RDMA over the FDR10 network into the compute cluster
“We can use telescopes to ‘watch’ how galaxies are formed but it takes millions of lifetimes,” says Adrian Jenkins, COSMA project scientist at Durham University. “The server cluster is helping us work on this problem much more quickly. We can model a single galaxy in a computer right through its formation process in a few days. We are beginning to understand the processes that shape galaxies. With the new cluster we can start to simulate large populations of galaxies and for the first time in the world model thousands of galaxies in a single region of the universe all at the same time and with high numerical resolution. A simulation like this will still take months to run, but with our previous cluster we simply didn’t have the computing power or the memory to run the model at all.”
He adds: “The additional storage capacity provided by COSMA5 is also essential. Over the last 3 months, one very large research project has already created 700 TeraBytes of data. We may need to return to our research data for further analysis many years after it was first processed, so we can’t remove any data until we’re sure it is not needed.”
“Along with exciting the general public and helping members of the public to understand their place in the cosmos, ICC’s research helps raise the profile of science in general and serves as an important factor in motivating young people to become scientists, not just cosmologists,” says Julian Fielden, managing director, OCF plc.
“OCF has used its integration knowledge, skills, services and partner eco-system to successfully meet our significant data processing, data management and data storage challenges,” says Lydia Heck, Senior Computer Manager, Department of Physics, Durham University. “For COSMA5, OCF has designed a system that delivers application performance gains for our cosmologists. The GPU part of the cluster stimulates new code designs to make effective use of the incredible performance of this technology.”