Our latest server and storage cluster at the University of Durham, COSMA4, is now live and fully operational. Apart from a small ‘hiccup’ with heat build-up between the system and the water-cooling door, it is running very well.
The server cluster is doing exactly what it was designed to do: it is giving users more, and considerably faster, access to high-performance data processing power, and it is enabling new and larger collaborative projects with our research partners in Sussex, Manchester, Cambridge, Australia, Beijing, the Netherlands and Germany.
Our new GPFS file system allows much higher concurrency in writing data. Previously only 8 concurrent writes could be processed optimally. For large jobs, we now use 128 concurrent writes and recently, by accident, a user wrote from 1024 concurrent processes. The latter certainly tested the file servers! I have seen a sustained throughput of close to 2 GB/s for a single multi-processor job.
In the last few weeks we have finished our first two ‘large jobs’. The first used 1580 cores of COSMA4 and nearly all of the memory associated with that number of processors. It dumped out 30 TB of data (17-18 TB of it usable). The second used 1024 cores and dumped 20 TB of useful data.
Although there are 620 TB of GPFS storage in total, we could fill this at a very fast rate, so good data management will be essential.
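To put a rough number on "a very fast rate", here is a back-of-the-envelope sketch that uses only figures quoted in this post. It assumes decimal units (1 TB = 1000 GB) and, unrealistically, that the single-job throughput of about 2 GB/s is sustained around the clock; the function and constant names are hypothetical and purely for illustration.

```python
# Figures quoted in this post. Decimal units (1 TB = 1000 GB) are an assumption.
TOTAL_STORAGE_TB = 620          # total GPFS storage
THROUGHPUT_GB_PER_S = 2.0       # sustained rate seen for a single job (approximate)
LARGE_JOB_OUTPUT_TB = 30        # raw output of the first large job

SECONDS_PER_DAY = 86_400


def days_to_fill(capacity_tb: float, throughput_gb_per_s: float) -> float:
    """Days needed to fill the storage if the given rate were held continuously."""
    seconds = capacity_tb * 1000 / throughput_gb_per_s
    return seconds / SECONDS_PER_DAY


def jobs_to_fill(capacity_tb: int, job_output_tb: int) -> int:
    """Whole number of jobs of a given output size that fit in the storage."""
    return capacity_tb // job_output_tb


if __name__ == "__main__":
    days = days_to_fill(TOTAL_STORAGE_TB, THROUGHPUT_GB_PER_S)
    jobs = jobs_to_fill(TOTAL_STORAGE_TB, LARGE_JOB_OUTPUT_TB)
    print(f"Continuous writing at {THROUGHPUT_GB_PER_S} GB/s fills the storage in about {days:.1f} days")
    print(f"About {jobs} jobs of {LARGE_JOB_OUTPUT_TB} TB each would fill it")
```

Under these assumptions the storage would last under four days of continuous writing, or about twenty jobs the size of our first large one if nothing were ever deleted. Real usage is burstier than this, but the order of magnitude shows why housekeeping matters.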
We’re also working on supporting our high-throughput user community (separate from the HPC users). These users have now been migrated to our 800-core COSMA3 machine. COSMA3 is now fully incorporated into COSMA, using the same OS as COSMA4 and served by the same batch system. Thankfully, these users have their own 300 TB of storage to support them.