This was my fourth year at Supercomputing (SC), and as ever there were plenty of attendees and an ever-expanding exhibition floor packed with the latest and greatest in HPC technology. There was also a small Emerging Technologies area within the technical programme, where various companies showcased new technologies we should be looking out for. This is becoming a regular feature at SC, and a really great one.
I arrived earlier than usual, on Saturday, to be on time for the Spectrum Scale (SS, formerly GPFS) User Group meeting on Sunday. Compared to last year, which was dominated by IBM presentations, this meeting was much more focused on what users wanted to hear, so it's great that user opinions are being taken into account. Two main highlights came out of the IBM presentations: a new RESTful API coming to SS, although it will be basic at first, and a new command, mmnetverify, which will help diagnose network issues in an SS cluster. From the customer presentations, Aaron Knister from NASA Goddard explained how they monitor SS using Grafana and InfluxDB, which I found really interesting; his talk highlighted how that monitoring helped diagnose problems on SS. All the presentations from this session will be available at http://www.spectrumscale.org/presentations/.
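As a hedged sketch of how mmnetverify might be used once it ships (the exact operation names and options may differ between Spectrum Scale releases, so treat these invocations as illustrative only):

```shell
# Illustrative only: check `man mmnetverify` on your release for the
# exact operation names and supported options.

# Basic reachability checks between all nodes in the cluster
mmnetverify ping -N all

# Broader connectivity checks (name resolution, remote shell, etc.)
mmnetverify connectivity -N all
```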
On Monday I attended the DDN User Group, where the first session was primarily about Lustre. It was highlighted that Omni-Path support for Lustre is now available in the standard OS, and that SELinux support and encryption are now available too. There was also a talk by Sven Oehme of IBM about some of the work his team has done. The key thing I learned was that a new workerThreads parameter has been added to SS; setting it lets SS tune the system heuristically, in essence adjusting around 30 parameters at once. There were also some good benchmark results comparing performance with and without IME against Lustre, which showed IME to be much faster: one user had benchmarked WRF and achieved about 400GB/s with IME compared to 100GB/s natively.
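A hedged sketch of how that combined tuning knob might be applied; the value 512 is purely illustrative, and the right setting (and whether a daemon restart is needed for it to take effect) should be checked against the Spectrum Scale tuning documentation for your release:

```shell
# Set the combined tuning knob; 512 is an illustrative value only
mmchconfig workerThreads=512

# Confirm the configured value
mmlsconfig workerThreads
```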
Over lunch on Tuesday, at the LSF User Group, there were various customer presentations as well as updates from IBM on the Spectrum Compute family, with Cluster Foundation CE replacing IBM Platform HPC. There is a new Academic Initiative, which allows academic institutions to use LSF free of charge. IBM also explained that LSF 10 was mainly a performance release, whereas LSF 9 was more of a feature release. The key points were:
- The Edison throughput benchmark showed a performance improvement: a 10% increase in throughput and a 10% reduction in time to results
- It delivered more than double the performance
There was also an interesting talk from Red Bull Racing on how they manage their workloads using the LSF family of products.
Over the course of the week there were many interesting technical sessions, too many in fact to fit into my time at the event! There was a much bigger OpenStack presence, which highlights that OpenStack is here to stay and has a place in the HPC arena. As usual, the MPICH BoF session had all the usual players giving updates on their respective MPI implementations; the main takeaway was that everyone is now preparing for the CH4 device layer, due when MPICH 3.3 is released later in 2017. The Slurm User Group highlighted the updates in 16.05, primarily cgroup improvements and wrappers for other resource managers; the Slurm slides are available at http://slurm.schedmd.com/publications.html for more information. The first ever Omni-Path User Group also took place, which was very interesting, and it was good to hear feedback from the several sites that have deployed OPA.
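As a hedged sketch of the kind of cgroup configuration those Slurm updates relate to (the parameter names below exist in Slurm 16.05, but the exact options and their defaults vary by version and site, and this assumes TaskPlugin=task/cgroup is set in slurm.conf):

```
# Minimal illustrative cgroup.conf (typically /etc/slurm/cgroup.conf)
ConstrainCores=yes        # confine job tasks to their allocated cores
ConstrainRAMSpace=yes     # enforce memory limits via the memory cgroup
ConstrainDevices=yes      # restrict device (e.g. GPU) access to allocations
```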
Overall, I think it was a great event, packed with very interesting topics. If you attended, what did you think? Which technologies were you interested in? I'd love to hear your thoughts in the comments section below.