There is a wealth of speculation and information on the internet regarding the recently discovered Spectre and Meltdown problems and their potential implications for HPC, in terms of both performance impact and security vulnerabilities. At present OCF is still gathering information by running performance tests in our labs, listening to customers who are performing their own tests before and after patching, and listening to our solution partners for their expert feedback on the matter.
Whilst thousands of column inches have been written about the potential security issues, the following factors need to be borne in mind when assessing a course of action:
- There are no known instances of the ‘loopholes’ being successfully exploited in a harmful way, at least not yet
- Potential attackers need to have access to the system; in HPC environments these are usually ‘trusted’ users
- Potential attackers need to be highly skilled
- It is early days and it is highly likely that the initial patches issued will be improved over time
- Performance hits are likely to be very application dependent
- The supply chain effects are non-trivial, with processor manufacturers, motherboard manufacturers and system software vendors each having to assess and fix the problems
- Switch architectures are also likely to be affected, in addition to server and storage architectures
There is no doubt that this has been a huge topic of debate, and if one thing is clear, it is that it is still early days. Our view is that it is a little too early to panic. Yes, there are workarounds that can be applied immediately for the risk-averse amongst us. However, we feel that at present the best approach for our HPC customers considering whether to patch their entire service is to:
- Set aside a few nodes and, if possible, run performance tests before and after patching to see what the impact on the production service would be if it were patched
- Look at the user base your service supports: is it a closed, trusted set of users, or is it public facing?
- Look at how the service is accessed: is it on a closed, controlled, private network, or is it public facing?
- Assess the potential problems in the context of the overall security policies in place, taking a risk-based approach
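As a rough illustration of the first point, the before-and-after comparison can be reduced to a percentage change in median runtime per application. The sketch below is only one way to do this: the function name `slowdown_percent`, the application names and all of the timings are hypothetical placeholders, not measurements from our labs or from any customer.

```python
from statistics import median


def slowdown_percent(pre_seconds, post_seconds):
    """Return the percentage change in median runtime after patching.

    A positive value means the patched nodes ran slower.
    """
    if not pre_seconds or not post_seconds:
        raise ValueError("need at least one timing for each configuration")
    pre = median(pre_seconds)
    post = median(post_seconds)
    return (post - pre) / pre * 100.0


# Hypothetical wall-clock times in seconds, for illustration only.
# Each entry holds (runs on unpatched nodes, runs on patched nodes).
timings = {
    "io_heavy_code": ([100.0, 102.0, 98.0], [110.0, 112.0, 108.0]),
    "compute_bound_code": ([200.0, 201.0, 199.0], [202.0, 203.0, 201.0]),
}

for name, (pre, post) in timings.items():
    change = slowdown_percent(pre, post)
    print(f"{name}: {change:+.1f}% change in median runtime")
```

Using the median of several runs rather than a single run reduces the influence of one-off noise on the test nodes. The resulting per-application figures can then be weighed against the risk factors in the remaining points.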
This should help to provide perspective on whether the benefit of patching is likely to be outweighed by the performance loss your codes have exhibited in testing, given how unlikely it is that a highly skilled and untrustworthy individual would actually be able to access the HPC system and exploit this complex vulnerability to gain sufficient reward for their effort.
In conclusion, risks and costs need to be assessed on an individual basis. It is still early days, and potential ‘fixes’ are likely to become more efficient as technology vendors have more time to investigate and react.
The risk of an attack depends heavily on the skills of the attacker and how much effort they are willing to put in to exploit these vulnerabilities to gain access to the data on the service. If you have considered the points above, in most cases the likelihood of an attack is very low, and you may therefore decide that only some of the workarounds are required on your service. We are ready to support you, our customers, in your efforts to come to grips with the problem; please contact your usual OCF representative and we will be happy to help.