HPC Systems Management

System Management

Without the proper systems management infrastructure in place, even an HPC resource with just a few compute elements can become a full-time system administration job. Deployment, configuration, package management, monitoring, user administration and security are just a few of the tasks that must be performed routinely to keep the cluster running to its full potential.

OCF minimizes the administrative burden of cluster management by providing a suite of powerful, easy-to-use HPC tools. The OCF Cluster Management Suite:

  • Deploys operating systems automatically, along with the necessary configuration, to get your cluster up and running at the touch of a button.
  • Enables user accounts to be imported from existing authentication resources within the organization and updated without administrative intervention.
  • Allows predictive failure alerts, security events, and hardware and software errors to be sent either to the cluster management appliance or to the IT department’s centralized monitoring system for further action, if OCF’s built-in event-response facilities have not already corrected the problem.
  • Deploys new or updated software and operating system patches to the entire cluster, to groups of resources within the cluster or to individual nodes, all from a centralized management facility.
  • Provides out-of-band remote power and console facilities, so machines can be accessed without requiring staff to be on site.

Workload Management

Making sure HPC facilities achieve as close to 100% utilization as possible isn’t as easy as providing users with accounts and a bit of storage space, especially with a large collection of heterogeneous resources. Even with a basic job scheduling system, users must learn to write complicated submission scripts, ensure these scripts contain the correct resource requests and submit them to the appropriate queues. Once a job is queued, the systems administrator must monitor the workload to ensure that each user receives a fair share of resources as quickly as possible, that enough licenses are available for the requested software package and that each job runs to completion. This can create a massive burden for systems administrators and significantly lower productivity amongst users.

OCF’s Cluster Experts minimize the impact of traditional workload management and maximize system throughput with:
  • Easy-to-use graphical facilities for users to submit jobs with minimal training.
  • Fair-share capabilities that give users an appropriate amount of system time according to business need.
  • Seamless auto-detection of the needs of the application, ensuring execution on the appropriate resource at the appropriate time, without users needing to know which queue to submit to.
  • Automatic re-deployment of operating systems to create a dynamic hybrid resource on a common pool of hardware.
  • Unique auto power on/power off features to save energy when cluster usage is low. This can amount to massive power savings and reduce load on air conditioning facilities.
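
As a rough illustration of the fair-share idea listed above, the following Python sketch orders users by how far their recent usage falls below their entitled share. It is not how any particular scheduler or OCF tool implements fair-share; the UserShare type, the fairshare_order function and the example figures are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class UserShare:
    name: str
    target_share: float    # entitled fraction of the cluster (hypothetical policy value)
    used_cpu_hours: float  # recent usage (hypothetical accounting figure)


def fairshare_order(users):
    """Return users sorted so the most under-served user comes first."""
    total = sum(u.used_cpu_hours for u in users)

    def deficit(u):
        # Fraction of recent usage consumed by this user; zero if nothing has run yet.
        used_fraction = u.used_cpu_hours / total if total > 0 else 0.0
        # Positive deficit: the user has had less than their share and should run sooner.
        return u.target_share - used_fraction

    return sorted(users, key=deficit, reverse=True)


# Example figures, invented for illustration only.
users = [
    UserShare("alice", 0.5, 900.0),
    UserShare("bob", 0.3, 50.0),
    UserShare("carol", 0.2, 50.0),
]
order = [u.name for u in fairshare_order(users)]
print(order)
```

In this example alice has used far more than her 50% entitlement, so jobs from bob and carol would be considered first. Production schedulers typically also weigh factors such as queue wait time and decay older usage over time, which this sketch leaves out.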

OCF offers a wide variety of open source and commercial products within its overall Cluster Management Ecosystem. OCF Cluster Specialists will partner with the customer to determine the best mix of these tools to meet application, user, system administration and environmental requirements.
Examples of these tools include:

Message Passing and Parallel Programming
•    OpenMPI
•    Intel MPI
•    HP-MPI
•    OpenMP
•    Scali Connect MPI
•    LAM/MPI
•    MPICH/MVAPICH
•    GAMMA-MPI
•    PVM
Compilers
•    Intel C/C++ and Fortran
•    EKOPath Compiler Suite
•    Portland Group Server and Workstation
•    IBM XL C/C++ and Fortran
Libraries
•    ScaLAPACK
•    LAPACK
•    ATLAS
•    Intel MKL
•    ACML
•    IBM ESSL/PESSL
Workload Managers and Schedulers
•    Cluster Resources MOAB
•    PBS Pro
•    Platform LSF
•    Torque
•    MAUI
•    Sun Grid Engine
Cluster Management
•    IBM CSM
•    IBM xCAT
•    Cluster Resources’ Cluster Builder
•    Scali Manage
•    ROCKS
•    OSCAR
Cluster/Enterprise Monitoring and Event Response
•    IBM Director
•    Cluster Resources’ MOAB Workload Manager
•    Ganglia
•    Nagios

For more information or to contact us, please e-mail: info@ocf.co.uk.

 
