OCF solutions can be delivered fully configured with the latest AI frameworks and other software. Find out about Lenovo LiCO and the NVIDIA GPU Cloud here.
Containers are a key part of AI deployments: they give users easy access to the latest versions of frameworks, whilst ensuring the reproducibility of experiments that rely on earlier versions. By deploying frameworks in containers, some experiments can continue to use the old versions whilst others use the latest and greatest, without needing to re-provision the entire system.
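As a minimal sketch of this side-by-side versioning, the commands below pull two releases of the same framework image and pin each experiment to its own tag. The tags and the `train.py` script are illustrative assumptions; the image path follows NGC's `nvcr.io/nvidia/<framework>` naming convention.

```shell
# Two releases of the same NGC framework image (tags are illustrative)
OLD_TAG="21.07-tf2-py3"   # pinned release, keeps an older experiment reproducible
NEW_TAG="23.03-tf2-py3"   # recent release for new work

docker pull nvcr.io/nvidia/tensorflow:${OLD_TAG}
docker pull nvcr.io/nvidia/tensorflow:${NEW_TAG}

# Each experiment runs against its own pinned image, so neither forces
# a system-wide upgrade (train.py is a hypothetical training script)
docker run --gpus all --rm nvcr.io/nvidia/tensorflow:${OLD_TAG} python train.py
docker run --gpus all --rm nvcr.io/nvidia/tensorflow:${NEW_TAG} python train.py
```

Pinning the full tag, rather than `latest`, is what makes an earlier experiment repeatable months later.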
The NVIDIA GPU Cloud (NGC) is OCF’s preferred repository of deep learning and AI frameworks for NVIDIA GPUs. NVIDIA optimise all the leading open source Artificial Intelligence (AI) and Deep Learning (DL) frameworks to take full advantage of the thousands of cores on NVIDIA GPUs, improving run times without compromising on results.
IBM’s AI/DL strategy focuses on simplification, enabling new users to quickly gain insights and value from their data. The software consists not just of optimised frameworks, but of a suite of tools to accelerate the productivity of AI/DL practitioners.
When we look at AI/DL use cases, we see that the underlying maths and architectures are very similar across disciplines. The IBM software builds upon this by providing a complete toolset that enables users to train high-performing models without requiring coding or data science expertise.
IBM Visual Insights takes full advantage of the IBM POWER9 architecture to simplify multi-GPU training with its large model support feature, as well as multi-node scaling with its distributed deep learning feature, which scales to thousands of servers.
The rapid growth in AI/DL workloads has led to a huge increase in demand for tightly coupled, high-performance compute, storage, and GPU resources. This problem is not new in the HPC world, but it has left many data scientists needing HPC-like hardware with little experience of using it.
To help bridge this gap, Lenovo has developed LiCO, an end-to-end software solution that allows users with no HPC experience to utilise a shared clustered resource. Users can select a framework (such as TensorFlow or Caffe), upload their training data, tag it if necessary, train their network, and even publish their results, all from the same user-friendly interface. LiCO has support for Kubernetes, allowing easy sharing and replication of AI/DL frameworks via containers.
From an administrator's perspective, LiCO provides a dashboard to view the status of the cluster, as well as integration with the Slurm open source resource manager to offer billing and queue information.
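The queue and billing data that such a dashboard surfaces come from standard Slurm tooling. As an illustrative sketch (the date and field widths are arbitrary assumptions, not LiCO configuration), an administrator could query the same information directly:

```shell
# Current queue state across the cluster
squeue --long

# Per-job accounting since a given date, the raw material for billing:
# job ID, user, elapsed time, and allocated trackable resources (CPUs, GPUs, memory)
sacct --starttime=2024-01-01 \
      --format=JobID,User,Elapsed,AllocTRES%40
```

LiCO layers a web interface over this kind of data, so administrators see usage without needing the command line.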