Simple, Scalable, Containerized Deep Learning using Nauta

Deep learning is hard. Between organizing, cleaning and labeling data, selecting the right neural network topology, picking the right hyperparameters, and then waiting – hoping – that the model produced is accurate enough to put into production, it can seem like an impossible puzzle for your data science team to solve.

But the IT aspect of the puzzle is no less complicated, especially when the environment needs to be multi-user and support distributed model training. From choosing an operating system, to installing libraries, frameworks, dependencies, and development platforms, building the infrastructure to support your company’s deep learning efforts can be even more challenging than the data science. On top of that, the rapid pace of change in deep learning software and supporting libraries – many of which change monthly – creates a recipe for IT headaches.

Containerization helps solve some of the IT complexity. Instead of your IT staff cobbling together dozens of libraries and dependent software packages to make your deep learning framework of choice function, you can download pre-configured containers which handle all of that. Or you can have your data scientists build custom containers to meet their specific needs. However, your IT department must still build and configure infrastructure for orchestrating those containers, while providing a resilient, scalable platform for your data science team to be as productive as possible.

Nauta Deep Learning Platform

Nauta software seeks to solve many of the problems associated with building container orchestration infrastructure for deep learning. Nauta is a containerized deep learning platform which uses Kubernetes for container orchestration. It provides an intuitive command-line interface for building, running, curating and evaluating experiments, and it includes must-have features such as Jupyter notebooks and TensorBoard.

We’ve been using Nauta in the Dell EMC HPC & AI Innovation Lab, testing its features, functionality, extensibility, and ease of use. We use Nauta to run many of our cutting-edge deep learning research projects, including scalable convolutional neural network (CNN) training on chest X-rays and ultra-scalable multi-head attention network training for language translation. It allows us to go from early proof-of-concept work in Jupyter notebooks, to high-performance distributed training using the MPI-based Horovod framework for TensorFlow, to wide hyperparameter analysis for producing the most accurate model possible. Best of all, it’s a scalable platform built on top of Kubernetes and Docker, allowing us to easily share and replicate work between team members.
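
To make "wide hyperparameter analysis" a little more concrete, here is a minimal Python sketch of how a search space can be expanded into one parameter set per experiment. It is not Nauta's API: the `expand_grid` helper, the parameter names, and the values are all hypothetical, and how each parameter set is actually submitted to the platform is left out.

```python
from itertools import product


def expand_grid(search_space):
    """Return one dict per combination of the values in search_space."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical search space; these names and values are assumptions
# chosen for illustration, not taken from our experiments.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

experiments = expand_grid(search_space)

# Each entry would become one independent training run.
for index, params in enumerate(experiments):
    print(f"experiment-{index}: {params}")
```

Because every combination is independent of the others, a grid like this maps naturally onto a container platform: each parameter set can run as its own containerized experiment, and the results can be compared afterwards.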

In addition to training neural networks, Nauta also provides a mechanism for testing deployments of trained models. This allows us to evaluate model accuracy, benchmark performance, and test reduced-precision quantization on new hardware, such as the 2nd-Generation Intel® Xeon® Scalable processor with Intel® Deep Learning Boost. Nauta supports both inference on batches of data and streaming inference using REST APIs. And while Nauta isn’t expressly designed for production model deployment, the ability to evaluate trained models and experiment with reduced precision is an important component of the overall model development and deployment process.
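
As a rough illustration of what a streaming inference request over REST can look like, the Python sketch below builds the URL and JSON body for a single prediction call. It assumes a TensorFlow Serving-style endpoint (`/v1/models/<name>:predict` with an `instances` list); the model name, host, port, and input values are hypothetical, and the exact URL and authentication a Nauta deployment exposes may differ. The code only constructs the request and does not send it.

```python
import json


def build_predict_request(model_name, instances, host="localhost", port=8501):
    """Build the URL and JSON body for a TensorFlow Serving-style predict call.

    Assumption: the serving endpoint follows the TensorFlow Serving REST
    convention. The host, port, and model name here are placeholders.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical model name and a single toy input vector.
url, body = build_predict_request("chest-xray-cnn", [[0.0, 0.5, 1.0]])
print(url)
print(body)
```

The resulting body would be sent as an HTTP POST with a JSON content type. Each element of `instances` is one input example, so a client can stream requests one example at a time or group several examples into a single call.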

Looking Forward

The Dell EMC HPC & AI Innovation Lab team continues to use and evaluate Nauta, report and resolve issues, and recommend improvements. Select customers are also experimenting with and evaluating Nauta on Dell EMC hardware, and Nauta will be a central component of future Ready Solutions. In the end, your company’s AI efforts are only going to be successful if the infrastructure is ready to support your data science team. Nauta provides an on-ramp for your IT organization and your data science team to get started training in an on-premises containerized environment quickly and easily.
