Distributed Deep Learning for Physics with TensorFlow and Kubernetes

Submitted by canali on Mon, 03/23/2020 - 14:43

Summary: This post describes a solution for distributed deep learning training for a High Energy Physics use case, deployed on cloud resources with Kubernetes. It reports training results on CPU and GPU nodes. It also describes TF-Spawner, an experimental tool that we developed, and how we used it to run distributed TensorFlow on a Kubernetes cluster.



The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.
