Tools

Building an Apache Spark Performance Lab: Tools and Techniques for Spark Optimization

Submitted by canali

Apache Spark is known for its speed and efficiency in large-scale data processing. However, getting the best performance out of Spark requires a precise understanding of how it works internally. This post guides you through setting up a Spark Performance Lab, with tools and techniques for improving Spark performance through detailed metrics analysis.

Apache Spark 3.0 Memory Monitoring Improvements

TL;DR: Apache Spark 3.0 comes with many improvements, including new features for memory monitoring.

canali

Distributed Deep Learning for Physics with TensorFlow and Kubernetes

Submitted by canali

Summary: This post details a solution for distributed deep learning training for a High Energy Physics use case, deployed using cloud resources and Kubernetes. You will find the results for training using CPU and GPU nodes. This post also describes an experimental tool that we developed, TF-Spawner, and how we used it to run distributed TensorFlow on a Kubernetes cluster.

IPython Notebooks for Querying Apache Impala

Topic: this post provides examples of how to get started using IPython/Jupyter notebooks to query Apache Impala.

canali

Disclaimer

The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.