Databases at CERN blog

Intelligent monitoring with a new general-purpose metrics monitor

The database team at CERN has developed a general-purpose metrics monitor, a missing piece of our next-generation monitoring infrastructure.

The metrics monitor can consume metrics from several sources, such as Apache Kafka; define new metrics by combining existing ones; apply different analyses; send notifications; pick up configuration changes without restarting; detect missing metrics; ...
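As a rough illustration of two of these features (defining a metric from other metrics and detecting missing metrics), here is a minimal Python sketch. It is not the monitor's actual code: the `MetricStore` class, its method names and the `max_age` threshold are all hypothetical, and metrics are held in memory rather than read from a source such as Apache Kafka.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetricStore:
    """Hypothetical in-memory store holding the latest sample of each metric."""
    values: Dict[str, float] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record(self, name: str, value: float, timestamp: float) -> None:
        """Store the latest value of a metric and when it arrived."""
        self.values[name] = value
        self.last_seen[name] = timestamp

    def derive(self, name: str, inputs: List[str],
               combine: Callable[..., float], timestamp: float) -> float:
        """Define a new metric by combining the latest values of other metrics."""
        result = combine(*(self.values[i] for i in inputs))
        self.record(name, result, timestamp)
        return result

    def missing(self, now: float, max_age: float) -> List[str]:
        """Return metrics whose last sample is older than max_age seconds."""
        return sorted(n for n, t in self.last_seen.items() if now - t > max_age)
```

For example, recording `reads` and `writes`, deriving `io_total` as their sum, and then calling `missing(now, max_age)` would flag whichever metric has not reported within the allowed age.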

Performance comparison of different file formats and storage engines in the Hadoop ecosystem



This post reports performance tests for a few popular data formats and storage engines available in the Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu. The exercise evaluates space efficiency, ingestion performance, analytic scans and random data lookups for a workload of interest at the CERN Hadoop service.





Distributed Deep Learning with Apache Spark and Keras

In the following blog posts we study Distributed Deep Learning, or more precisely, how to parallelize gradient descent using data-parallel methods. We start by laying out the theory and giving some intuition for the techniques we applied. At the end of this blog post, we run experiments to evaluate how different optimization schemes perform in identical situations.
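To make the idea of data-parallel gradient descent concrete, here is a minimal sketch in plain Python. It is not the code from the post, which uses Apache Spark and Keras: it assumes a toy one-parameter model y ≈ w * x, simulates the workers sequentially in one process, and uses synchronous gradient averaging, which is only one of several possible optimization schemes. The function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def shard_gradient(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error for y ≈ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_gd(data: List[Sample], n_workers: int,
                     lr: float, steps: int) -> float:
    """Synchronous data-parallel gradient descent on a single weight."""
    # Each worker gets a disjoint slice of the data; drop empty slices.
    shards = [data[i::n_workers] for i in range(n_workers)]
    shards = [s for s in shards if s]
    total = sum(len(s) for s in shards)

    w = 0.0
    for _ in range(steps):
        # In a real cluster these gradients would be computed in parallel.
        grads = [shard_gradient(w, s) for s in shards]
        # Average the worker gradients, weighted by shard size, so the
        # result equals the gradient over the full dataset.
        g = sum(gr * len(s) for gr, s in zip(grads, shards)) / total
        w -= lr * g
    return w
```

Because the weighted average reproduces the full-batch gradient, this synchronous variant behaves like single-machine gradient descent; asynchronous schemes trade that equivalence for less waiting between workers.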

