In the modern world where everyone wants to be always connected, High Availability became one of the most important feature for a system. For example if you are running a system you don't want a failure in one piece of your architecture impacts the whole system. You have to make all the components of your architecture high available. In this post we will present how, in the Middleware section of Dabatase group at CERN, we setup a High Availability HAProxy based on CentOS 7.
In the database team at CERN, we have developed a general-purpose metrics monitor, a missing part in our next generation monitoring infrastructure.
In the implemented metrics monitor, metrics can come from several sources like Apache Kafka, new metrics can be defined combining other metrics, different analysis can be applied, notifications, configuration can be updated without restarting, it can detect missing metrics, ...
Topic: This post is about techniques and tools for measuring and understanding CPU-bound and memory-bound workloads in Apache Spark. You will find examples applied to studying a simple workload consisting of reading Apache Parquet files into a Spark DataFrame.
Now I am back in Norway, and it is time for looking back and reminisce about my amazing
and memorable summer at CERN. So, today I will tell you about my experiences from my
This is a short post introducing a notebook that you can use to play with a simple analysis of High Energy Physics (HEP) data using CERN open data and Apache Spark. The idea for this work started with a concept for a technology demonstrator of some recent developments on using Spark for data analysis in the context of HEP.
Topic: In this post you can find a few simple examples illustrating importa
Topic: This post is about measuring Apache Spark workload metrics for performance investigations.
This post reports performance tests for a few popular data formats and storage engines available in the Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu. This exercise evaluates space efficiency, ingestion performance, analytic scans and random data lookup for a workload of interest at CERN Hadoop service.