The Blog

Performance comparison of different file formats and storage engines in the Hadoop ecosystem

TOPIC

 

This post reports performance tests for a few popular data formats and storage engines available in the Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu. This exercise evaluates space efficiency, ingestion performance, analytic scans and random data lookup for a workload of interest at CERN Hadoop service.

 

 

INTRO

 

Distributed Deep Learning with Apache Spark and Keras

In the following blog posts we study the topic of Distributed Deep Learning, or rather, how to parallelize gradient descent using data parallel methods. We start by laying out the theory, while supplying you with some intuition into the techniques we applied. At the end of this blog post, we conduct some experiments to evaluate how different optimization schemes perform in identical situations.

Starting up with the Oracle Java Cloud Services.

Hello everyone,

In this entry I would like to share my experiences using Oracle Java Cloud Service, especially securing the application environment. I will show you some issues that I encountered during standard process of setting up environment. I will also explain some basic concepts that are fundamental to work with cloud services.

Darwin and Hadoop join forces to improve a face recognition algorithm

In this blog entry we introduce evolutionary algorithms and an integration between an evolutionary computation tool, ECJ, and Apache Hadoop. This research aims at speeding up the evaluation of solutions by distributing the workload among a cluster of machines. Finally, we make sense out of this integration showing how it has been used for improving a face recognition algorithm.

Pages