Blog

Apache Spark and CERN Open Data Analysis, an Example

This is a short post introducing a notebook that you can use to play with a simple analysis of High Energy Physics (HEP) data using CERN open data and Apache Spark. The idea for this work started as a technology demonstrator for some recent developments in using Spark for data analysis in the context of HEP.

Diving into Spark and Parquet Workloads, by Example

Topic: In this post you can find a few simple examples illustrating im

Oracle JET, ORDS & OAUTH2

Hello there,

SSO for Oracle REST DataServices

Hello there,

Recently I started digging into ORDS authentication, and more specifically into how to make it work against my Oracle WebLogic Server authenticators.

On Measuring Apache Spark Workload Metrics for Performance Troubleshooting

Topic: This post is about measuring Apache Spark workload metrics for performance investigations.

Performance comparison of different file formats and storage engines in the Hadoop ecosystem

TOPIC

This post reports performance tests for a few popular data formats and storage engines available in the Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu. This exercise evaluates space efficiency, ingestion performance, analytic scans and random data lookup for a workload of interest at the CERN Hadoop service.

INTRO

Distributed Deep Learning with Apache Spark and Keras

In the following blog posts we study the topic of Distributed Deep Learning, or rather, how to parallelize gradient descent using data parallel methods. We start by laying out the theory, while supplying you with some intuition into the techniques we applied. At the end of this blog post, we conduct some experiments to evaluate how different optimization schemes perform in identical situations.

Upgrading my Oracle JVM Diagnostics Agents.

Starting up with the Oracle Java Cloud Services.

IPython/Jupyter SQL Magic Functions for PySpark

Topic: this post is about a simple implementation with examples of IPython custom magic functions for running SQL in Apache Spark usin

Darwin and Hadoop join forces to improve a face recognition algorithm

Custom Flume sources for ingesting data from database tables and log files

Benchmarking Apache Kafka on OpenStack VM's

I have been wanting to test Apache Kafka for some time now and finally got around to it! In this blog post I give a very short introduction to what Kafka is, cover the installation and configuration of a Kafka cluster, and finally benchmark a few near real-world scenarios on OpenStack VMs.

Offline analysis of HDFS metadata

Apache Spark 2.0 Performance Improvements Investigated With Flame Graphs

Topic: This post is about performance optimizations introduced in Apache Spark 2.0, in particular whole-stage cod

Using Tiered Storage in Alluxio

Java web application based on OAuth2

Hello,

Last week I investigated how the OAuth2 protocol works and developed a Proof of Concept (PoC) in Java. In this post I would like to show you how to effortlessly develop a simple client-server application using the OAuth 2.0 standard to authorize access to protected resources placed on a server.

Before we start developing our first secured web application with OAuth2, let's understand how it works.

What is it and how does it work?

Disclaimer

The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.

CERN Social Media Guidelines

Experiences of Using Alluxio with Spark

A neural network scoring engine in PL/SQL for recognizing handwritten digits

Real-time visualisation of Hadoop resources
