Databases at CERN blog

Offline analysis of HDFS metadata

Introduction

HDFS is part of the core Hadoop ecosystem and serves as the storage layer for Hadoop computational frameworks such as Spark and MapReduce. Like other distributed file systems, HDFS is based on an architecture where the namespace is decoupled from the data. The namespace contains the file system metadata, which is maintained by a dedicated server called the namenode, while the data itself resides on other servers called datanodes.
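To make the split between namespace and data concrete, here is a toy sketch in Python. It is not the HDFS API: the class names, the `write_file`/`read_file` helpers and the tiny block size are all assumptions made up for illustration. The point is only that the "namenode" object holds paths and block locations, while the "datanode" objects hold the bytes.

```python
# Toy model with hypothetical names; this is not how HDFS is implemented.
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores block contents only; knows nothing about file paths."""
    blocks: dict = field(default_factory=dict)  # block_id -> bytes


@dataclass
class NameNode:
    """Holds the namespace only: path -> list of (block_id, datanode index)."""
    namespace: dict = field(default_factory=dict)


def write_file(nn, datanodes, path, data, block_size=4):
    """Split data into blocks, spread them over datanodes, record locations."""
    locations = []
    for offset in range(0, len(data), block_size):
        block_no = offset // block_size
        block_id = f"{path}#{block_no}"
        dn_index = block_no % len(datanodes)  # simple round-robin placement
        datanodes[dn_index].blocks[block_id] = data[offset:offset + block_size]
        locations.append((block_id, dn_index))
    nn.namespace[path] = locations


def read_file(nn, datanodes, path):
    """Ask the namenode where the blocks are, then fetch them from datanodes."""
    return b"".join(datanodes[dn].blocks[bid] for bid, dn in nn.namespace[path])


nn = NameNode()
dns = [DataNode(), DataNode()]
write_file(nn, dns, "/user/demo/a.txt", b"hello hdfs")
print(read_file(nn, dns, "/user/demo/a.txt"))
```

Note that everything an offline metadata analysis needs (paths and block lists) lives in `nn.namespace`; the datanodes never have to be contacted.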

This blogpost is about dumping HDFS metadata into an Impala/Hive table for examination and offline analysis using SQL semantics.
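As a rough idea of what such an analysis can look like, the sketch below loads a few made-up metadata rows into a table and runs an aggregate query over them. It makes several assumptions: SQLite (from the Python standard library) stands in for Impala/Hive so the example runs anywhere, and the table name `hdfs_meta`, its columns and the sample rows are hypothetical, not the schema used in the post.

```python
import sqlite3

# Hypothetical rows standing in for a metadata dump; a real dump has more columns.
rows = [
    ("/user/alice/a.parquet", "alice", 3, 512),
    ("/user/alice/b.parquet", "alice", 3, 2048),
    ("/user/bob/tiny.log", "bob", 3, 10),
    ("/user/bob/tiny2.log", "bob", 2, 20),
]

conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for Impala/Hive
conn.execute(
    "CREATE TABLE hdfs_meta "
    "(path TEXT, owner TEXT, replication INTEGER, filesize INTEGER)"
)
conn.executemany("INSERT INTO hdfs_meta VALUES (?, ?, ?, ?)", rows)

# Number of files and logical bytes per owner, largest consumers first.
usage = conn.execute(
    """
    SELECT owner, COUNT(*) AS files, SUM(filesize) AS bytes
    FROM hdfs_meta
    GROUP BY owner
    ORDER BY bytes DESC
    """
).fetchall()
print(usage)
```

The same kind of `GROUP BY` query could be used to look for many small files or unusual replication factors, without putting any load on the namenode.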

Java web application based on OAuth2

Hello,

Last week I investigated how the OAuth2 protocol works and developed a Proof of Concept (PoC) in Java. In this post I would like to show you how to effortlessly develop a simple client-server application using the OAuth 2.0 standard to authorize access to protected resources placed on a server.

Before we start developing our first secured web application with OAuth2, let's understand how it works.

What is it and how does it work?

Experiences of Using Alluxio with Spark

Introduction

Alluxio refers to itself as an "Open Source Memory Speed Virtual Distributed Storage" platform. It sits between the storage and processing framework layers in the distributed computing ecosystem and claims to greatly improve performance when multiple jobs read from or write to the same data. This post covers some of the basic features of Alluxio and compares its data access performance against caching in Spark.

Integrating Hadoop and Elasticsearch – Part 2 – Writing to and Querying Elasticsearch from Apache Spark

Introduction

In part 2 of the 'Integrating Hadoop and Elasticsearch' blogpost series, we look at bridging Apache Spark and Elasticsearch. I assume that you have access to Hadoop and Elasticsearch clusters and that you are faced with the challenge of bridging these two distributed systems. As Spark code can be written in Scala, Python and Java, we look at the setup, configuration and code snippets in all three languages, both in batch and interactively.
