Impala

Offline analysis of HDFS metadata

Introduction

HDFS is part of the core Hadoop ecosystem and serves as a storage layer for the Hadoop computational frameworks like Spark, MapReduce. Like other distributed file systems, HDFS is based on an architecture where namespace is decoupled from the data. The namespace contains the file system metadata which is maintained by dedicated server called namenode and the data itself resides on other servers called datanodes.

This blogpost is about dumping HDFS metadata into Impala/Hive table for examination and offline analysis using SQL semantics

Subscribe to RSS - Impala

You are here