Offline analysis of HDFS metadata


HDFS is part of the core Hadoop ecosystem and serves as the storage layer for Hadoop computational frameworks such as Spark and MapReduce. Like other distributed file systems, HDFS is based on an architecture that decouples the namespace from the data. The namespace contains the file system metadata, which is maintained by a dedicated server called the namenode, while the data itself resides on other servers called datanodes.

This blog post is about dumping HDFS metadata into an Impala/Hive table for examination and offline analysis using SQL semantics.
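One common way to do this is with Hadoop's Offline Image Viewer (`hdfs oiv`), which can convert the namenode's binary fsimage into delimited text that Hive or Impala can query. The sketch below is illustrative, not the post's exact procedure: all paths, the delimiter, and the table name are assumptions, and the column list follows the layout produced by the OIV `Delimited` processor.

```shell
# Fetch the latest fsimage from the active namenode (illustrative paths)
hdfs dfsadmin -fetchImage /tmp

# Convert the binary fsimage into comma-delimited text with the
# Offline Image Viewer's Delimited processor
hdfs oiv -p Delimited -delimiter ',' \
  -i /tmp/fsimage_0000000000000000042 -o /tmp/fsimage.csv

# Put the dump into HDFS and expose it as an external Hive table
hdfs dfs -mkdir -p /user/hive/warehouse/fsimage
hdfs dfs -put /tmp/fsimage.csv /user/hive/warehouse/fsimage/
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE fsimage (
  path STRING, replication INT, modification_time STRING,
  access_time STRING, preferred_block_size BIGINT, blocks_count INT,
  file_size BIGINT, ns_quota BIGINT, ds_quota BIGINT,
  permission STRING, username STRING, groupname STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/warehouse/fsimage';"
```

Once the table exists, questions like "which users own the most small files?" become ordinary SQL aggregations over `fsimage`.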

Integrating Hadoop and Elasticsearch - Part 1 - Loading into and Querying Elasticsearch from Apache Hive


As more and more organisations deploy Hadoop and Elasticsearch in tandem to satisfy batch analytics, real-time analytics and monitoring requirements, the need for tighter integration between Hadoop and Elasticsearch has never been more important. In this series of blog posts we look at how these two distributed systems can be tightly integrated, and how each can exploit the features of the other to meet ever more demanding analytics and monitoring needs.
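The Hive side of this integration typically goes through the elasticsearch-hadoop connector, which lets a Hive external table read from and write to an Elasticsearch index. A minimal sketch, assuming a hypothetical `logs` source table, a local HiveServer2, an Elasticsearch node at `es-host:9200`, and a jar path you would substitute with your own:

```shell
beeline -u jdbc:hive2://localhost:10000 -e "
-- Register the elasticsearch-hadoop connector (path is a placeholder)
ADD JAR /opt/jars/elasticsearch-hadoop.jar;

-- External table backed by an Elasticsearch index rather than HDFS files
CREATE EXTERNAL TABLE logs_es (ts STRING, host STRING, message STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'logs/entry',
               'es.nodes'    = 'es-host:9200');

-- Writing to the table indexes documents; SELECTs query the index
INSERT OVERWRITE TABLE logs_es
SELECT ts, host, message FROM logs;"
```

The appeal of this pattern is that loading and querying Elasticsearch both stay in plain HiveQL, so existing batch pipelines need no new tooling.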

Using SQL Developer to access Apache Hive with kerberos authentication

With Hadoop implementations moving into the mainstream, many Oracle DBAs are having to access SQL-on-Hadoop frameworks such as Apache Hive in their day-to-day operations. What better way to achieve this than using the familiar, functional and trusted Oracle SQL Developer!

Oracle SQL Developer (since version 4.0.3) allows access to Apache Hive to create, alter and query Hive tables, and much more. Let's look at how to set up connections to Hive both with and without Kerberos and, once connected, look at the available functionality.
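For the Kerberos case, the key pieces are a valid ticket on the client machine and a HiveServer2 JDBC URL that carries the service principal. A rough sketch, with hostname, realm and user names as placeholders:

```shell
# Obtain a Kerberos ticket before opening the connection in SQL Developer
# (principal below is a placeholder for your own)
kinit analyst@EXAMPLE.COM
klist   # verify the ticket cache holds a valid TGT

# The Hive JDBC connection then uses a URL of this general form, with the
# HiveServer2 service principal appended after the database name:
#   jdbc:hive2://hiveserver2.example.com:10000/default;principal=hive/hiveserver2.example.com@EXAMPLE.COM
```

In SQL Developer itself these values are entered in the Hive connection dialog (after registering the Hive JDBC driver jars under Preferences), rather than typed as a raw URL.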
