Blog

Integrating Hadoop and Elasticsearch – Part 2 – Writing to and Querying Elasticsearch from Apache Spark

Submitted by pkothuri on

Introduction

In part 2 of the 'Integrating Hadoop and Elasticsearch' blog post series, we look at bridging Apache Spark and Elasticsearch. I assume that you have access to Hadoop and Elasticsearch clusters and face the challenge of connecting these two distributed systems. Since Spark code can be written in Scala, Python and Java, we look at the setup, configuration and code snippets in all three languages, both in batch and in interactive mode.

How to generate subset out of Real Application Testing captures

Submitted by sskorupi on

I've already mentioned the very useful Consolidated Database Replay feature on this blog, for example while testing the performance impact of unified auditing (http://db-blog.web.cern.ch/blog/szymon-skorupinski/2014-06-unified-auditing-performance) or while investigating problems with hanging workload capture (http://db-b

XFS on RHEL6 for Oracle - solving issue with direct I/O

Submitted by sskorupi on

Recently we refreshed our recovery system infrastructure by moving automatic recoveries to new servers, each with a large number of directly attached disks. Everything went fine until we started to run recoveries: they were much slower than before, even though they were running on more powerful hardware. We started an investigation and found some misconfigurations, but even after correcting them the performance gain was still too small.

Disclaimer

The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.

CERN Social Media Guidelines

 
