On Measuring Apache Spark Workload Metrics for Performance Troubleshooting
Topic: This post is about measuring Apache Spark workload metrics for performance investigations.
This post reports performance tests for a few popular data formats and storage engines available in the Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu. The exercise evaluates space efficiency, ingestion performance, analytic scans and random data lookup for a workload of interest at the CERN Hadoop service.
In the following blog posts we study the topic of Distributed Deep Learning, or more precisely, how to parallelize gradient descent using data-parallel methods. We start by laying out the theory and provide some intuition for the techniques we applied. At the end of this blog post, we conduct some experiments to evaluate how different optimization schemes perform in identical situations.
Topic: This post is about a simple implementation with examples of IPython custom magic functions for running SQL in Apache Spark usin
I have been wanting to test Apache Kafka for some time now and finally got around to it! In this blog post I give a very short introduction to what Kafka is, describe the installation and configuration of a Kafka cluster, and finally benchmark a few near-real-world scenarios on OpenStack VMs.
Topic: This post is about performance optimizations introduced in Apache Spark 2.0, in particular whole-stage cod
Hello,
Last week I investigated how the OAuth2 protocol works and developed a Proof of Concept (PoC) in Java. In this post I would like to show you how to effortlessly develop a simple client-server application using the OAuth 2.0 standard for authorization of protected resources placed on a server.
Before we start developing our first secured web application with OAuth2, let's understand how it works.
What is it and how does it work?
Topic: In this post, you will find an example of how to build and deploy a basic artificial neural network scoring engine using PL/SQL.
Topic: In this short post you can find examples of how to use IPython/Jupyter notebooks for running SQL on Oracle.
Topic: In this post you will find a short discussion and pointers to the code of a few sample scripts that I have written using Linux BPF/bcc and uprobes for
In part 2 of the 'Integrating Hadoop and Elasticsearch' blog post series we look at bridging Apache Spark and Elasticsearch. I assume that you have access to Hadoop and Elasticsearch clusters and are faced with the challenge of bridging these two distributed systems. As Spark code can be written in Scala, Python and Java, we look at the setup, configuration and code snippets in all three languages, both in batch and interactive mode.
Topic: In this post you can find examples of how to get started with using IPython/Jupyter notebooks for querying Apache Impala.
The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.