Topic: This post describes a data pipeline for a machine learning task of interest in high energy physics: building a particle classifi
This is a short post introducing a notebook that you can use to play with a simple analysis of High Energy Physics (HEP) data using CERN open data and Apache Spark. The idea for this work started with a concept for a technology demonstrator of some recent developments on using Spark for data analysis in the context of HEP.
Integrating Hadoop and Elasticsearch – Part 2 – Writing to and Querying Elasticsearch from Apache Spark
In the part 2 of 'Integrating Hadoop and Elasticsearch' blogpost series we look at bridging Apache Spark and Elasticsearch. I assume that you have access to Hadoop and Elasticsearch clusters and you are faced with the challenge of bridging these two distributed systems. As spark code can be written in scala, python and java, we look at the setup, configuration and code snippets across all these three languages both in batch and interactively.
The views expressed in this blog are those of the authors and cannot be regarded as representing CERN’s official position.
Christian Antognini, Karl Arao, Martin Bach, Mark Bobak, Wolfgang Breitling, Doug Burns, Kevin Closson, Cloudera blog, Wim Coekaerts, Bertrand Drouvot, Enkitec blog, Pete Finnigan, Richard Foote, Randolf Geist, Marco Gralike, Brendan Gregg, Kyle Hailey, Tim Hall, Uwe Hesse, Frits Hoogland, Hortonworks blog, Integrity Oracle Security, Tom Kyte, Adam Leventhal, Jonathan Lewis, Cary Millsap, James Morle, Karen Morton, Arup Nanda, Mogens Nørgaard, Oracle The Data Warehouse insider, Oracle Enterprise Manager, Oracle Linux blog, Oracle Multitenant, Oracle Optimizer blog, Oracle R technologies, Oracle Upgrade blog, Oracle Virtualization blog, Kerry Osborne, Tanel Poder, Planet PostgreSQL, Kellyn Pot'Vin, Pythian blog, Greg Rahn, Mark Rittman, Riyaj Shamsudeen, Chen Shapira, Carlos Sierra, Szymon Skorupinski