SQLPlotter & my summer at CERN

This summer I got selected for an internship at CERN and it was a summer beyond imagination! I am here to share my wonderful experience with you all and throw some light on the most significant part of this-my project: Developing web-interface for a Physics analysis database, SQLPlotter.

In June 2014, I came to CERN to work on an advanced IT project with the database department. Till then, although I was already provided with some hint by my supervisor Ms. Maaike Limper, I knew not much about what I was going to do exactly or how. About CERN, I knew it as a prestigious organization with some of the brightest minds in physics and computing working under its umbrella. But I learnt everything else when I came here.

In stark contrast to my belief that perhaps this would be a monotonous project work, it was one of the most exciting, adventurous and fun-filled summer I had. It was an excellent platform to meet people from all over the world, make new friends and work in a multi-cultured and multinational environment, something that I had always dreamt of! Besides the project work, we had various opportunities to participate in dance workshops, attend summer student parties and enjoy the festivities of Geneva. And of course- I can't ever forget the Zurich trip to Google and OpenSystems ! Apart from the fun we had, the trip was indeed inspiring. We also had a trip to Grenoble (where I saw a nuclear reactor for the first time in my life) and CMS where I could finally see one of the huge detectors that CERN boasts of!

The memories are too many to share them all but I have tried to recollect everything and mentioned the most significant ones. Besides the trips organized by CERN, we had exciting weekends during which we went to Cheese Factory and Castle at Gruyeres, Cailler Chocolate factory (where we tasted delicious Swiss chocolates), Paris, Venice, Chamonix (but the weather wasn't good enough to get a view of Mount-Blanc!) and finally shopping to make the most of this opportunity.

Enough fooling around! Now I shall share the serious stuff!

My Project: Developing a web-interface for a physics analysis database-SQLPlotter

So, what is SQLPlotter? It is a C++ macro written using ROOT classes by Ms. Maaike Limper, my supervisor and my task was to develop a web-interface for this class, so that users can input a SQL Query which is then executed by the interface. Finally some physics analysis is done on the database to get histograms.

Currently, CERN uses ROOT which is a C++ framework developed by CERN physics community to perform physics analysis. This is a file analysis system which uses ROOT ntuples for analysis. The approach is time consuming and I/O intensive. It requires the users to setup ROOT and install libraries every time the analysis is done. SQLPlotter explores the possibility of replacing this analysis with a centrally accessible Oracle database instead.

The interface has been developed using simple web-technologies including PHP, JavaScript, HTML5 & CSS3 and makes it easier for the users to do analysis using a centrally accessible database (presently a subset of LHC experiment data) with no need to learn or setup ROOT and to install any libraries. The script does everything for the user!

While mainly SQL is used to store data to or reteieve data from database tables, SQLPlotter uses SQL to perform complex physics analysis. The interface provides user with a simple editor to write or load SQL queries and then execute them with a button click. The editor uses the javascript framework, Codemirror.js to make the editor efficient with capabilities to highlight syntax, auto indentation and open-close brackets, paranthesis and quotes etc. The figure below shows an example of a complex SQL query loaded in the editor and ready to be executed. Upon execution, the query results are stored into a temporary database table named ‘QUERY_RESULT’. This table is then used to perform analysis and create plots.

Create Plot Screen

Once the query is executed on one of the cluster machines running the web server and ROOT locally, the user is directed to a screen where he/she is provided with options to create either a one-dimensional or two-dimensional plot.

In case of a one dimensional plot, the user needs to enter one column name, followed by number of bins and a minimum and maximum value to specify the range. The form again uses HTML5 form validation feature to ensure that all required values are filled in for submitting the form. There is also a select field to select the drawing-option for the plot. The default drawing-option is ‘E’ (display error-bars) and there are other options to choose as defined in the ROOT framework.The snapshot below shows a plot for INV_MASS (the column name representing the invariant mass of two muons) for 400 bins and a range of 1-105. The query shows peaks which are actually physics particles. For example, the largest peak around INV_MASS=90, shows that LHC collisions produced many Z-bosons (Z-bosons have an invariant mass of 90 GeV and can decay to muon-pairs). Similarly, we can re-run the query for different values and smaller ranges without executing the query again as long as we are in the same session.

One dimensional plot

For a two-dimensional plot the user needs to do a similar process, except that the fields are now required to be filled in for two variables or columns against each other. The default option is ‘COLZ’. The plot below, plots ‘DELTAR’ and ‘INV_MASS’ against each other for 100 bins each and a range of 0-4 and 1-110 respectively with option ‘COLZ’.


Two dimensional plot

In this manner, the project shows an implementation that how database technology can be used to perform complex physics analysis. The created plots can be analyzed to find exotic particles. Thus, the project also enlightens the fact that the system may be used as an educational tool for the students who are interested in data analysis. They can use the pre-defined queries illustrating how to find exotic particles and they can use the examples as a basis to build their own queries.

Currently, SQLPlotter uses Oracle tables but experiments are being conducted to store physics analysis data using different technologies such as Hadoop+Impala or PostgreSQL. The interface is database technology independent, only the SQL-query provided by the user might change due to the difference in SQL language support between different technologies. Thus changing the input data source can be easily accommodated in the interface. The project is complete in itself but there are other areas of improvement.

Doing this project was really fun. Quite often we had barbeques with the database team which was really nice. I learnt a lot of new things and the best part was that I won the best presentation award in our lightening talks, thanks to my supervisor's support and efforts. The entire database team was supportive and inspiring during the entire stay and work at CERN.

I can never forget this experience and hope that you all would have liked to read it as well. 


Add new comment