A Summer at CERN: Evaluating OpenStack Trove as a DBaaS Solution
I was one of the 23 students participating in the CERN openlab summer student programme this year. As two of my fellow students in the database group, Sneha and Anti, have already done, I want to share some insights into the project I worked on and into my experience with the summer programme in general. The post is therefore divided into a general part and a technical part, which sums up what I did with OpenStack and its component Trove.
The General Part: The Great Opportunities of this Internship
The CERN openlab summer student programme is not the only summer programme at CERN, nor the only programme for students in general. There are several programmes for different fields of study, the biggest of which, in terms of the number of students, is certainly the physics summer programme. During summer, the longer-term students, the official summer students, and Erasmus students from different universities populate CERN in addition to the regular personnel like fellows, staff and permanents. They come from all over the world, which creates a very inspiring and vibrant mix of people. It was brilliant to see all of us, from so many different cultures, living together for months on the CERN site without major problems. These summer programmes alone are a great justification for CERN celebrating 60 years of science for peace.
During this summer, I met a lot of very nice and friendly people, and I am happy that I can now call many of them friends. We had a great time together discussing work, discussing physics of course, and spending our free time at CERN and in the stunning surroundings of Geneva: the mountains, the lake, and the many other international organizations. To name only some activities: we organized a summer student choir and dancing workshops, and visited beautiful cities nearby as well as chocolate and cheese factories. A group of us started the Blindstore open source project during the CERN Webfest, and some of us met again two months after the summer programme to continue working on it during THE Port hackathon at CERN at the end of October 2014.
The CERN openlab summer programme included a lot of official activities. We had the chance to visit nearly all the CERN experiments and facilities during our stay, something you normally have to plan several months ahead, as CERN welcomes a hundred thousand visitors each year. The CERN openlab students went on two trips, one to Zürich and one to Grenoble. In Zürich, we visited the company Open Systems and the ETH, and then spent one whole day at the offices of Google. This was amazing, and by chance (?) we even saw one of the Google Street View cars! In Grenoble, we visited the European Synchrotron (ESRF) and the Institut Laue-Langevin (ILL), both very notable institutes in the science world. I am very grateful for the insights into well-known research institutes and companies I got from these visits, as they will certainly help me decide where I want to work later. Another nice activity was a short video interview the CERN video team recorded with all CERN openlab students, a good way to practise standing in front of a camera. The video might appear on the website of the CERN openlab summer student programme next year. Besides that, we attended 15 interesting lectures, and each of us had to present our project in a five-minute lightning talk.
Seven of the CERN openlab summer students, including me, worked within the database group of the IT department, called IT-DB internally. We were welcomed very warmly and included in both work and free-time activities. These included several barbecues on CERN's barbecue sites and some farewell drinks for others and, finally, for us summer students. Each of the seven of us presented our project in a final 20-minute presentation at the group meeting. One last thing to say about the CERN openlab summer programme: I really enjoyed it. The visits, trips, lectures, trainings, the CERN openlab barbecue, and the Friday afternoon coffee and ice cream meetings were really well matched. Together they formed a great summer programme and made us grow together as a group over the nine weeks we stayed at CERN.
CERN has made a lot of things possible for me this summer. If you are currently a student and an internship might fit into your schedule, I highly recommend that you consider CERN!
The Technical Part: OpenStack Trove, Databases on Demand
My project was part of the Database on Demand service (DBOD), which is CERN's internal Database as a Service (DBaaS) solution. It is developed in-house by the database group (IT-DB) and used by many different groups, for example the physics experiments, the web development teams and the human resources department. From the user's perspective, DBOD provides hosting of databases, including management of the installation and upgrade procedures, and an easy-to-use web interface for database backups, recoveries, configuration, and monitoring. Since its start in 2012, DBOD has grown rapidly within CERN and now hosts 168 database instances, of which 141 are MySQL and the rest PostgreSQL and Oracle.
The Project's Motivation
The Database on Demand service is itself a framework that builds on pre-existing IT-DB management tools and infrastructure. As part of the ongoing migration of CERN's virtual infrastructure to an OpenStack-based solution, the IT-DB group is moving from an Oracle VM platform to OpenStack, in an effort to converge with the global CERN IT infrastructure platforms. The existing DBOD software is therefore being adapted step by step to manage virtual machines on OpenStack. By the way, CERN just won the first OpenStack Superuser Award for having one of the largest OpenStack deployments and for its contributions back to the OpenStack community.
Since its most recent release, Icehouse, the OpenStack platform includes a new component named Trove, which provides Database as a Service functionality. The main goal of my project was to evaluate whether this component could feasibly serve as a resource provider for the Database on Demand service. With this objective, the major points of interest were the current status and maturity of the Trove project, its ability to support additional database types, and its compatibility with Scientific Linux (and, in the future, CentOS), the computing platform running most CERN services.
Last year, summer student Andrea Giardini already worked on evaluating Trove. Back then, the Trove project was not yet officially integrated into OpenStack and was at an even earlier stage of development. In the end, the installation on Scientific Linux was not successful; instead, he contributed to the further development of Trove. See his report “DBaaS with OpenStack Trove” for details.
Summary of My Work
I spent most of my working time at CERN setting up OpenStack and Trove on Scientific Linux. First, I struggled with the general network setup of OpenStack: when you work seriously with networks for the first time, it can be quite a hassle to get virtual instances to talk to you or to the internet. The details, and a suggestion for making this easier next time, are included in my report. You can download the report from Zenodo, an open access publishing platform operated by CERN: “OpenStack Trove: Evaluation and Interfacing with the CERN Database on Demand Service (pdf)”.
After finishing the setup of OpenStack, I managed to install Trove on top of it just one week before my internship ended. This took so long because the publicly available documentation and packages of Trove for Red Hat based Linux distributions are not yet ready for immediate use. I filed some bug reports against OpenStack's official documentation to improve this situation. In the end, Trove was running on Scientific Linux, but problems related to OpenStack rather than to Trove made it impossible to test its features during the last days of my stay. However, the existing publications of companies using Trove (1, 2) make it reasonable to believe that the basic functionality is indeed present and working. Getting OpenStack Trove running on Scientific Linux might be a challenge, but once it is up and running, the functionality needed for DBOD might already mostly be there.
More details are available in my report. It includes more thoughts about the possible integration of Trove into DBOD and about the use of OpenStack within DBOD, like using Docker as a “hypervisor” (link to the project report of a fellow CERN openlab student). If you are new to OpenStack, I recommend reading the section of the report where I give an overview of the different platforms the community uses to communicate, with guidelines on how to choose the right one for your questions.
And if you happen to be the next summer student working on an evaluation of Trove, or simply someone who wants to start working with Trove, have a look at the suggestions in the conclusion of my report and at the description of the installation to see whether they can save you some time setting up your own OpenStack test cloud. Regarding the issues I encountered with my OpenStack setup, I want to share an inspiring blog post by a Google Summer of Code intern who worked with OpenStack. She writes about how she experienced her internship and how she dealt with blockers during her work.
OpenStack and Open Source
I really like the OpenStack project for following the open source approach. It is amazing to see how many people, companies and organisations work on it together, using and integrating a lot of existing open source projects, extending them, and plugging them together into what we know as the OpenStack platform. What is it that they plug together? Hypervisors, software defined networks, block device managers, databases, web frameworks and much more. This, in my opinion, shows the strength of open source. Just have a look at Stackalytics to see the diversity of contributors, presented with nice charts and colours. On the other hand, OpenStack and Trove are different from other open source projects like the Linux kernel, web servers, web development frameworks and content management systems, because, well, cloud computing is not the kind of thing people usually do in their free time. This affects how much documentation and support is available online for free. The development is spread over a lot of companies and organisations, which most of the time cannot do things just for the sake of doing them (as you might when you play around with a nice new development framework in your free time and blog about how you solved problems with it), but need to follow their schedules and deliver a service. But maybe this just means that OpenStack is one of the rare open source projects for which the support model can actually work well. That is a great thing, because this way we can have a stunning open source platform and, at the same time, a business model that lets people make a living. In the case of Trove, I am sure that the community will grow, as it has some nice features on the roadmap.