Bioinformatics researchers from across the UK, along with their international collaborators, will soon be able to benefit from a new private cloud high-performance computing (HPC) system to support their work on bacterial pathogens. The Cloud Infrastructure for Microbial Bioinformatics (CLIMB) project, a collaboration between the University of Birmingham, the University of Warwick, Cardiff University, and Swansea University, will create a free-to-use, world-leading cyberinfrastructure specifically designed for microbial bioinformatics research.
With over 7,500 vCPU cores of processing power, the CLIMB system represents the largest single system in the world designed specifically for microbiologists.
Dr. Thomas Connor, Senior Lecturer at the Cardiff School of Biosciences, Cardiff University, who designed the system with integration partner OCF, says: “Bioinformatics research using the system is already helping to track viral and bacterial pathogens, develop new diagnostics, increase the understanding of bacterial resistance to antibiotics, and support many other related research projects. Using CLIMB we are able to overcome the difficulties we face when trying to perform our analyses on HPC infrastructure that is generally not suitable for microbial bioinformatics workloads.”
The system is powered by OpenStack cloud computing software and provided by OCF, an HPC, big data and predictive analytics provider. The first site, at the University of Birmingham, is already in production running OpenStack Juno and will soon be linked directly to the system at Cardiff. The final two sites are currently undergoing final testing by OCF before entering production this autumn. CLIMB is already contributing to bioinformatics research both nationally and internationally, and research using the system has already been published in international scientific journals.
Simon Thompson, research computing specialist at the University of Birmingham, who built the initial CLIMB pilot system, says: “This is one of the most complex environments we’ve had to build, learning a lot of new technologies. Luckily we’ve been able to work with our contacts at IBM, Lenovo and OCF to help us build what is now quite a stable platform in Birmingham.”
To provide users with local high-performance compute and storage, the fully open source OpenStack system is built on Lenovo System x servers and IBM Spectrum Scale storage, linked by 56Gb/s InfiniBand, with 500TB of local storage at each of the four sites.
Using a set of standardised virtual machine (VM) images, the CLIMB system can run over 1,000 VMs at any one time, enabling individuals and groups of researchers to take advantage of the system’s 78TB of RAM. The system has been designed to provide large amounts of RAM to meet the challenge of processing large, rich biological datasets, and it will be able to support the vast majority of the UK microbiology community.
“We are now able to rapidly take multiple bacterial samples and generate DNA sequence data for each sample in a computationally readable format. However, handling hundreds, or thousands of these samples simultaneously poses serious challenges for most HPC systems,” says Dr. Connor. “Datasets within microbiology are very different to traditional HPC research. Workloads are often either embarrassingly parallel or very high memory, and all require large amounts of high performance storage. CLIMB has been specifically designed to take this into account, to enable the microbial bioinformatics community within the UK to access facilities they are unlikely to have available locally.”
Dr. Connor continues: “One of the issues that is becoming increasingly faced by bioinformaticians is the ability to share bespoke software applications. What may work on one HPC system, won’t necessarily work on another – it’s often quicker to write a new application from scratch. We believe that by using containers/Virtual Machines this problem can be overcome, by creating a mechanism to share software and data within a single cloud environment. By enabling researchers to share their software and data in this way, we free up bioinformaticians to spend more time doing research, and less on installing software and downloading data from multiple, disparate data repositories.”
“The fully open source OpenStack system enables us to work with the team at CLIMB to help modify and improve the system,” comments Arif Ali, Technical Director at OCF. “CLIMB really is a leading edge, private cloud solution and using OpenStack means we can stay on top of the latest developments in the system as well as modify it to fit the needs of researchers. On top of this, the research community can also contribute to the system meaning that the solution can fit all of their research needs.”
OCF will continue to work with the institutes in CLIMB to manage the open source OpenStack system and is currently upgrading the systems installed at Warwick and Swansea to OpenStack Kilo.