The new, Advanced Research Computing (ARC) central HPC resource is supporting research across all four Divisions at the University of Oxford; Mathematical, Physical and Life Sciences; Medical Sciences; Social Sciences; and Humanities.
With around 120 active users per month, the new HPC resource will support a broad range of research projects across the University. As well as computational chemistry, engineering, financial modeling, and data mining of ancient documents, the new cluster will be used in collaborative projects like the T2K experiment using the J-PARC accelerator in Tokai, Japan. Other research will include the Square Kilometer Array (SKA) project, and anthropologists using agent-based modeling to study religious groups
The new service will also be supporting the Networked Quantum Information Technologies Hub (NQIT), led by Oxford, envisaged to design new forms of computers that will accelerate discoveries in science engineering and medicine.
The new HPC cluster built by OCF comprises of Lenovo NeXtScale servers with Intel Haswell CPUs connected by 40GB Infiniband to an existing Panasas storage system. The storage system was also upgraded by OCF to add 166TBs giving a total of 400TBs of capacity. Existing Intel Ivy Bridge and Sandy Bridge CPUs from the University of Oxford’s older machine are still running and will be merged with the new cluster.
Twenty NVIDIA Tesla K40 GPUs were also added at the request of NQIT, who co-invested in the new machine. This will also bring benefit to NVIDIA’s CUDA Centre of Excellence, which is also based at the University.
“After seven years of use, our old SGI-based cluster really had come to end of life, it was very power hungry, so we were able to put together a good business case to invest in a new HPC cluster,” said Dr Andrew Richards, Head of Advanced Research Computing at the University of Oxford. “We can operate the new 5,000 core machine for almost exactly the same power requirements as our old 1,200 core machine.
The new cluster will not only support our researchers but will also be used in collaborative projects as well; we’re part of Science Engineering South, a consortium of five universities working on e-infrastructure particularly around HPC. We also work with commercial companies who can buy time on the machine so the new cluster is supporting a whole host of different research across the region.”
Simple Linux Utility Resource Manager (SLURM) job scheduler manages the new HPC resource, which is able to support both the GPUs and the three generations of Intel CPUs within the cluster.
Julian Fielden, Managing Director at OCF comments: “With Oxford providing HPC not just to researchers within the University, but to local businesses and in collaborative projects, such as the T2K and NQIT projects, the SLURM scheduler really was the best option to ensure different service level agreements can be supported. If you look at the Top500 list of the World’s fastest supercomputers, they’re now starting to move to SLURM. The scheduler was specifically requested by the University to support GPUs and the heterogeneous estate of different CPUs, which the previous TORQUE scheduler couldn’t, so this forms quite an important part of the overall HPC facility.”
Dr Richards continues: “As a central resource for the entire University, we really see ourselves as the first stepping stone into HPC. From PhD students upwards i.e. people that haven’t used HPC before – are who we really want to engage with. I don’t see our facility as just running a big machine, we’re here to help people do their research. That’s our value proposition and one that OCF has really helped us to achieve.”