#### (Note, this blog post also serves as my submission for CS 267 Homework 0)

## Bio of Author:

My name is John Dagdelen and I am a graduate student in the Materials Science and Engineering department at UC Berkeley. My advisor, Kristin Persson, is a Staff Scientist at LBNL and a Professor in the MSE department at UC Berkeley. Our group is one of the main contributors to the Materials Project, which aims to model the properties of *every* material and make the data freely available to the materials research community. I’m including my picture in this post so that potential project partners can find/approach me in class.

My research interests cover:

- Computing the properties of materials using Density Functional Theory (DFT).
- Developing computational screening methods for materials with useful properties.
- Using machine learning to mine large collections of materials data, including that stored in databases like the Materials Project and the text of scientific papers.
- Expanding the predictive power of computational methods in materials science through the development of new modeling techniques and more efficient algorithms.

I use NERSC’s high-performance computing systems in my work and I enrolled in CS 267 to learn how to fully utilize these resources to solve challenging problems in computational materials science.

## Case Study: Modern Density Functional Theory Calculations on High Performance Computing Systems

Isosurfaces of electron density modeled using density functional theory. Image Credit: Ovito.org

Density functional theory (DFT) is a technique that allows us to calculate the properties of materials computationally rather than physically making and testing them in a laboratory. We do this by numerically solving a set of equations called the Kohn-Sham equations to get the density of electrons around all the atoms in a material. From the electron density we can deduce how materials are held together, their electronic structure, and a host of other properties that matter in materials science.

Modern DFT represents a marriage between quantum mechanical theory and high-performance computation, which has led to a revolution in materials science over the last 20 years. These methods, which can accurately predict materials properties, sometimes within a few percent, have led to the discovery of a host of materials with interesting properties, including battery materials for ultrafast charging, new thermoelectric materials, and (a little self-promotion here) materials with exotic mechanical properties.

Thanks to advancements in DFT algorithms and software tools that have simplified the process of setting up and running DFT calculations, investigations that used to take an entire year can be done in under an hour. This allows these techniques to be used across a wide range of research initiatives and at enormous scales by groups such as the Materials Project, OQMD, and AFLOWLIB. Today, it’s estimated that more than one hundred million CPU core hours are used for DFT calculations *every day* (source: Matthias Scheffler). Many of the supercomputers on the Top 500 list are used to run DFT calculations, including NERSC’s Cori system (#12 on the list), which spends somewhere between 1/8 and 1/4 of its time running DFT calculations.

There are two main challenges that make fitting DFT calculations into HPC ecosystems difficult:

- DFT calculations are generally “small” compared to “large” calculations like climate models, and the queue policies of supercomputing centers effectively favor large jobs over small ones.
- While some parts of DFT calculations can be parallelized, other parts can only be performed *in sequence*, and they require lots of reading and writing of memory.
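
One standard way to reason about the second point is Amdahl's law, which bounds the speedup of a program when some fraction of its work must run in sequence. The sketch below is a generic illustration, not a measurement of any DFT code: the serial fraction of 5% is an assumed value chosen only for the example, and the function names are my own.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs processors under Amdahl's law.

    serial_fraction is the share of the runtime that cannot be parallelized.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def parallel_efficiency(serial_fraction: float, n_procs: int) -> float:
    """Speedup divided by processor count (1.0 means perfect scaling)."""
    return amdahl_speedup(serial_fraction, n_procs) / n_procs


if __name__ == "__main__":
    assumed_serial_fraction = 0.05  # assumption for illustration only
    for n in (1, 8, 64, 512):
        s = amdahl_speedup(assumed_serial_fraction, n)
        e = parallel_efficiency(assumed_serial_fraction, n)
        print(f"{n:4d} processors: speedup {s:6.2f}, efficiency {e:5.2f}")
```

Under this model the speedup can never exceed 1 / serial_fraction, no matter how many processors are added, which is why even a small sequential portion matters at scale.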

The first challenge comes about as a result of the “run limits” imposed by computing centers. It’s not fair to let any one user dominate a system’s job queue so there are often limits placed on the number of jobs that any given user can have running/queued up at a time. This is fine if you have a small number of huge calculations to do requiring the use of hundreds of nodes at a time, but what if you have thousands and thousands of smaller calculations? DFT calculations often fall into this second basket: we can theoretically calculate the properties of thousands of materials at a time but the run limits can keep us from doing so. When we try to submit a few hundred DFT calculations that would use the same amount of compute time as a larger climate model, the system de-prioritizes our requests because it looks like we’re trying to monopolize the system.

Fortunately, we have been able to overcome this problem through a clever workaround. By packaging thousands of DFT calculations into single “jobs”, we’re able to make our calculations more suitable for the infrastructure that these HPC systems are built around. This allows us to request the computing resources we need for our work without confusing the necessary filters that keep an HPC system working properly for all of its users.
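
To make the bundling idea concrete, here is a minimal Python sketch. It is not the workflow software we actually use; `bundle_calculations`, the calculation labels, and the bundle size of 1000 are all hypothetical names and values chosen for illustration. It only shows the bookkeeping of turning many small calculations into a few queue submissions.

```python
from typing import List, Sequence


def bundle_calculations(calc_ids: Sequence[str], per_job: int) -> List[List[str]]:
    """Split a list of calculation labels into bundles of at most per_job items.

    Each bundle would be submitted to the scheduler as one job, so the
    number of queue entries is roughly len(calc_ids) / per_job.
    """
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [list(calc_ids[i:i + per_job]) for i in range(0, len(calc_ids), per_job)]


if __name__ == "__main__":
    # Hypothetical labels standing in for individual DFT calculations.
    calcs = [f"material-{i:04d}" for i in range(2500)]
    jobs = bundle_calculations(calcs, per_job=1000)
    for n, job in enumerate(jobs):
        print(f"job {n}: {len(job)} calculations")
```

With these assumed numbers, 2500 calculations become three queue submissions instead of 2500, which is the property that keeps us under the run limits.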

Image Credit: Anubhav Jain, hackingmaterials.com

The second challenge I mentioned is more difficult to overcome. There is a lot of complexity involved, but DFT is essentially the diagonalization of a Hamiltonian of dimension equal to the number of electrons. Since diagonalization is O(n^3), DFT scales similarly with the number of atoms. This process includes a number of steps that act as bottlenecks. Some can be GPU accelerated, like the many FFT steps required, but others can only be performed on CPUs, which adds I/O constraints to the problem. There have been some advancements in speeding up VASP (a very common DFT program) through various means, including using a hybrid MPI/OpenMP scheme, but there is still lots of work to be done and this is an active area of research.
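
A quick back-of-the-envelope calculation shows what cubic scaling means in practice. The helper below is a rough sketch that assumes the cost is purely proportional to n^3 and ignores prefactors, memory limits, and everything else that matters in a real calculation; the function name and the atom counts are chosen only for illustration.

```python
def relative_cost(n_atoms: int, n_reference: int, exponent: float = 3.0) -> float:
    """Cost of an n_atoms system relative to an n_reference system,
    assuming cost grows as (number of atoms) ** exponent."""
    if n_atoms < 1 or n_reference < 1:
        raise ValueError("atom counts must be positive")
    return (n_atoms / n_reference) ** exponent


if __name__ == "__main__":
    reference = 100  # assumed baseline system size
    for n in (200, 1000, 10000):
        factor = relative_cost(n, reference)
        print(f"{n:6d} atoms: about {factor:,.0f}x the cost of {reference} atoms")
```

Under this simple model, doubling the number of atoms makes the calculation about eight times more expensive, and a tenfold increase costs about a thousand times more.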

The parallel/thread scaling of the hybrid MPI/OpenMP VASP (version 4/13/2017) on the Cori KNL and Haswell nodes. Source: Performance of Hybrid MPI/OpenMP VASP on Cray XC40 Based on Intel Knights Landing Many Integrated Core Architecture. Zhengji Zhao, Martijn Marsman, Florian Wende, and Jeongnim Kim. Cray User Group Proceedings, 2017

Moreover, you can’t parallelize these calculations across an infinite number of processors. At a certain point your efficiency will drop drastically because processors will sit idle while they wait on results from other processors. Generally, using the same number of cores as the number of atoms is a good rule of thumb, but there are lots of situations where this heuristic will not give you the best possible results. You can read a fantastic overview of how to get the most efficiency out of VASP running on multiple processors by Peter Larsson here.

The current state of the art allows us to model systems of a few hundred atoms at most, which is unfortunate because most of the interesting problems out there involve the interaction of thousands or millions of atoms at a time. If we could improve how DFT scales, we may be able to extend the capabilities of DFT into the territory where juicy problems in chemistry, biology, and materials science reside.

I think this is a worthy goal for a CS 267 project and I am happy to talk to anyone who is interested in learning more about doing DFT on parallel systems.