Machine Learning Driven Importance Sampling Approach for Multiscale Simulations
Time: Friday, 13 November 2020, 3:25pm - 3:40pm EDT
Description: Almost all phenomena in science and engineering are inherently multiscale, and many require exploration across orders of magnitude in both space and time. Solving such problems at the finest scales is computationally prohibitive, so they are often formulated using multiscale models instead. Coupling the two scales, however, remains a challenging problem. Here, we present a new automated way to loosely couple two scales through a Machine Learning (ML) driven adaptive sampling approach that can focus on a user-defined hypothesis, e.g., diversity sampling. Our work advances the paradigm of heterogeneous, multiscale simulations by providing a generic framework to couple several scales in cascade in an arbitrarily scalable manner. We demonstrate our technique on multiscale simulations of the interactions of RAS and RAF proteins with the plasma membrane in the context of cancer-signaling mechanisms.
Given two scales to be coupled, macro and micro, our sampling framework uses an ML-based approach, supervised or unsupervised, to learn the important yet possibly hidden features by exploring the space of macro configurations. Using the space of characteristic macro features, the model is able to distinguish between similar and dissimilar configurations. Next, we use a dynamic, adaptive sampling approach in the feature space to identify the most important configurations with respect to the scientific hypothesis under investigation. These selected configurations are promoted to be simulated at the micro scale. Given sufficient computational resources, our framework produces a macroscale simulation that, for each macro configuration explored, contains a microscale simulation similar enough to serve as a statistical proxy. Our sampling approach also provides a means to debias the sampling through appropriate importance weighting, which allows reconstructing the relevant statistics of the microscale using macro samples. As a result, our framework is able to deliver macro length and time scales while retaining the insights of microscale behavior.
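As an illustration of the selection and debiasing steps described above, the following Python sketch shows one possible form of diversity sampling in a learned feature space. It is not the framework's actual implementation: the greedy farthest-point rule, the nearest-proxy weighting, and the names `farthest_point_selection` and `proxy_weights` are all assumptions made for this example, and the feature vectors are synthetic stand-ins for encoded macro configurations.

```python
import numpy as np

def farthest_point_selection(features, k, first=0):
    """Greedy diversity sampling (assumed rule): repeatedly pick the
    candidate that is farthest from everything selected so far."""
    features = np.asarray(features, dtype=float)
    selected = [first]
    # Distance from each candidate to its nearest selected point.
    min_dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        d = np.linalg.norm(features - features[nxt], axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected

def proxy_weights(features, selected):
    """Importance weights (assumed scheme): each selected configuration is
    weighted by the fraction of all candidates for which it is the nearest
    proxy in feature space."""
    features = np.asarray(features, dtype=float)
    chosen = features[selected]
    # Pairwise distances, shape (n_candidates, n_selected).
    dists = np.linalg.norm(features[:, None, :] - chosen[None, :, :], axis=2)
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=len(selected))
    return counts / counts.sum()

# Synthetic feature vectors: a common cluster and a rare one.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.1, size=(95, 2))
rare = rng.normal(10.0, 0.1, size=(5, 2))
features = np.vstack([common, rare])

selected = farthest_point_selection(features, k=2)
weights = proxy_weights(features, selected)
```

In this toy setting the diversity rule promotes one configuration from each cluster, so the rare cluster is oversampled relative to its frequency; the weights record how many macro configurations each selected one stands in for, which is what allows statistics to be reweighted back toward the macro distribution.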
Owing to its automated and dynamic approach, our sampling framework is capable of producing massively parallel multiscale simulations that scale to the largest machines on the planet. Previously, we utilized our sampling framework to conduct over 116,000 select microscale simulations, aggregating a total of 200 ms of coarse-grained trajectories, sampled from a 152 s long continuum (macroscale) simulation, utilizing the whole of Sierra with thousands of CPUs and GPUs for several days.
Here, we present our framework extended to support three scales (continuum, coarse-grained, and atomistic) of simulations of RAS and RAF proteins on plasma membranes with eight lipid species. Our framework uses an unsupervised deep learning model, a tailored autoencoder, focusing on capturing the spatial response of lipids to the presence of protein(s) under consideration to select “important” continuum configurations. We also present a novel approach to identify “important” coarse-grained configurations in a supervised approach using three biologically relevant reaction coordinates. Using these importance criteria, we create a three-scale simulation model that investigates the interactions between RAS, RAF, and the membrane in progressive detail based on the importance of simulated configurations. We report early breaking results from our simulation campaign run on Summit and demonstrate the flexibility, generalizability, and scalability of our framework.
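To make the idea of selecting configurations by reaction coordinates more concrete, the sketch below bins candidates in a three-dimensional coordinate space and prefers candidates from bins that have been simulated least. This is only a hypothetical illustration: the abstract does not specify the reaction coordinates or the selection rule, so the binning scheme, the normalized coordinate range, and the names `bin_ids` and `select_undersampled` are assumptions.

```python
from collections import Counter
import numpy as np

def bin_ids(coords, edges):
    """Map each row of coords (n, 3) to a tuple of per-axis bin indices.
    edges[d] holds the bin edges along axis d, outer edges included."""
    coords = np.asarray(coords, dtype=float).reshape(-1, len(edges))
    per_axis = [np.digitize(coords[:, d], edges[d][1:-1])
                for d in range(len(edges))]
    return [tuple(int(i) for i in row) for row in zip(*per_axis)]

def select_undersampled(candidates, simulated, n_pick, edges):
    """Pick candidates whose bins hold the fewest simulated configurations
    so far (assumed importance criterion), updating counts after each pick."""
    counts = Counter(bin_ids(simulated, edges))
    cand_bins = bin_ids(candidates, edges)
    remaining = set(range(len(cand_bins)))
    picks = []
    for _ in range(min(n_pick, len(cand_bins))):
        # Ties are broken by candidate index to keep the result deterministic.
        best = min(remaining, key=lambda i: (counts[cand_bins[i]], i))
        picks.append(best)
        remaining.remove(best)
        counts[cand_bins[best]] += 1
    return picks

# Hypothetical coordinates, normalized to [0, 1], with two bins per axis.
edges = [np.linspace(0.0, 1.0, 3)] * 3
simulated = np.full((10, 3), 0.1)          # one region already well covered
candidates = np.array([[0.2, 0.2, 0.2],    # same region as the simulated set
                       [0.9, 0.9, 0.9],    # unexplored region
                       [0.8, 0.8, 0.8],    # same bin as the previous row
                       [0.9, 0.1, 0.1]])   # another unexplored region
picks = select_undersampled(candidates, simulated, n_pick=2, edges=edges)
```

Updating the counts after each pick keeps a single empty bin from absorbing the whole selection budget, so the two picks here land in two different unexplored regions.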