Subsections
- Calculation of the Order Parameter, \( \Psi \)
- Updating the Bias Potential, \( U(\Psi) \)
- Calculating the Biasing Force
- Modified Langevin Step
Calculation of the Order Parameter, \( \Psi \)
Calculating the order parameter
$$\begin{equation}
\Psi = \frac{1}{R_{0}^{3}} \left( \frac{V}{(2 \pi)^{3}} \int f(k) \lvert W_{-}({\bf k}) \rvert^{l} d{\bf k} \right)^{1/l},
\end{equation}$$
requires a Fourier transform of the real-space field, \(W_{-}({\bf r}) \), into reciprocal space to obtain \( W_{-}({\bf k}) \).
A Fourier transform can be a costly operation to perform on the host (CPU), the execution time scaling with the number of mesh
points used to represent the field. It may also be the case that the user wishes to output \( \Psi \) at every
time step, so this relatively costly calculation could end up being performed frequently. As a result, I have used CUDA's cuFFT library
to perform the Fourier transform on the GPU with the function:
cufftResult
cufftExecD2Z(
cufftHandle plan,
cufftDoubleReal *idata,
cufftDoubleComplex *odata);
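As a rough illustration of how such a transform might be set up (not the exact code), the plan creation and execution could look as follows, where the mesh dimensions mx, my, mz, the real-space device array w_gpu (a double*) and the reciprocal-space output wk_gpu (a cufftDoubleComplex*) are assumed names:
#include <cufft.h>

// Create a 3D double-precision real-to-complex plan (done once at set-up).
cufftHandle wr_to_wk;
cufftPlan3d(&wr_to_wk, mx, my, mz, CUFFT_D2Z);

// Transform W_-(r), held on the GPU, into W_-(k), also on the GPU.
cufftExecD2Z(wr_to_wk, w_gpu, wk_gpu);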
The next area that can be parallelised is identified by noting that the equation for the order parameter contains an integral
that becomes a summation on our discrete mesh:
$$\begin{equation}
\int f(k) \lvert W_{-}({\bf k}) \rvert^{l} d{\bf k} \rightarrow C \sum_{i} f(k_{i}) \lvert W_{-}({\bf k}_{i}) \rvert^{l}.
\end{equation}$$
If we look at the contribution to the sum from a particular wavevector \( {\bf k}_{i} \),
it is clear that neither \( f(k_{i}) \) nor \( W_{-}({\bf k}_{i}) \) depends on any other wavevector, \( {\bf k}_{j} \).
As a result, we can divide up the hard work by calculating each contribution in parallel on the GPU.
The next step would be to add the results from each \( k_{i} \) together again. We could achieve this by copying the
array of results from the GPU back to the host and summing them with the CPU. However, copying a large array between
host and GPU memory can become costly, especially if repeated often.
As a result, I have utilised the CUDA Thrust library function:
OutputType thrust::
transform_reduce(
InputIterator first,
InputIterator last,
UnaryFunction unary_op,
OutputType init,
BinaryFunction binary_op).
Whilst I will not go through the function in detail, it takes iterators specifying the first and last positions of
the input array, a function that will be applied to the input data (in our case, it calculates the product
\( f(k_{i}) \lvert W_{-}({\bf k}_{i}) \rvert^{l} \)), the initial summation value (zero), and a reduction operator
(i.e., the built-in operation
thrust::plus<double>()
for summation). These parameters are used to perform a sum of
the transformed input data on the GPU and finally return a single double precision variable (not a large array) to
host memory. In the code, the function looks like:
Psi = thrust::transform_reduce(
_z,
_z + _Mk,
Psi_calc(_ell, _M),
0.0,
thrust::plus<double>());
where _z is a
thrust::zip_iterator<IteratorTuple>
(not discussed here), _M and _Mk are constants corresponding to the array lengths in real and reciprocal space, and the function:
Psi_calc(double ell, int M)
calculates the product \( f(k_{i}) \lvert W_{-}({\bf k}_{i}) \rvert^{l} \), where ell is the constant \( l \).
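While the real implementation is not reproduced here, a functor of this kind might look roughly like the sketch below, assuming (purely for illustration) that the zip iterator dereferences to a tuple of \( f(k_{i}) \) and \( W_{-}({\bf k}_{i}) \), and that the FFT normalisation by M is handled inside the functor:
#include <thrust/tuple.h>
#include <cufft.h>
#include <cmath>

struct Psi_calc
{
    double ell;   // the exponent l in the order parameter
    int    M;     // number of real-space mesh points (FFT normalisation)

    Psi_calc(double ell, int M) : ell(ell), M(M) {}

    __host__ __device__
    double operator()(const thrust::tuple<double, cufftDoubleComplex> &t) const
    {
        double             fk = thrust::get<0>(t);    // f(k_i)
        cufftDoubleComplex wk = thrust::get<1>(t);    // W_-(k_i)
        double mag = sqrt(wk.x*wk.x + wk.y*wk.y) / M; // |W_-(k_i)|, normalised
        return fk * pow(mag, ell);                    // f(k_i)|W_-(k_i)|^l
    }
};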
The steps outlined above are contained within the public function:
double get_Psi(double *w_gpu)
in the metadynamics class. It requires only a pointer to the field array stored on the GPU as a parameter.
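For example, if the metadynamics object were called wtmd (a name used here only for illustration), obtaining the order parameter at a given step might look like:
double Psi_hat = wtmd.get_Psi(w_gpu);   // only a single double is copied back to the host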
Updating the Bias Potential, \( U(\Psi) \)
It is necessary to periodically update the bias potential by an amount \( \delta U(\Psi) \), and
its derivative by an amount \( \delta U'(\Psi) \). Furthermore, to perform a linear extrapolation of the free energy
once the system has become well-tempered, we also need to calculate the ensemble average of \( W_{-}^{2}({\bf r}) \)
as a function of \( \Psi \) (mathematics not shown). This is done by adding Gaussians to two functions, \( I_{0}(\Psi) \)
and \( I_{1}(\Psi) \), centred about the current value of the order parameter, \( \hat{\Psi} \).
The Gaussians added to \( I_{1}(\Psi) \) are scaled by a factor of
$$\begin{equation}
\hat{\langle W_{-}^{2} \rangle} = \frac{1}{M} \sum_{i=1}^{M} W_{-}^{2}({\bf r}_{i})
\end{equation}$$
relative to those added to \( I_{0}(\Psi) \).
Since the number of mesh points, \( M \), is large, this is another sum that can be calculated on the GPU via
thrust::
transform_reduce(...).
This time, we use the built-in operation,
thrust::square<double>()
to perform the square.
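A sketch of such a calculation, assuming w_gpu is a raw device pointer to \( W_{-}({\bf r}) \) of length _M (the wrapper function and its name are hypothetical), might read:
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <thrust/device_ptr.h>

// Hypothetical helper: estimate of the average of W_-^2 over the mesh.
double get_w2_hat(double *w_gpu, int M)
{
    thrust::device_ptr<double> w(w_gpu);                    // wrap the raw GPU pointer
    double sum = thrust::transform_reduce(w, w + M,
                                          thrust::square<double>(),  // W_-^2(r_i)
                                          0.0,
                                          thrust::plus<double>());   // summation
    return sum / M;                                         // (1/M) * sum
}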
The next step is to update \( \delta U(\Psi) \), \( \delta U'(\Psi) \), \( \delta I_{0}(\Psi) \) and \( \delta I_{1}(\Psi) \)
as the Gaussians are added. Depending on the range of \( \Psi \) being investigated and the resolution (i.e., the distance between
adjacent points, \( \Delta \Psi \) ), this is another procedure that can be significantly sped up through parallelisation.
It is possible as long as each GPU thread knows the value about which to centre the Gaussians, \( \hat{\Psi} \), and can work
out which value of \( \Psi \) it is itself responsible for, a relatively simple pair of requirements to satisfy.
This is achieved in the CUDA kernel:
__global__ void update_bias_fields_gpu(...),
into which \( \hat{\langle W_{-}^{2} \rangle} \) and \( \hat{\Psi} \) are passed as parameters (among others).
All four of the aforementioned arrays are updated in the above kernel, since \( \delta U'(\Psi) \) necessitates
calculating \( \delta U(\Psi) \), and both \( I_{0}(\Psi) \) and \( I_{1}(\Psi) \) can reuse the
Gaussian contribution already being used in the thread.
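A deliberately simplified sketch of such a kernel is shown below; the parameter names, the handling of the Gaussian amplitude A (whose well-tempering scaling is assumed to be computed on the host) and the relative scaling of the \( I_{0} \) and \( I_{1} \) updates are illustrative assumptions rather than the actual implementation:
// One thread per point of the bias grid, each adding its own slice of the Gaussians.
__global__ void update_bias_fields_sketch(double *U, double *dU,
                                          double *I0, double *I1,
                                          double Psi_hat, double w2_hat,
                                          double Psi_min, double dPsi,
                                          double A, double sigma, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    double Psi = Psi_min + i * dPsi;                        // the Psi value owned by this thread
    double x   = Psi - Psi_hat;
    double G   = A * exp(-0.5 * x * x / (sigma * sigma));   // Gaussian centred on Psi_hat

    U[i]  += G;                                             // delta U(Psi)
    dU[i] += -x * G / (sigma * sigma);                      // delta U'(Psi), reusing G
    I0[i] += G;                                             // delta I0(Psi)
    I1[i] += w2_hat * G;                                    // delta I1(Psi), scaled by the W_-^2 average
}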
Finally, the steps discussed above are all performed by calling either of the overloaded functions:
void update_bias_field(double *w_gpu),
void update_bias_field(double Psi_hat,
double *w_gpu),
depending on whether \( \hat{\Psi} \) has already been calculated prior to the call (thereby saving an additional call to get_Psi(...)).
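For example, inside the main simulation loop one might write (object, variable and frequency names chosen purely for illustration):
// Reuse the Psi_hat already computed this step to avoid a second get_Psi(...) call.
if (step % bias_update_freq == 0) wtmd.update_bias_field(Psi_hat, w_gpu);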
Calculating the Biasing Force
The first step in calculating the biasing force for the Langevin step is to obtain the current value of the order parameter, \( \hat{\Psi} \),
with a call to the parallelised function get_Psi(...) described above.
We then use \( \hat{\Psi} \) to look up the derivative of the bias potential that we have been keeping track of and obtain \( U'({\hat{\Psi}}) \).
The next step is to calculate the functional derivative, \( \mathcal{D} \Psi / \mathcal{D} W_{-}({\bf r}) \)
(see Beardsley and Matsen for mathematical details).
We begin by calculating \( \mathcal{D} \Psi / \mathcal{D} W_{-}({\bf k}) \) on the GPU with a call to the kernel: __global__ void get_dPsi_dwk(...),
the result of which is Fourier transformed back into real space via the cuFFT function: cufftExecZ2D(...)
to give \( \mathcal{D} \Psi / \mathcal{D} W_{-}({\bf r}) \).
A final call to the GPU kernel: __global__ void aEqBc(...), leaves us with the biasing force,
$$\begin{equation}
F_{B}(\Psi) = U'(\hat{\Psi}) \left( \frac{\mathcal{D} \Psi}{\mathcal{D} W_{-}({\bf r})} \right).
\end{equation}$$
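Judging by its name, aEqBc presumably performs a simple product of the form a = bc; one plausible (purely illustrative) form, with \( U'(\hat{\Psi}) \) passed as a scalar, would be:
// Hypothetical form: a[i] = b * c[i], used here as F_B(r_i) = U'(Psi_hat) * dPsi/dW_-(r_i).
__global__ void aEqBc_sketch(double *a, double b, const double *c, int M)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < M) a[i] = b * c[i];
}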
Modified Langevin Step
The WTMD simulation requires a modified Langevin step in which the biasing force is added to the update of \( W_{-}({\bf r}) \).
Since a simulation will often start with zero bias potential (and therefore no bias force), a flag has been
introduced as an argument to the Langevin function that allows the user to turn off the call to get_fBias(...) if it's not
required:
void langevin(..., bool useBias = true).
This avoids wasteful kernel calls and Fourier transforms whose only outcome would be to confirm that a zero bias potential produces zero bias force.
In the case where the biasing force does not need to be calculated, \( W_{-}({\bf r}) \) is updated using a standard
L-FTS Langevin step on the GPU via the kernel: __global__ void langevin_sym(...).
Otherwise, after getting the biasing force, \( F_{B} (\Psi) \), on the GPU by calling get_fBias(...), it is passed to a CUDA kernel to
update \( W_{-}({\bf r}) \) via a modified Langevin step:
__global__ void langevin_sym_bias(..., double *bias).
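While the actual kernel implements the symmetrised update of the standard L-FTS code, the role of the bias array can be illustrated with a deliberately simplified, Euler-Maruyama-style sketch (all names, and the update form itself, are assumptions):
// Simplified illustration only: the bias force is simply added to the usual
// thermodynamic force in the update of W_-(r) at each mesh point.
__global__ void langevin_bias_sketch(double *w, const double *force,
                                     const double *bias, const double *noise,
                                     double dt, int M)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < M) w[i] += -(force[i] + bias[i]) * dt + noise[i];
}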