-
The traditional variational method for quantum mechanics [30, 31] usually uses a given function with unknown parameters as a variational function, e.g.,
${\rm e}^{-\alpha r}$ with α as the unknown parameter. Different from the previous variational artificial neural network (VANN) applications [23, 32], we do not use the DNN to represent the wave function directly; instead, we use the DNN to represent the cumulative distribution function (CDF), which is the integration of the probability density function on the spatial coordinate. The training objective is thus to minimize the violation of the wave function$ \psi(x) $ represented by the neural network to the Schrodinger equation. Its relationship with the wave function is expressed as follows:$ F(x) =\int_{-\infty}^{x} \psi^{\ast } (x')\psi(x'){\rm d} x', $
(1) $ \psi(x)=\sqrt{\frac{{\rm d}F(x)}{{\rm d}x} }, $
(2) where
$ F(x) $ is the CDF represented by a neural network that is monotonic by construction. We decided to constrain the weights to make the algorithm data efficient. The derivative${\rm d}F/{\rm d}x$ is calculated using auto differentiation (auto-diff) [33], which is provided by the deep learning libraries automatically, in analytical precision. There are two advantages of using the CDF. First, the wave function extracted from the CDF automatically satisfies the normalization condition. Thus, there is no numerical integral in the whole calculation. Second, the values of the CDF are between$ (0, 1) $ , a much smaller range than that of the PDF, making the neural network much easier to train under the same learning rate. In practice, our training epochs are far fewer than those in the previous algorithm. Moreover, because we eliminate all the integrals, each epoch requires less computation than the previous algorithms. Therefore, our algorithm can achieve higher accuracy with less computation.We use a feed forward neural network, or simply a multi-layer perceptron, to represent the CDF. The input of the neural network is the n-dimensional spatial coordinates x, and the first layer of the DNN consists of
$ m=32 $ hidden neurons whose values are calculated by$ h_1 = \sigma(x W_1 + b_1) $ , where$ W_1 $ is the weight matrix with$ n\times m $ elements and$ b_1 $ is the bias vector with m values. σ is the activation function, which provides the neural network with non-linear representation ability. To increase the representation power, the values of neurons in the first hidden layer are feed forward to the second hidden layer with similar operations$ h_2 = \sigma(h_1 W_2 + b_2) $ . One can stack multiple hidden layers with one output neuron in the last layer to represent the value of the CDF function. The whole neural network can thus be thought of as one variational function$ F(x, \theta) $ with θ being all the trainable parameters in the neural network.To ensure that
$ F(x, \theta) $ is monotonic, we add a non-negative constraint to the weights$ W_i $ of the neural network. At the same time, the activation function should also be monotonic. In principle, sigmoid, tanh, as well as leaky relu functions can all be used to construct this monotonic neural network. In practice, we use a sigmoid activation function whose derivatives are also continuous. This is important when the second order derivatives are required in auto-diff. For example, if one uses a relu activation function, the second order derivatives of the output of the neural network to the input equal 0. The last layer also uses a sigmoid activation function to ensure that the output range is$ (0, 1) $ .The training objective is to find the ground state energy
$ E_0 $ and its corresponding wave function$ \psi_0 $ by minimizing the violation of the wave function$ \psi(x) $ to the following Schrodinger equation:$ \begin{align} H |\psi \rangle = E_0 |\psi \rangle, \end{align} $
(3) where
$ H = -\dfrac{\hbar^2}{2m} \nabla^2 + V({\bf x}) $ is the Hamiltonian operator and$ E_0 $ represents its smallest eigenvalue. The loss function is thus set to be$ \begin{equation} L(\theta) =|(H-E_{0})|\psi\rangle +|F(x_{\rm min})|+|F(x_{\rm max})-1|, \end{equation} $
(4) where θ represents all the trainable parameters in the monotonic neural network,
$ \nabla^2 |\psi \rangle $ is computed by the neural network through auto-diff, and$ E_0 $ is another trainable parameter initialized with a constant number,$ 0.0 $ . Two additional loss terms are added to take into account the boundary condition of the CDF. We use it to limit the range of values of the CDF, which ensures that the wavefunction satisfies the normalization condition. In previous VANN applications, this term was written as$ <\psi|H|\psi>/<\psi|\psi> $ . We replace the numerical integration of the denominator with soft constraints, which simplifies the calculation.Before optimization, the weight values of the neural network parameters are usually initialized randomly or through the Xavier scheme [34]. In our problem, we observe that the scheme of parameter initialization has little influence on the training process and the result of variational optimization.
We attempt to eliminate all the numerical integrals in the whole calculation because, in neural network calculations, the differential is easier to calculate than the integral. To calculate the integral, we have to use numerical approximation methods such as Monte Carlo sampling, which will certainly increase the amount of computation and may affect the accuracy. In practice, we found that we had to subtract the energy term if we did not want to include the integral.To find the ground state energy and wave function, we add another loss term
${\rm e}^{0.001E_0}$ . The basic logic is to decrease$ E_0 $ during the optimization of the overall loss. The function form of this loss term is designed to produce proper negative gradients at different stages of training. First, the gradients should be sufficiently large to make it converge fast when$ E_0 $ is much larger than the ground state energy. Second, the gradients should be small enough to avoid interfering with the optimization of other parts of overall loss when$ E_0 $ is sufficiently close to the true value. Last but not least, this loss term must monotonously increase with$ E_0 $ throughout the definition domain. In principle,$ E_0 $ can be smaller than the analytical ground state energy; however, in that case, the residual of the Schrodinger equation increases faster than this term because of the coefficient$ 0.001 $ . After trained, the value of$ E_0 $ approaches the exact value of the ground state energy.We tested the performance of the DNN Schrodinger equation solver on three classical quantum mechanical problems. The first problem is the harmonic oscillator problem [35]. The harmonic oscillator is used to approximate, for example, the molecular vibration, lattice vibration, and radiation field vibration around a steady point. All of these problems can be regarded as many independent harmonic oscillators whose potential in the Hamiltonian can be written as
$ \begin{align} V=\frac{1}{2}m\omega ^{2} x^2, \end{align} $
(5) where m is the mass of the oscillator, ω is its angular frequency, and x is its deviation from equilibrium position.
The second problem is to solve the Schrodinger equation with the Woods-Saxon potential [36], which is widely used in nuclear physics to represent the charge distributions of the nucleus, as follows:
$ \begin{equation} V=\frac{-1}{1+{\rm e}^{\frac{\left | x \right |-R_{0} }{a_0} } }, \end{equation} $
(6) where
$ a_0 $ is related to the thickness of the surface layer, in which the potential drops from the outside to the inside of the nucleus, and$ R_0 $ is the average radius of the nucleus at which the average interaction occurs.The third potential is an infinitely high potential well with a width of
$ 2 l $ ,$ \begin{equation} V=\left\{ \begin{aligned} & \infty, & \; & |x| >l \\ & 0, & \; & |x| \le l \\ \end{aligned} \right. \end{equation} $
(7) For the sake of brevity, the parameters in Hamiltonian use the following values:
$ \begin{align} \hbar = m = \omega = 1, ~~ R_0 = 6.2, ~~ a_0 =0.1,~~ l = 4. \end{align} $
(8) Different from previous studies that solve Schrodinger equations using supervised learning, our method is close to unsupervised learning, where both
$ E_0 $ and$ \psi_0 $ are learned through optimization. The input to the neural network is a list of shuffled coordinates sampled from the domain. Using these coordinates, we compute the loss functions and minimize the violation of$ E_0 $ and$ \psi_0 $ to the Schrodinger equation, as well as$ {\rm e}^{0.001E_0} $ . In principle, we can use the Markov chain Monte Carlo (MCMC) method [37] to sample coordinates with the learned wave function or use active learning to generate coordinates that violate Schrodinger equations more with the currently learned network to speed up the training process. In practice, for these simple problems, the wave function is usually very close to the exact wave function after training the DNN for 2000 iterations. We generally train 10000 iterations with a very small learning rate for the last 1000 iterations.We use tensorflow [38] to construct the DNN, to compute the auto-diff
${\rm d}F/{\rm d}x$ as well as$ \nabla^2 \psi $ , and to update the network parameters. We use the Adam algorithm [39], which adds the momentum mechanism and adaptive learning rate to the simple stochastic gradient descent$ \theta_{n+1} = \theta_n - lr \dfrac{1}{m} \sum_{i=1}^m \dfrac{\partial l_i}{\partial \theta} $ . The relevant parameters in the Adam algorithm are set to$ \beta_1=0.9,\; \beta_2 = 0.999, \epsilon=10^{-7} $ . To speed up the training process, we use a learning rate scheduler to adjust the learning rate and make it vary in the$ 10^{-2}-10^{-5} $ range. A large learning rate at an early stage makes the function converge faster at the beginning, and a small learning rate at a late time makes the training process smooth.To quantify the difference between the true wave function
$ \psi_{\rm true} $ and the wave function learned by the DNN$ \psi_{\rm DNN} $ , we introduce the partial-wave fidelity K as follows:$ \begin{equation} K=\frac{< \psi_{\rm true}|\psi_{\rm DNN} >^2}{< \psi_{\rm true}|\psi_{\rm true}><\psi_{\rm DNN}|\psi_{\rm DNN} >}. \end{equation} $
(9) The closer K is to one, the closer is the result of the DNN to the exact wave function.
-
For the harmonic oscillator potential used in the present work, the analytical ground state energy is
$ 0.5\hbar\omega $ . After 1500 iterations of training, the ground state energy from DNN is$ E_0 = 0.50038 \hbar\omega $ , whose relative error is within$ 0.06\% $ . In the last stage of$ 10000 $ iterations, the error can be controlled below$ 0.002\% $ .The partial-wave fidelity K is approximately 0.997993 using three hidden layers with 32 units (neurons) per layer. To study the influence of the number of variational parameters on the training results, we computed K using different numbers of hidden layers and numbers of units per layer. The results are shown below.
As presented in Table 1, the highest fidelity is achieved using 4 hidden layers with 16 hidden neurons per layer, with
$ K=0.9999967 $ . For this simple problem, the DNN achieves a very low error in fidelity even with only two hidden layers and four units per layer. The performance increases with the number of variational parameters, approaching the exact wave function. However, the performance of the DNN saturates or even decreases if there are too many variational parameters.$N_{\rm unit}N_{\rm layer}$ 1 2 3 4 4 0.9995717 0.9999767 0.9999705 0.9999618 8 0.9999416 0.9999797 0.9999910 0.9999932 16 0.9999861 0.9999923 0.9999936 0.9999967 32 0.9999789 0.9999909 0.9999896 0.9999922 64 0.9999744 0.9999746 0.9999903 0.9999941 Table 1. Fidelities of VANN results; the first row represents the number of hidden layers, and the first column represents the number of units in each layer.
To further visualize the difference between the result of the DNN and the exact solution, we compare the CDF in Fig. 1.
Figure 1. (color online) Cumulative distribution equation as a function of position in the harmonic oscillator problem.
Using Eq. (2), we compute
$ \psi(x) = \sqrt{{\rm d}F/{\rm d}x} $ and compare the wave function$ \psi(x) $ and its first and second derivatives$ \dfrac{{\rm d}\psi}{ {\rm d}x} $ and$ \dfrac{{\rm d}^2 \psi}{{\rm d}x^2} $ with the analytical results. This provides a detailed comparison that shows the power of the variational function represented by the DNN.As shown in Fig. 1, the difference between the DNN results and the ground truth is within an error range of 0.0001. The error range of the ground state wave function can be controlled within 0.0002, as shown in Fig. 2. In addition, the accuracies of the learned first and second order derivatives obtained through variational optimization are also very high, which means that this method is not only accurate, but also captures the true physics instead of finding an approximation function for the ground state wave function.
Figure 2. (color online) Ground state wave function (top panel), first derivative (central), and second derivative (bottom) as a function of position in the harmonic oscillator problem.
The same network is used to solve Schrodinger equations with the Woods-Saxon potential and the infinitely high potential well. No modification is made to the hyperparameters other than the potential part in the Hamiltonian. The comparisons between the learned ground state wave functions and the true values are shown in Fig. 3 and Fig. 4. The result shows that the ground state wave functions obtained from the DNN CDF are also in excellent agreement with the exact solution. The error range of the Woods-Saxon potential's ground state wave function can be controlled within 0.0002. As shown in Fig. 4, the performance of the network on the infinitely high potential well is relatively poor, and the range of error is expanded to 0.02, which is much worse than that in the harmonic oscillator potential and Woods-Saxon potential. This is also reflected in the calculation of the ground state energy and partial-wave fidelity K. The ground state energy calculated by the DNN for the Woods-Saxon potential problem is –0.97382, also within an error of
$ 0.002\% $ to the exact result –0.97385, and$ K=0.999964 $ , similar to that for the harmonic oscillator potential. However, for the potential well problem, the DNN$ E_0 = 0.07787 $ , whose relative error to the accurate result of 0.07710 is approximately$ 1\% $ , and$ K= $ 0.9977475, also less than the average level of the first two potentials. It reminds us that this DNN might not perform good at dealing with potentials with discontinuities, like the infinitely high potential well, whose potential energy discontinuously changes from zero to infinity at the boundary. We think that this is due to the fact that the soft constraint cannot handle the infinite potential energy at the boundary well, so the probability density at the boundary is not 0.
Solving Schrodinger equations using a physically constrained neural network
- Received Date: 2023-01-01
- Available Online: 2023-05-15
Abstract: Deep neural networks (DNNs) and auto differentiation have been widely used in computational physics to solve variational problems. When a DNN is used to represent the wave function and solve quantum many-body problems using variational optimization, various physical constraints have to be injected into the neural network by construction to increase the data and learning efficiency. We build the unitary constraint to the variational wave function using a monotonic neural network to represent the cumulative distribution function (CDF)