-
In the kernel ridge regression (KRR) approach [62, 91, 92], the predicting function can be written as
$ f(\mathit{\boldsymbol{x}}_{j}) = \sum\limits_{i=1}^{m}K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i})\omega_{i}, $ (1)
where $ \mathit{\boldsymbol{x}}_{i}, \mathit{\boldsymbol{x}}_{j} $ are input variables, $ m $ is the number of training data, $ K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i}) $ is the kernel function, and $ \omega_{i} $ is the corresponding weight that is determined by training. The kernel function $ K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i}) $ measures the similarity between data and plays a crucial role in prediction. There are many choices for the kernel function, such as linear kernels, polynomial kernels, and Gaussian kernels. The commonly used Gaussian kernel is written as
$ K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i}) = \mathrm{exp}\left[-\dfrac{\|\mathit{\boldsymbol{x}}_{j}-\mathit{\boldsymbol{x}}_{i}\|^{2}}{2\sigma^{2}}\right], $ (2)
where $ \|\cdot\| $ denotes the Euclidean norm, and the hyperparameter $ \sigma > 0 $ defines the length scale over which the kernel is effective.
The weight vector $ \mathit{\boldsymbol{\omega}}=(\omega_{1},\cdots,\omega_{m})^{T} $ is obtained by minimizing the loss function
$ L(\mathit{\boldsymbol{\omega}}) = \sum\limits_{i=1}^{m}[f(\mathit{\boldsymbol{x}}_{i})-y(\mathit{\boldsymbol{x}}_{i})]^{2} + \lambda \mathit{\boldsymbol{\omega}}^{T}\mathit{\boldsymbol{K}}\mathit{\boldsymbol{\omega}}, $ (3)
where $ \mathit{\boldsymbol{K}}_{ij}=K(\mathit{\boldsymbol{x}}_{i},\mathit{\boldsymbol{x}}_{j}) $ is the kernel matrix. The first term is the sum of squared deviations between the KRR predictions $ f(\mathit{\boldsymbol{x}}_{i}) $ and the data $ y(\mathit{\boldsymbol{x}}_{i}) $. The second term is a regularizer that reduces the risk of overfitting, and $ \lambda\geqslant 0 $ is a hyperparameter that determines the strength of the regularizer. Setting the gradient of the loss function with respect to $ \mathit{\boldsymbol{\omega}} $ to zero gives $ \mathit{\boldsymbol{K}}[(\mathit{\boldsymbol{K}}+\lambda \mathit{\boldsymbol{I}})\mathit{\boldsymbol{\omega}}-\mathit{\boldsymbol{y}}]=0 $, from which one obtains the weight as follows:
$ \mathit{\boldsymbol{\omega}} = \left(\mathit{\boldsymbol{K}}+\lambda \mathit{\boldsymbol{I}}\right)^{-1}\mathit{\boldsymbol{y}}, $ (4)
where $ \mathit{\boldsymbol{I}} $ is the identity matrix and $ \mathit{\boldsymbol{y}}=(y(\mathit{\boldsymbol{x}}_{1}),\cdots,y(\mathit{\boldsymbol{x}}_{m}))^{T} $ collects the training data.
In predicting nuclear masses, the input variables are the proton and neutron numbers of the nucleus, i.e., $ \mathit{\boldsymbol{x}}=(Z,N) $, and the output variable $ f(\mathit{\boldsymbol{x}}) $ is the prediction of the nuclear mass. The corresponding Gaussian kernel function is [62]
$ K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i}) = \mathrm{exp}\left[-\dfrac{(Z_{j}-Z_{i})^{2}+(N_{j}-N_{i})^{2}}{2\sigma^{2}}\right]. $ (5)
Later, a remodulated kernel function was designed to include the odd-even effects [63]:
$ \begin{aligned}[b] K(\mathit{\boldsymbol{x}}_{j},\mathit{\boldsymbol{x}}_{i}) =& \mathrm{exp}\left[-\dfrac{(Z_{j}-Z_{i})^{2}+(N_{j}-N_{i})^{2}}{2\sigma^{2}}\right] \\&+ \frac{\lambda}{\lambda_{\mathrm{oe}}} \delta_{\mathrm{oe}}\mathrm{exp}\left[-\dfrac{(Z_{j}-Z_{i})^{2}+(N_{j}-N_{i})^{2}}{2\sigma_{\mathrm{oe}}^{2}}\right], \end{aligned} $ (6)
where $ \sigma,~\lambda,~\sigma_{\mathrm{oe}} $, and $ \lambda_{\mathrm{oe}} $ are hyperparameters to be determined, and $ \delta_{\mathrm{oe}}=1 $ only when $ (Z_{i},N_{i}) $ and $ (Z_{j},N_{j}) $ have the same parity in both the proton and neutron numbers; otherwise, $ \delta_{\mathrm{oe}}=0 $. The second term is the odd-even term, which was introduced to enhance the correlations between nuclei that have the same parity of proton and neutron numbers [63]. The hyperparameters $ \sigma_{\rm oe} $ and $ \lambda_{\rm oe} $ play roles similar to those of σ and λ but are related to the odd-even term. It should be noted that here the odd-even effects are included by remodulating the kernel function, which does not increase the number of weight parameters. This differs from introducing odd-even effects by building and training additional networks, as in Ref. [58], or by adding additional inputs, as reported in Ref. [50].
-
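As a minimal sketch of Eqs. (1), (4), (5), and (6), the kernels and the weight solve can be written with NumPy as below. The function names are hypothetical, the toy data are made up and are not nuclear masses, and it is assumed here that Eq. (4) is applied unchanged when the remodulated kernel of Eq. (6) replaces the Gaussian kernel.

```python
import numpy as np

def gaussian_kernel(xa, xb, sigma):
    """Gaussian kernel of Eq. (5); each row of xa and xb is one (Z, N) pair."""
    diff = xa[:, None, :] - xb[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def same_parity(xa, xb):
    """delta_oe of Eq. (6): 1 where both the Z and N parities agree, else 0."""
    pa = xa.astype(int) % 2
    pb = xb.astype(int) % 2
    return np.all(pa[:, None, :] == pb[None, :, :], axis=-1).astype(float)

def krroe_kernel(xa, xb, sigma, lam, sigma_oe, lam_oe):
    """Remodulated kernel of Eq. (6): Gaussian term plus odd-even term."""
    odd_even = same_parity(xa, xb) * gaussian_kernel(xa, xb, sigma_oe)
    return gaussian_kernel(xa, xb, sigma) + (lam / lam_oe) * odd_even

def fit_weights(kmat, y, lam):
    """Weights of Eq. (4), using a linear solve instead of an explicit inverse."""
    return np.linalg.solve(kmat + lam * np.eye(len(y)), y)

def predict(k_cross, weights):
    """Eq. (1), with k_cross[j, i] = K(x_j, x_i) for target x_j and training x_i."""
    return k_cross @ weights

# Toy data on a small (Z, N) grid: made-up values, NOT nuclear masses.
zz, nn = np.meshgrid(np.arange(8, 14), np.arange(8, 16), indexing="ij")
x_train = np.column_stack([zz.ravel(), nn.ravel()]).astype(float)
y_train = 0.1 * x_train[:, 0] + 0.2 * x_train[:, 1]

sigma, lam = 2.0, 1e-2  # illustrative hyperparameters only
k_train = gaussian_kernel(x_train, x_train, sigma)
w = fit_weights(k_train, y_train, lam)
y_fit = predict(k_train, w)
```

Because $ (\mathit{\boldsymbol{K}}+\lambda \mathit{\boldsymbol{I}})\mathit{\boldsymbol{\omega}}=\mathit{\boldsymbol{y}} $, the in-sample predictions satisfy $ \mathit{\boldsymbol{K}}\mathit{\boldsymbol{\omega}}=\mathit{\boldsymbol{y}}-\lambda\mathit{\boldsymbol{\omega}} $, which gives a quick consistency check on any implementation.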
With the optimized hyperparameters $ (\sigma,\lambda)=(8.7,3.4\times10^{-6}) $, the KRR approach with the Gaussian kernel is used to predict the binding energy of each nucleus from the binding energies of all the other nuclei, i.e., by means of the leave-one-out method. When the RCHB mass table [29] is learned, the binding energies of the 9035 nuclei obtained from the KRR predictions have an rms deviation of $ \sigma_{\mathrm{rms}}=0.96 $ MeV with respect to the data in the RCHB mass table. This deviation is below 1 MeV and is comparable to the typical values obtained by directly learning the experimental mass data with other machine learning methods such as feedforward neural networks [35–37, 39, 40, 77], a support vector machine [38], and mixture density networks [41, 45]. Thus, an overall satisfactory performance in learning the RCHB mass table can be achieved via the KRR method with the Gaussian kernel. Similarly, with the optimized hyperparameters $ (\sigma,\lambda,\sigma_{\mathrm{oe}},\lambda_{\mathrm{oe}})=(2, 10^{-2}, 20, 10^{-7}) $, the KRRoe approach is used to predict nuclear binding energies via the leave-one-out method. The corresponding rms deviation is $ \sigma_{\mathrm{rms}}=0.17 $ MeV, which marks a significant improvement over the KRR approach and is comparable to the accuracy of 0.13 MeV reported in Ref. [63]. In the following, several physical effects or derived quantities that are embedded in nuclear binding energies, such as the shell effects, nucleon separation energies, odd-even mass differences, and residual proton-neutron interactions, will be analysed based on the KRR (KRRoe) results and compared with the corresponding behaviors of the RCHB mass table, in order to evaluate the ability of the machine learning tool to grasp physics.
-
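The leave-one-out procedure and the rms deviation described above can be sketched as follows. The text does not say how the leave-one-out predictions were computed in practice, so both routes shown here are assumptions: a brute-force refit for every left-out nucleus, and a standard closed-form identity for ridge-type regressions that needs only one matrix inverse. Function names, the toy grid, the made-up energies, and the hyperparameter values are all hypothetical.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel matrix between all rows of x, each row being (Z, N)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def loo_brute_force(kmat, y, lam):
    """Refit without nucleus i, then predict nucleus i, for every i."""
    m = len(y)
    pred = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i
        g = kmat[np.ix_(keep, keep)] + lam * np.eye(m - 1)
        w = np.linalg.solve(g, y[keep])
        pred[i] = kmat[i, keep] @ w
    return pred

def loo_closed_form(kmat, y, lam):
    """Same predictions from one inverse: y_i - w_i / [(K + lam I)^-1]_ii."""
    g_inv = np.linalg.inv(kmat + lam * np.eye(len(y)))
    return y - (g_inv @ y) / np.diag(g_inv)

def rms(a, b):
    """Root-mean-square deviation between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy (Z, N) grid with made-up "binding energies", NOT the RCHB table.
zz, nn = np.meshgrid(np.arange(8, 13), np.arange(8, 15), indexing="ij")
x = np.column_stack([zz.ravel(), nn.ravel()]).astype(float)
y = 8.0 * (x[:, 0] + x[:, 1]) - 0.3 * (x[:, 1] - x[:, 0]) ** 2

kmat = gaussian_kernel(x, sigma=2.0)
pred_slow = loo_brute_force(kmat, y, lam=0.1)
pred_fast = loo_closed_form(kmat, y, lam=0.1)
sigma_rms = rms(pred_fast, y)
```

The closed form avoids one linear solve per nucleus, which matters for a table with thousands of entries. For very small λ the matrix can become ill-conditioned, and the brute-force route is then a useful cross-check.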
As is well known, the shell effect is a quantum effect that plays an essential role in determining many properties of finite nuclear systems [94]. In the nuclear landscape, it is mostly reflected by the fact that nuclei with certain special proton and/or neutron numbers, i.e., magic numbers, are far more stable than their neighboring nuclei. One reasonable way to quantitatively measure the shell effects over the nuclear landscape [94] is to examine the differences between nuclear masses and the values given by a phenomenological formula of the liquid drop model (LDM), such as the Bethe-Weizsäcker (BW) formula [3, 4], because the LDM regards an atomic nucleus as a classical liquid drop and does not include the shell effects. To this end, we first refit the BW formula to all 9035 data in the RCHB mass table and then analyse the shell effects by examining the differences of the RCHB mass table, as well as of the KRR predictions, from the fitted BW formula.
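Because the BW formula is linear in its coefficients, the refit can be sketched as an ordinary least-squares problem. The functional form used below is an assumption, since only the symmetry term $ -a_{A}(N-Z)^{2}/A $ is written out in this text: a Coulomb term proportional to $ Z^{2}/A^{1/3} $ and a pairing term proportional to $ A^{-1/2} $ (positive for even-even, negative for odd-odd, zero for odd-A nuclei) are common textbook choices and may differ from the form actually fitted. The coefficients and data are synthetic.

```python
import numpy as np

def bw_design_matrix(z, n):
    """Columns multiply (a_V, a_S, a_C, a_A, a_p) in an ASSUMED five-term BW form."""
    zi, ni = np.asarray(z, dtype=int), np.asarray(n, dtype=int)
    zf, nf = zi.astype(float), ni.astype(float)
    a = zf + nf
    even_even = (zi % 2 == 0) & (ni % 2 == 0)
    odd_odd = (zi % 2 == 1) & (ni % 2 == 1)
    pairing = np.where(even_even, 1.0, np.where(odd_odd, -1.0, 0.0))
    return np.column_stack([
        a,                            # volume term
        -a ** (2.0 / 3.0),            # surface term
        -zf**2 / a ** (1.0 / 3.0),    # Coulomb term (assumed Z^2 form)
        -(nf - zf) ** 2 / a,          # symmetry term, as written in the text
        pairing / np.sqrt(a),         # pairing term (assumed A^(-1/2) form)
    ])

def fit_bw(z, n, e_b):
    """Least-squares BW coefficients and the rms deviation of the fit (MeV)."""
    design = bw_design_matrix(z, n)
    coeffs, *_ = np.linalg.lstsq(design, e_b, rcond=None)
    rms = float(np.sqrt(np.mean((design @ coeffs - e_b) ** 2)))
    return coeffs, rms

# Synthetic check: energies generated from hypothetical coefficients are recovered.
zz, nn = np.meshgrid(np.arange(8, 40), np.arange(8, 60), indexing="ij")
z_all, n_all = zz.ravel(), nn.ravel()
true_coeffs = np.array([15.0, 17.0, 0.7, 23.0, 12.0])  # made-up values
e_synthetic = bw_design_matrix(z_all, n_all) @ true_coeffs
fitted, fit_rms = fit_bw(z_all, n_all, e_synthetic)
```

With real data, the residual `e_b - design @ coeffs` is the quantity mapped over the nuclear chart to expose the shell effects.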
By fitting the 9035 data of the RCHB mass table, the five parameters of the BW formula are obtained, i.e., the volume coefficient $a_{{V}}=13.64$ MeV, the surface coefficient $a_{{S}}=10.26$ MeV, the Coulomb coefficient $a_{{C}}=0.61$ MeV, the symmetry coefficient $a_{{A}}=19.78$ MeV, and the pairing coefficient $a_{{p}}=18.08$ MeV. In Fig. 2(a), the differences in the binding energies, i.e., $E_{{b}}^{\mathrm{RCHB}}-E_{{b}}^{\mathrm{BW}}$, between the RCHB mass table and the fitted BW formula are shown. The corresponding rms deviation $ \sigma_{\mathrm{rms}} $ is 10.18 MeV, which is about three times the rms deviation of 3.10 MeV obtained by fitting the BW formula to experimental data [2]. This results from the weakness of the BW formula in describing shell effects and nuclei away from the line of stability, combined with the fact that the RCHB mass table has larger shell effects and more nuclei away from the line of stability than the experimental data. In Fig. 2(a), some systematic features of the differences can be seen: (1) Comparing the nuclear regions near the magic numbers with those away from the magic numbers, the deviations of the RCHB mass table from the fitted BW formula are in general larger in the former regions than in the adjacent latter regions. This means that the microscopic RCHB mass table gives more binding energy for nuclei at and close to magic numbers, which is a direct consequence of the shell effects. (2) In the relatively light $ A\leqslant 80 $ and considerably heavy $ A\geqslant 300 $ mass regions, the binding energies of the RCHB mass table are systematically lower than those of the fitted BW formula. (3) Near the neutron dripline regions, almost all the nuclei in the RCHB mass table are more bound than predicted by the fitted BW formula. This is because the symmetry energy term $ -a_{A}(N-Z)^{2}/A $ in the BW formula plays a critical role in determining the binding energies of neutron-rich nuclei, whereas in the RCHB mass table, there are often many nuclei close to the neutron dripline having almost the same binding energies owing to considerations of pairing correlations and the continuum effect [29].
Figure 2. (color online) (a) Differences in the binding energies, i.e., $E_{{b}}^{\mathrm{RCHB}}-E_{{b}}^{\mathrm{BW}}$, between the RCHB mass table [29] and the BW formula [3, 4] whose parameters are determined by fitting 9035 data from the RCHB mass table. The corresponding rms deviation $ \sigma_{\mathrm{rms}} $ is also given. The magic numbers are indicated by the vertical and horizontal dotted lines. (b, c) Same as (a) but for the KRR and KRRoe predictions.
The differences in the binding energies of the RCHB, KRR, and KRRoe predictions with respect to the BW formula are shown in Fig. 2. The three systematic features of Fig. 2(a) mentioned above are well reproduced in Fig. 2(b) and Fig. 2(c). In particular, the shell effects reflected in nuclear masses have been learned effectively by the KRR and KRRoe methods. It should be noted that although the KRR predictions catch the systematic features in Fig. 2(a), they lose some details in the differences; i.e., the differences between the RCHB mass table and the BW formula vary smoothly with respect to the nucleon number, whereas the differences between the KRR predictions and the BW formula exhibit grain structures. These details are better described by the KRRoe predictions, and no grain structure is seen in Fig. 2(c).
-
The nucleon separation energies, such as one- and two-nucleon separation energies, are first-order differential quantities of binding energies that can clearly show the shell effects. They also define the positions of nucleon drip lines. In this subsection, we compare the separation energies extracted from the KRR predictions with the optimized hyperparameters and those from the RCHB mass table.
The one-neutron separation energies, i.e., $S_{{n}}(Z,N)= E_{{b}}(Z,N)- E_{{b}}(Z,N-1)$, extracted from the RCHB mass table and the KRR predictions are shown in Fig. 3(a) and Fig. 3(b), respectively. As summarized in Ref. [29] and shown in Fig. 3(a), the main features of $S_{{n}}$ from the RCHB mass table are as follows: (1) for a given isotopic (isotonic) chain, $S_{{n}}$ decreases (increases) with increasing neutron (proton) number; (2) significant reductions in $S_{{n}}$ exist at the traditional magic numbers $N = 20,~28, ~50, ~82,~126$ as well as $ N=184 $, indicating the shell closures; (3) because of the pairing correlations, $S_{{n}}$ has a ragged evolution pattern with the variation of the neutron number, which zigzags between even- and odd-A nuclei.
Figure 3. (color online) The one-neutron separation energies extracted from the RCHB mass table (a), in comparison with those from the KRR (b) and KRRoe (c) predictions. The values are scaled by colors.
As shown in Fig. 3(b) and Fig. 3(c), the $S_{{n}}$ extracted from the KRR and KRRoe predictions also decreases (increases) with increasing neutron (proton) number for a given isotopic (isotonic) chain, and there are drops at the traditional magic numbers $ N = 20, 28, 50, 82,126 $ as well as $ N=184 $. However, in Fig. 3(b), these drops are not as sharp as those of the RCHB mass table. More obviously, the ragged evolution pattern in $S_{{n}}$ due to the pairing correlations is fully absent in the KRR results. For the KRRoe predictions in Fig. 3(c), owing to the explicit inclusion of the odd-even effects, the sharp drops at the magic numbers and the ragged evolution pattern are well reproduced.
The two-neutron separation energies, i.e., $S_{{2n}}(Z,N)= E_{{b}}(Z,N)- E_{{b}}(Z,N-2)$, extracted from the RCHB mass table and the KRR (KRRoe) predictions are shown in Fig. 4, where half of the $S_{{2n}}$ is scaled by colors in order to use the same scale as the one-neutron separation energies in Fig. 3. Similar to the $S_{{n}}$, the $S_{{2n}}$ decreases (increases) with increasing neutron (proton) number for a given isotopic (isotonic) chain, and there are significant drops at neutron shell closures. However, in contrast to the $S_{{n}}$, the $S_{{2n}}$ does not show a ragged evolution pattern with the variation of the neutron number, as the $S_{{2n}}$ is obtained from the difference between two nuclei with the same number parity. Furthermore, the nuclear landscape shown in Fig. 4 is slightly broader than that shown in Fig. 3 because of the differences between the two-neutron and one-neutron dripline nuclei.
Figure 4. (color online) The two-neutron separation energies extracted from the RCHB mass table (a), in comparison with those from the KRR (b) and KRRoe (c) predictions. To allow a better comparison with the one-neutron separation energies in Fig. 3, half of the two-neutron separation energies are shown and scaled by colors.
As shown in Fig. 4(b) and Fig. 4(c), the $S_{{2n}}$ extracted from both the KRR and KRRoe predictions also decreases (increases) with increasing neutron (proton) number for a given isotopic (isotonic) chain, and there are drops at neutron shell closures. As for the $S_{{n}}$, the KRRoe method predicts these drops to be sharp, in accordance with the RCHB mass table, whereas the KRR method does not. Comparing Fig. 3 with Fig. 4, one can easily find that the KRR method learns the $S_{{2n}}$ better than the $S_{{n}}$. Quantitatively, the rms deviation of the KRR predictions with respect to the RCHB mass table is 0.49 MeV for the $S_{{2n}}$, compared with 1.12 MeV for the $S_{{n}}$. For the KRRoe predictions, the corresponding rms deviations of $S_{{2n}}$ and $S_{{n}}$ are 0.18 and 0.24 MeV, respectively, both of which are significant improvements over the KRR values.
According to the analyses of the one- and two-neutron separation energies, one can conclude that although the present KRR method with a Gaussian kernel can capture the inherent shell effects, the ragged evolution pattern of $S_{{n}}$ coming from the pairing correlations seems beyond its capability. In contrast, the KRRoe method can capture not only the inherent shell effects but also the ragged evolution pattern of $S_{{n}}$. Similar analyses have been carried out for the one- and two-proton separation energies, and similar conclusions can be drawn.
-
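The one- and two-neutron separation energies discussed in this subsection are simple differences of tabulated binding energies. A minimal sketch follows, assuming the table is stored as a dictionary keyed by $ (Z,N) $; the helper name and the toy chain, a smooth trend plus an extra 1 MeV of binding for even $ N $, are hypothetical and are not RCHB data.

```python
def neutron_separation_energies(e_b, k=1):
    """S_kn(Z, N) = E_b(Z, N) - E_b(Z, N - k) wherever both nuclei are tabulated."""
    return {
        (z, n): e - e_b[(z, n - k)]
        for (z, n), e in e_b.items()
        if (z, n - k) in e_b
    }

# Toy Z = 20 chain in MeV: smooth trend plus a pairing-like bonus for even N.
z = 20
e_b = {
    (z, n): 8.0 * (z + n) - 0.05 * (n - z) ** 2 + (1.0 if n % 2 == 0 else 0.0)
    for n in range(20, 31)
}

s_n = neutron_separation_energies(e_b, k=1)   # zigzags between even and odd N
s_2n = neutron_separation_energies(e_b, k=2)  # same parity on both sides: smooth
```

In this toy chain the pairing-like bonus cancels exactly in $ S_{2n} $ but alternates in sign in $ S_{n} $, which is the mechanism behind the ragged pattern described above.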
Pairing correlations are essential and allow us to understand many important effects that cannot be explained within a pure Hartree(-Fock) picture [95]. In the RCHB theory, the pairing correlations as well as the continuum effects are well taken into account by the Bogoliubov transformation in the coordinate representation [87, 88]. As discussed above, it is necessary to investigate the impacts of pairing correlations in detail. A direct consequence of the pairing correlations is the odd-even effects in binding energies [95], which are measured by the odd-even mass differences:
$ \begin{aligned} \Delta_{{n}}^{(3)}(Z,N)&=[E_{{b}}(Z,N+1)+E_{{b}}(Z,N-1)-2E_{{b}}(Z,N)]/2,\\ \Delta_{{p}}^{(3)}(Z,N)&=[E_{{b}}(Z+1,N)+E_{{b}}(Z-1,N)-2E_{{b}}(Z,N)]/2. \end{aligned} $ (7)
Figure 5 shows the neutron odd-even mass differences $\Delta_{{n}}^{(3)}$ along isotopic chains extracted from the RCHB mass table, in comparison with those from the KRR (KRRoe) predictions. From Fig. 5(a), it is obvious that the odd-even mass differences for neighboring nuclei are opposite in sign, clearly reflecting the odd-even effects in the RCHB mass table. In general, the odd-even mass differences of nuclei in heavier mass regions are smaller in amplitude than those in lighter regions; the trend is consistent with the empirical $ 12\cdot A^{-1/2} $ relation of the pairing gap [95]. However, as shown in Fig. 5(b), the odd-even mass differences extracted from the KRR predictions vanish globally, although there are still some fluctuations around magic numbers. In other words, the odd-even effects in the RCHB mass table are barely learned by the KRR method with a Gaussian kernel. This is because the odd-even effects make the binding energies oscillate rapidly with increasing neutron or proton number, and the KRR method with a smooth Gaussian kernel cannot describe such oscillations. In contrast, as seen in Fig. 5(c), owing to the explicit inclusion of the odd-even effects, the KRRoe method predicts the same pattern of odd-even mass differences as the RCHB mass table.
-
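The three-point formula of Eq. (7) can be illustrated with a short sketch. The two toy chains below are hypothetical: one carries a pairing-like bonus of 1.2 MeV for even $ N $ on top of a linear trend, and the other is a smoothed version with the bonus averaged out, mimicking what a smooth kernel would return.

```python
def delta3_n(e_b, z, n):
    """Neutron odd-even mass difference of Eq. (7) from a {(Z, N): E_b} table."""
    return 0.5 * (e_b[(z, n + 1)] + e_b[(z, n - 1)] - 2.0 * e_b[(z, n)])

z = 50
# Made-up chain (MeV): linear trend plus extra binding for even N.
paired = {(z, n): 8.0 * n + (1.2 if n % 2 == 0 else 0.0) for n in range(60, 71)}
# Same trend with the odd-even term replaced by its average.
smooth = {(z, n): 8.0 * n + 0.6 for n in range(60, 71)}

d3_paired = [delta3_n(paired, z, n) for n in range(61, 70)]  # alternates in sign
d3_smooth = [delta3_n(smooth, z, n) for n in range(61, 70)]  # vanishes
```

With binding energies counted as positive, the formula returns a negative value for even $ N $ and a positive value for odd $ N $ in the paired chain, while any linear trend drops out.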
As shown above, the shell effects in the RCHB mass table commonly have an energy change of more than 10 MeV between a magic nucleus and its mid-shell isotopes, whereas the odd-even mass differences are usually larger than or near 1 MeV in amplitude. It is therefore not difficult to understand why the shell effects are learned effectively but the odd-even effects are barely learned by the KRR method, as the KRR predictions have an rms deviation $ \sigma_{\mathrm{rms}}\approx 1 $ MeV with respect to the RCHB mass table. Intuitively, effects smaller than 1 MeV should not be learned by the KRR method either, and it is interesting to examine whether this intuition holds true.
The empirical proton-neutron interactions of the last proton(s) with the last neutron(s), $\delta V_{{pn}}$, which are mostly below 1 MeV, provide an opportunity to examine this [96, 97]. They can be derived from the double binding energy differences of four neighboring nuclei [97], e.g.,
$ \delta V_{{pn}}(Z,N)= \begin{cases} \dfrac{1}{4}\{[E_{{b}}(Z,N) - E_{{b}}(Z,N-2)] - [E_{{b}}(Z-2,N) - E_{{b}}(Z-2,N-2)]\},& \mathrm{e-e}\\ \dfrac{1}{2}\{[E_{{b}}(Z,N) - E_{{b}}(Z,N-2)] - [E_{{b}}(Z-1,N) - E_{{b}}(Z-1,N-2)]\},& \mathrm{o-e}\\ \dfrac{1}{2}\{[E_{{b}}(Z,N) - E_{{b}}(Z,N-1)] - [E_{{b}}(Z-2,N) - E_{{b}}(Z-2,N-1)]\},& \mathrm{e-o}\\ [E_{{b}}(Z,N) - E_{{b}}(Z,N-1)] - [E_{{b}}(Z-1,N) - E_{{b}}(Z-1,N-1)],& \mathrm{o-o} \end{cases} $ (8)
where the labels give the parities of $ Z $ and $ N $, in that order. The proton-neutron interactions play important roles in many nuclear properties and phenomena, such as the single particle structure, collectivity, configuration mixing, local mass formula, rotational bands, deformations, octupole deformations, and phase transformation.
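The four parity cases of Eq. (8) differ only in the step sizes taken in $ Z $ and $ N $, so they can be folded into one expression, as in the sketch below. The function name and the toy energy surface are hypothetical. The surface is bilinear in $ Z $ and $ N $ with separate pairing-like bonuses for even $ Z $ and even $ N $, chosen so that the expected result is known in advance.

```python
def delta_v_pn(e_b, z, n):
    """Empirical p-n interaction of Eq. (8) from a {(Z, N): E_b} table."""
    dz = 2 if z % 2 == 0 else 1  # step in Z: 2 for even Z, 1 for odd Z
    dn = 2 if n % 2 == 0 else 1  # step in N: 2 for even N, 1 for odd N
    double_diff = (e_b[(z, n)] - e_b[(z, n - dn)]) - (
        e_b[(z - dz, n)] - e_b[(z - dz, n - dn)]
    )
    return double_diff / (dz * dn)  # prefactors 1/4, 1/2, 1/2, 1 of Eq. (8)

def toy_energy(z, n):
    """Made-up surface (MeV): linear + 0.3*Z*N cross term + pairing-like bonuses."""
    bonus = (1.1 if z % 2 == 0 else 0.0) + (0.9 if n % 2 == 0 else 0.0)
    return 8.0 * (z + n) + 0.3 * z * n + bonus

e_b = {(z, n): toy_energy(z, n) for z in range(16, 25) for n in range(24, 33)}

dv_ee = delta_v_pn(e_b, 20, 28)  # even-even
dv_oe = delta_v_pn(e_b, 21, 28)  # odd Z, even N
dv_eo = delta_v_pn(e_b, 20, 29)  # even Z, odd N
dv_oo = delta_v_pn(e_b, 21, 29)  # odd-odd
```

In this toy surface every case returns the cross-term coefficient 0.3, because terms that depend on $ Z $ alone or on $ N $ alone, including the pairing-like bonuses, cancel in the double difference.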
The empirical proton-neutron interactions extracted from the RCHB mass table and the KRR predictions are shown in Fig. 6(a) and Fig. 6(b), respectively. Additionally, the results extracted from the KRRoe predictions are presented in Fig. 6(c) for comparison. As shown in Fig. 6(a), the empirical proton-neutron interactions from the RCHB mass table decrease gradually with increasing mass number, varying from more than 1 MeV in the light mass region to less than 100 keV in the superheavy mass region. Another feature that can be observed is the sudden changes in $\delta V_{{pn}}$ at nucleon magic numbers. These two behaviors are consistent with those of the $\delta V_{{pn}}$ extracted from the experimental mass data [98]. As shown in Fig. 6(b), the KRR predictions reproduce well the pattern of the empirical proton-neutron interactions in Fig. 6(a), both in magnitude and in variation trend. It is thus interesting to note that the KRR method has learned the main features of the empirical proton-neutron interactions, although most of their values are less than the rms deviation of 0.96 MeV for learning binding energies. One explanation for this could be that, with the selection of the four neighboring nuclei, the contributions from pairing energies almost cancel in the empirical proton-neutron interactions, which therefore show smooth trends with the nucleon number that can be captured by the KRR method with a Gaussian kernel. In Fig. 6(c), it is not surprising to see that the KRRoe method can also predict the empirical proton-neutron interactions and gives a more detailed pattern similar to that of the RCHB mass table.
-
The discussions presented above showed the learning results of the KRR method with optimized hyperparameters. Note that the learning ability of the KRR method may depend significantly on the hyperparameters. Therefore, it is worth investigating how the learning ability changes with different hyperparameters. In the following, comparisons between the optimized KRR results and the learning results obtained using two other sets of hyperparameters, $ (\sigma, \lambda)=(4.0, 1.0\times 10^{-3}) $ and $ (30.0, 1.0\times 10^{-3}) $, are presented. Both rms deviations with respect to the RCHB mass table are less than 2 MeV.
For a closer look, Fig. 7(a) and Fig. 7(b) show the odd-even mass differences in the Ca and Sn isotopic chains obtained from the KRR predictions with the three different sets of hyperparameters, in comparison with those from the RCHB mass table, from the KRRoe predictions, and from the experimental data. It can be seen that both the RCHB theory and the KRRoe predictions reproduce the empirical odd-even mass differences very well. However, all three KRR predictions fail to reproduce the correct staggering in the odd-even mass differences, for both large and small values of the hyperparameter σ. The KRR predictions with a small $ \sigma=4.0 $ even give a clear staggering that is opposite in phase. This seemingly strange behavior can be understood as follows. With a small σ, the predicted binding energy of the middle nucleus in Eq. (7), which should be relatively large (small) owing to pairing correlations, is heavily contaminated by its correlation with the relatively small (large) binding energies of the two neighboring nuclei. If the hyperparameter σ increases, the effective correlation distance increases, resulting in the quenching of the odd-even mass differences.
Figure 7. (color online) Odd-even mass differences and empirical proton-neutron interactions of Ca and Sn isotopic chains in the RCHB mass table and the different KRR predictions, as functions of the neutron number. The KRR results include the optimized predictions (KRR), predictions with $ \sigma=4.0 $ and $ \lambda=1.0\times 10^{-3} $ (KRR*), and predictions with $ \sigma=30.0 $ and $ \lambda=1.0\times 10^{-3} $ (KRR**). The corresponding experimental data [99] and the results given by the KRRoe method are shown for comparison.
Figures 7(c) and 7(d) present the empirical proton-neutron interactions in the Ca and Sn isotopic chains obtained from the KRR predictions with the three different sets of hyperparameters, in comparison with those from the RCHB mass table and the KRRoe predictions. As shown, on top of the overall decrease with increasing neutron number, the $\delta V_{{pn}} $ values obtained from the RCHB mass table and the KRRoe predictions exhibit a rapid drop at the shell closure $ N=28 $ in the Ca isotopic chain and at $ N=82 $ in the Sn isotopic chain. The three KRR predictions can reproduce this overall decrease, but the one with a large hyperparameter $ \sigma=30.0 $ does not exhibit the rapid drop at the shell closure. In comparison, the one with a small hyperparameter, i.e., $ \sigma=4.0 $, exhibits more fluctuations that appear random. Nevertheless, these different behaviors with different values of the hyperparameter σ can be basically understood by considering σ as a measure of the effective correlation distance.
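The role of σ as an effective correlation distance can be explored with a toy experiment. The sketch below is not the calculation behind Fig. 7: it assumes a one-dimensional chain of 40 hypothetical isotopes whose "binding energy" consists only of an alternating pairing-like term of ±0.5 MeV, and it applies leave-one-out KRR with a Gaussian kernel in the neutron number alone. The values $ \sigma=4.0 $, $ \sigma=30.0 $, and $ \lambda=10^{-3} $ are borrowed from the text purely for illustration.

```python
import numpy as np

def loo_predictions(x, y, sigma, lam):
    """Leave-one-out KRR predictions on a 1D chain (brute-force refits)."""
    k = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * sigma**2))
    m = len(y)
    pred = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i
        g = k[np.ix_(keep, keep)] + lam * np.eye(m - 1)
        pred[i] = k[i, keep] @ np.linalg.solve(g, y[keep])
    return pred

def delta3(e):
    """Three-point odd-even difference of Eq. (7) along the chain interior."""
    return 0.5 * (e[2:] + e[:-2] - 2.0 * e[1:-1])

n = np.arange(40, dtype=float)
stagger = np.where(n % 2 == 0, 0.5, -0.5)  # made-up pairing-like term (MeV)
true_d3 = delta3(stagger)

inner = slice(9, 29)  # keep away from the ends of the toy chain
pred_d3 = {}
for sigma in (4.0, 30.0):
    pred = loo_predictions(n, stagger, sigma, lam=1e-3)
    pred_d3[sigma] = delta3(pred)[inner]

# Positive if predicted and true staggering are in phase, negative if opposite.
phase_small_sigma = float(np.mean(pred_d3[4.0] * true_d3[inner]))
amplitudes = {s: float(np.mean(np.abs(d))) for s, d in pred_d3.items()}
```

Inspecting `phase_small_sigma` and `amplitudes` shows whether this toy chain exhibits the two behaviors described above, namely a staggering opposite in phase for the smaller σ and a weaker staggering for the larger σ.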
Examination of machine learning for assessing physical effects: Learning the relativistic continuum mass table with kernel ridge regression
- Received Date: 2023-02-13
- Available Online: 2023-07-15
Abstract: The kernel ridge regression (KRR) method and its extension with odd-even effects (KRRoe) are used to learn the nuclear mass table obtained by the relativistic continuum Hartree-Bogoliubov theory. With respect to the binding energies of 9035 nuclei, the KRR method achieves a root-mean-square deviation of 0.96 MeV, and the KRRoe method remarkably reduces the deviation to 0.17 MeV. By investigating the shell effects, one-nucleon and two-nucleon separation energies, odd-even mass differences, and empirical proton-neutron interactions extracted from the learned binding energies, the ability of the machine learning tool to grasp the known physics is discussed. It is found that the shell effects, evolutions of nucleon separation energies, and empirical proton-neutron interactions are well reproduced by both the KRR and KRRoe methods, although the odd-even mass differences can only be reproduced by the KRRoe method.