The principle of Bayes' rule is to establish a posterior distribution over the unknown parameters from a given data sample [58]. A detailed description of the origins and development of BNNs is beyond the scope of this study; hence, we limit ourselves to highlighting the main features of the approach.
The BNN approach to statistical inference is based on Bayes' theorem, which relates a given hypothesis (expressed as problem-specific prior beliefs about a set of parameters ω) and a set of data $(x,t)$ to a posterior probability $p(\omega\mid x,t)$ that is used to make predictions on new inputs. In this context, Bayes' theorem may be written as [59]
$ \begin{equation} p(\omega\mid x,t)=\frac{p(x,t\mid\omega)\, p(\omega)}{p(x,t)}, \end{equation} $ (1)
where $p(\omega)$ is the prior distribution of the model parameters ω, and $p(x,t\mid\omega)$ is the likelihood that a given model ω describes the new evidence $t(x)$. The product of the prior and the likelihood forms the posterior distribution $p(\omega\mid x,t)$, which encodes the probability that a given model describes the data $t(x)$. The denominator $p(x,t)$ is a normalization factor, which ensures that the posterior distribution integrates to 1. In essence, the posterior represents the improvement to $p(\omega)$ brought about by the new evidence $p(x,t\mid\omega)$.
In this study, the inputs of the network are given by
$ x_{i}= \{Z_{i},N_{i},A_{i},E_{i}\} $, where $Z_{i}$ is the charge number of the fissioning nucleus, $N_{i}$ is the neutron number of the fissioning nucleus, $A_{i}$ is the mass number of the fission fragment, and $E_{i}=e_{i}+S_{i}$ is the incident neutron energy plus the neutron separation energy.
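As a concrete illustration of this input layout, a minimal sketch in Python is given below. The function name, argument names, and the numerical values used in the example are placeholders introduced here for illustration only; they are not taken from the data set or code of this work.

```python
import numpy as np

def build_input(Z, N, A_fragment, e_incident, S_n):
    """Assemble one BNN input vector x = (Z, N, A, E) as described above.

    Z, N       : charge and neutron number of the fissioning nucleus
    A_fragment : mass number of the fission fragment
    e_incident : incident neutron energy (MeV)
    S_n        : neutron separation energy (MeV)
    """
    E = e_incident + S_n  # energy variable E_i = e_i + S_i
    return np.array([Z, N, A_fragment, E], dtype=float)

# Purely illustrative numbers, not values used in this study
x = build_input(Z=92, N=143, A_fragment=140, e_incident=0.5, S_n=6.5)
```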
The neural network function $f(x,\omega)$ adopted here has the following "sigmoid" form [60−62]:
$ \begin{equation} f(x,\omega)=a+\sum\limits_{j=1}^{H} b_{j} \tanh \left(c_{j}+\sum\limits_{i=1}^{I} d_{i j} x_{i}\right), \end{equation} $ (2)
where the model parameters are collectively given by $\omega=\{a,b_{j},c_{j},d_{ij}\}$. Here, H is the number of neurons in the hidden layer, I is the number of inputs, a is the bias of the output layer, $b_{j}$ are the weights of the output layer, $c_{j}$ are the biases of the hidden layer, and $d_{ij}$ are the weights of the hidden layer. The hyperbolic tangent is a common form of the sigmoid activation function that controls the firing of the artificial neurons. A schematic diagram of a neural network with a single hidden layer, three hidden neurons (H = 3), and two input variables (I = 2) is shown in Fig. 1.

Figure 1. (color online) Schematic diagram of a neural network with a single hidden layer, three hidden neurons (H = 3), and two input variables (I = 2).
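The following sketch shows how Eq. (2) can be evaluated numerically, assuming the parameters ω are stored as plain NumPy arrays; the random initialization only makes the snippet self-contained and does not represent trained values from this work.

```python
import numpy as np

def network_output(x, a, b, c, d):
    """Evaluate Eq. (2): f(x, w) = a + sum_j b_j * tanh(c_j + sum_i d_ij * x_i).

    x : input vector of length I
    a : scalar bias of the output layer
    b : output-layer weights, shape (H,)
    c : hidden-layer biases, shape (H,)
    d : hidden-layer weights, shape (H, I)
    """
    hidden = np.tanh(c + d @ x)   # H hidden neurons with tanh activation
    return a + b @ hidden         # scalar network output

# Self-contained example with H = 3 hidden neurons and I = 2 inputs (cf. Fig. 1)
rng = np.random.default_rng(0)
H, I = 3, 2
a, b, c, d = 0.0, rng.normal(size=H), rng.normal(size=H), rng.normal(size=(H, I))
print(network_output(np.array([1.0, 2.0]), a, b, c, d))
```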
To determine the optimal number of hidden layers and neurons in the neural network, we employ cross-validation. First, we estimate a rough range for the architecture based on previous studies. We then randomly partition the dataset into two parts, one for training and the other for testing, and repeat the procedure with the roles of the training and testing sets swapped. The average of the two prediction errors is used to estimate the performance of a given BNN configuration. Based on this analysis, we find that the optimal configuration has two hidden layers, each consisting of 16 neurons.
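A minimal sketch of this two-fold cross-validation procedure is shown below. The helper `train_and_test` is hypothetical: it stands for training a BNN of a given architecture on one half of the data and returning its prediction error (e.g., the rms deviation) on the other half.

```python
import numpy as np

def two_fold_cv_error(inputs, targets, train_and_test, seed=0):
    """Split the data randomly into two halves, train on one and test on the
    other, swap the roles, and return the average of the two prediction errors.

    `train_and_test(train_x, train_t, test_x, test_t)` is a hypothetical helper
    that trains a BNN with a fixed architecture and returns its test error.
    """
    n = len(inputs)
    idx = np.random.default_rng(seed).permutation(n)
    half_a, half_b = idx[: n // 2], idx[n // 2:]
    err_1 = train_and_test(inputs[half_a], targets[half_a],
                           inputs[half_b], targets[half_b])
    err_2 = train_and_test(inputs[half_b], targets[half_b],
                           inputs[half_a], targets[half_a])
    return 0.5 * (err_1 + err_2)
```

Candidate architectures would then be compared by their averaged cross-validation errors, keeping the configuration with the smallest error.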
Bayesian inference for neural networks computes the posterior distribution of the weights given the training data, $p(\omega\mid x,t)$. Predictive queries about unseen data are then answered by taking expectations under this distribution: each possible configuration of the weights, weighted according to the posterior, contributes a prediction for the unknown label of a test input x [63].
In this study, we adopt a variational approximation to the Bayesian posterior distribution of the weights. We use variational learning to find a probability distribution $q\left(\omega \mid \theta_{\omega}\right)$, parameterized by $\theta_{\omega}$, that approximates the posterior distribution of ω. To do so, we minimize the Kullback-Leibler (KL) divergence between $q\left(\omega \mid \theta_{\omega}\right)$ and $p(\omega \mid x,t)$ [64−67],
$ \begin{aligned} \theta^{*} &=\arg \min _{\theta_{\omega}} \mathrm{KL}\left[q\left(\omega \mid \theta_{\omega}\right) \,\|\, p(\omega \mid x,t)\right] \\ &=\arg \min _{\theta_{\omega}} \int q\left(\omega \mid \theta_{\omega}\right) \ln \frac{q\left(\omega \mid \theta_{\omega}\right)}{p(x,t \mid \omega)\, p(\omega)}\, \mathrm{d}\omega \\ &=\arg \min _{\theta_{\omega}} \mathrm{KL}\left[q\left(\omega \mid \theta_{\omega}\right) \,\|\, p(\omega)\right] - E_{q\left(\omega \mid \theta_{\omega}\right)}\left[\ln p(x,t \mid \omega)\right], \end{aligned} $
where the evidence term $\ln p(x,t)$ has been dropped in the second line because it does not depend on $\theta_{\omega}$ and therefore does not affect the minimization.
Then, the loss function of the BNN can be written in the following form:
$ F\left(x,t, \theta_{\omega}\right)=\mathrm{KL}\left[q\left(\omega \mid \theta_{\omega}\right) \,\|\, p(\omega)\right] - E_{q\left(\omega \mid \theta_{\omega}\right)}\left[\ln p(x,t \mid \omega)\right]. $
Finally, because the expectation in this loss function cannot be evaluated in closed form for a neural network, it is approximated using the Monte Carlo method, i.e., by sampling weights from $q\left(\omega \mid \theta_{\omega}\right)$.
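For concreteness, the sketch below evaluates such a Monte Carlo estimate of the loss for a generic model, assuming a fully factorized Gaussian variational posterior and a zero-mean Gaussian prior (modeling choices not specified above, chosen here so that the KL term has a closed form). The helper `log_likelihood` and all parameter names are hypothetical.

```python
import numpy as np

def variational_loss(mu, rho, x, t, log_likelihood,
                     prior_sigma=1.0, n_samples=10, seed=0):
    """Monte Carlo estimate of F(x,t,theta) = KL[q(w|theta)||p(w)] - E_q[ln p(x,t|w)].

    q(w|theta) is a factorized Gaussian with mean `mu` and std sigma = log(1 + exp(rho));
    p(w) is a zero-mean Gaussian with std `prior_sigma`.
    `log_likelihood(w, x, t)` is a hypothetical helper returning ln p(x,t|w)
    for one sampled weight vector w.
    """
    rng = np.random.default_rng(seed)
    sigma = np.log1p(np.exp(rho))            # softplus keeps the std positive

    # Closed-form KL divergence between two diagonal Gaussians
    kl = np.sum(np.log(prior_sigma / sigma)
                + (sigma**2 + mu**2) / (2.0 * prior_sigma**2) - 0.5)

    # Monte Carlo estimate of the expected log-likelihood under q(w|theta)
    expected_ll = 0.0
    for _ in range(n_samples):
        w = mu + sigma * rng.standard_normal(mu.shape)   # reparameterized sample w ~ q
        expected_ll += log_likelihood(w, x, t)
    expected_ll /= n_samples

    return kl - expected_ll
```

In a practical implementation, the variational parameters $\theta_{\omega}=(\mu,\rho)$ would then be optimized by gradient descent on such an estimate, with gradients propagated through the reparameterized samples $w=\mu+\sigma\,\epsilon$.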