DAML Week 9, CP17: Lepton energy reconstruction in water Čerenkov detectors: NN Regression and Gradient Boosted Regression Trees

Christos Leonidopoulos*, University of Edinburgh
* Christos.Leonidopoulos@ed.ac.uk

March 14, 2021

1 Introduction
In today’s lecture we are returning to regression problems. For the last CP of the course we will try to model the energy of charged leptons produced in neutrino interactions in water Čerenkov detectors. We have seen in the past how to use Neural Network (NN) regression to predict the behaviour of a continuous target variable. Today we will introduce a new regression method, Gradient Boosted Regression Trees (GBRT) [1]. As with most machine-learning applications, the choice of the optimal algorithm depends greatly on the problem at hand. The goal of today’s lecture is to familiarise ourselves with GBRT and to carry out a performance comparison between the two regression methods on a problem involving lepton energy reconstruction in neutrino experiments.
2 Neutrino interactions in water
Neutrino interactions in water can be described by the inverse beta-decay processes, leading to the creation of charged leptons:

n + ν_e → p + e⁻
p + ν̄_e → n + e⁺
The cross section of these processes is typically very small, e.g.

σ(p + ν̄_e → n + e⁺) ≃ 5×10⁻⁴⁴ cm² (E_ν̄ / 2 MeV)²
In experiments studying atmospheric neutrinos, the existence of neutrinos can be inferred indirectly via the charged-current interactions with nuclei, and the release of highly energetic charged leptons, e.g.

ν_e + N → e⁻ + X
ν̄_μ + N → μ⁺ + X
which are then detected with Čerenkov light detectors1. Examples of neutrino detection with Čerenkov detectors in the Super-Kamiokande experiment can be seen in Fig. 1. Light from Čerenkov radiation is collected by photomultipliers (PMTs). The geometric distribution of the photon energy deposits is used to reconstruct the energy, position and direction of the charged leptons. The goal of this checkpoint is to reconstruct the lepton energy in a neutrino experiment with machine-learning based techniques. The methodology is described in Ref. [2].
3 The TITUS detector
The “Tokai Intermediate Tank for the Unoscillated Spectrum” (TITUS) [3] is a proposed intermediate detector for the Hyper-Kamiokande experiment [4]. TITUS has a total mass of 2.1 kton, and is placed horizontally along the neutrino beam to act as an intermediate detector. It has the same off-axis angle and target material as the far detector at Hyper-Kamiokande, with the goal of reducing the systematic uncertainties. A schematic representation of the detector can be seen in Fig. 2.
In today’s checkpoint we will be using simulated interactions of muon neutrinos inside the TITUS detector. We will analyse the pattern of PMT hits, and the distance of the muon track from the walls of the detector in order to reconstruct the energy of the muon created in the neutrino-water interaction.
4 TITUS dataset
The information for about 160k simulated muon neutrino interactions inside the TITUS detector can be found at https://cernbox.cern.ch/index.php/s/vnhtLal8Z3MeVCN, tabulated in a 26 MB CSV file. Every entry in the spreadsheet corresponds to a separate neutrino interaction. The file contains a large number of variables per event. Here we will focus on a subset of them, summarised in Table 1.
A schematic diagram of the TITUS detector with the variables defined in Table 1 can be seen in Fig. 3. Notice that the input variables to be used in today’s analysis have already been normalised to give a range between 0 and 1. This means that we will not have to carry out the usual “input-feature standardisation” treatment.
1 Čerenkov detectors are designed to detect the photon emission from particles travelling faster than the speed of light in a medium. An animated demonstration of the geometric principle can be found at http://upload.wikimedia.org/wikipedia/commons/8/87/Cherenkov_radiation-animation.gif
Figure 1: Top: The Super-Kamiokande detector consisted of 50,000 tons of (purified) water and more than 11,000 50-cm diameter photomultiplier tubes (detecting Čerenkov light). Bottom: Čerenkov rings for electron (left) and muon (right) candidates from the Super-Kamiokande experiment [5].
Figure 2: Schematic representation of the TITUS detector. Credit: Ref. [2].
5 Data inspection and variable plotting
We will be using the scikit-learn [6] and keras [7] libraries for today’s regression problem.
We will start with the usual steps of opening the CSV file, inspecting the data in case it requires cleaning, and plotting a few distributions.
Exercise #1 (1 point):
Open the CSV file, and feed the data into a pandas.core.frame.DataFrame object.
Carry out the usual sanity checks (pandas.core.frame.DataFrame.head(5)), and determine if data-cleaning is necessary (pandas.core.frame.DataFrame.dropna(inplace=True)). Make sure all variables have the same number of entries before proceeding.

Plot 1D distributions for all 6 variables listed in Table 1, in log scale (y-axis). Do not normalise the distributions. You can use a loop over the (5 input plus 1 output) features to simplify the code, and you can ignore the units on the x-axis. A minimal sketch of these steps is given below.
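A possible implementation, as a sketch only: the local filename titus.csv is an assumption, and the exact column names should be checked against the CSV header (the spellings below are guessed from Table 1).

import pandas as pd
import matplotlib.pyplot as plt

# load the TITUS dataset (local filename is an assumption)
df = pd.read_csv('titus.csv')

# sanity checks and, if needed, data cleaning
print(df.head(5))
df.dropna(inplace=True)
print(df.count())  # all columns should now have the same number of entries

# the 5 input features plus the target variable (spellings assumed from Table 1)
features = ['total_hits2', 'total_ring_PEs2', 'recoDWallR2',
            'recoDWallZ2', 'lambda_max_2', 'trueKE']

# un-normalised 1D distributions, with the y-axis in log scale
for f in features:
    plt.figure()
    plt.hist(df[f], bins=100)
    plt.yscale('log')
    plt.xlabel(f)
    plt.ylabel('Entries')
    plt.show()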
Variable          Description
total hits2       Total number of PMT in-time-coincidence hits assigned to clusters, divided by 40,000
total ring PEs2   Number of PMT hits that have been assigned to Čerenkov rings, divided by 4,500
recoDWallR2       Minimum distance from the reconstructed vertex to the closest barrel wall (see Fig. 3), divided by 5.5 m
recoDWallZ2       Distance from the reconstructed vertex to the nearest detector endcap (see Fig. 3), divided by 12 m
lambda max 2      Muon track length estimated from the observed pattern of PMT hits (see Fig. 3), divided by 25 m
trueKE            True (MC-truth) muon energy (in MeV)

Table 1: Subset of the kinematic variables contained in the CSV file to be used in this analysis, with brief descriptions.
Figure 3: Overview of variables to be used in the reconstruction of the muon track energy. See Table 1 for details. Credit: Ref. [3].
6 NN regression
We will start by doing a simple NN regression, as we have done in the past. We will use as input features the first 5 variables that appear in Table 1. The sixth variable (trueKE) is the target (output) feature, i.e. the continuous variable whose behaviour we are trying to predict given the input features.
Exercise #2 (4 points):
Create a new dataframe containing only the 5 input and 1 output features of interest. Split the new dataset into training (70%) and test (30%) subsets:
from sklearn import model_selection

Answer_to_all_questions = 42

# train-test split of dataset
train_data, test_data, train_target, test_target = model_selection.train_test_split(
    input_data, target, test_size=0.3, random_state=Answer_to_all_questions)
print(train_data.shape, train_target.shape, test_data.shape, test_target.shape)
(a) (2 points) Write a function implementing a NN, as we have seen in previous checkpoints. Use a reasonable number of layers and nodes (a minimal sketch of such a function is given after the cross-validation snippet below).

Implement the usual callback hook that exits the NN optimisation when the training converges.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks_ = [
    # if the loss does not improve for 10 epochs, terminate training
    EarlyStopping(verbose=True, patience=10, monitor='loss'),
    # always keep the model weights corresponding to the lowest loss
    ModelCheckpoint('model.h5', monitor='loss', verbose=0, save_best_only=True, mode='min')]
Demonstrate the performance by running a cross-validation. Don’t spend too much time optimising the architecture; aim for an R2 score higher than 70-75%.
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
import numpy as np

batchSize = ????
N_epochs = ????

Answer_to_all_questions = 42
np.random.seed(Answer_to_all_questions)

estimators = []
estimators.append(('mlp', KerasRegressor(build_fn=my_model, epochs=N_epochs,
                                         batch_size=batchSize, verbose=1)))
pipeline = Pipeline(estimators)

kfold = KFold(n_splits=10, random_state=Answer_to_all_questions, shuffle=True)
results = cross_val_score(pipeline, data, target, cv=kfold,
                          fit_params={'mlp__callbacks': callbacks_}, scoring='r2')
print("Result: %.2f %s %.2f" % (results.mean(), u"\u00B1", results.std()))
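For reference, a minimal sketch of what the my_model builder function referenced above might look like; the number of layers, the number of nodes and the choice of optimiser are assumptions, not a prescription.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def my_model():
    # simple fully-connected NN for regression: 5 input features -> 1 continuous output
    model = Sequential()
    model.add(Dense(64, input_dim=5, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1, activation='linear'))
    # mean-squared-error loss is the standard choice for regression
    model.compile(loss='mse', optimizer='adam')
    return model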
(b) (2 points) Use method fit on the model you have implemented in the function, using the training dataset for the fitting, and the test dataset for validation: validation_data=(test_data, test_target).

Use array History.history['loss'] to plot the evolution of the loss function (y-axis in log scale) as a function of the epoch. You may use function nn_utils.lossplot or write your own.

Use method predict to get an array of predicted muon-energy values using test_data. Compare it with test_target by creating a “predicted” (y-axis) vs. “actual” (x-axis) scatter plot of muon energy values. A sketch of these steps is given after the hint below.
Hint: See https://cernbox.cern.ch/index.php/s/O98CVtA9BCc7B5B for some handy utilities that you can use in today’s checkpoint.
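A sketch of part (b), assuming my_model, N_epochs, batchSize, callbacks_ and the train/test split are defined as above; the plotting is hand-rolled here rather than taken from nn_utils.

import matplotlib.pyplot as plt

model = my_model()
history = model.fit(train_data, train_target, epochs=N_epochs, batch_size=batchSize,
                    validation_data=(test_data, test_target), callbacks=callbacks_)

# evolution of the loss function vs epoch, y-axis in log scale
plt.figure()
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.yscale('log')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()

# predicted vs actual muon energy on the test dataset
predicted = model.predict(test_data).ravel()
plt.figure()
plt.scatter(test_target, predicted, s=1)
plt.xlabel('actual muon energy')
plt.ylabel('predicted muon energy')
plt.show()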
7 Gradient Boosted Regression Trees
We will now switch from using a NN for regression to a new advanced method that uses decision trees, called GBRT. It may sound counter-intuitive that decision trees can be used for regression problems, i.e. for modelling the behaviour of a continuous variable.

GBRT works by splitting up the dataset into smaller samples of similar “patterns” (i.e. effectively looking for small subsets with approximately constant values for the target variable). We have seen that decision trees are created (by a regressor or classifier algorithm) by trying to guess the optimal number of leaves and tree depths. For every transient architecture, residuals (i.e. the differences between the prediction and the actual target values) are calculated. The gradient boosting algorithm then tries to adjust the patterns in order to reduce these residuals. This is repeated until the residuals no longer follow a pattern (i.e. they are randomly distributed around the predicted values, with no overall systematic shift). In terms of the implementation, this is achieved in the same way we have seen before, with the minimisation of a loss function until it reaches a minimum.
See Ref. [1] for a nice illustration of the iterative process that reduces the residuals during the optimisation.
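To make the iterative idea concrete, here is a small illustrative sketch (not part of the checkpoint) of boosting done by hand with shallow regression trees and a squared-error loss, where each new tree is fitted to the residuals of the current prediction:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boosting(X, y, n_trees=100, learning_rate=0.1):
    # start from a constant prediction (the mean of the target)
    prediction = np.full(len(y), y.mean())
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction          # negative gradient of the squared-error loss
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)              # fit a shallow tree to the residuals
        prediction += learning_rate * tree.predict(X)  # damped update of the prediction
        trees.append(tree)
    return trees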
Exercise #3 (5 points):
(a) (1 point) Try to get a quick result by using a GBRT with the default parameters.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
gbr0 = GradientBoostingRegressor(n_estimators=100)
gbr0.fit(train_data, train_target.ravel())
Use method GradientBoostingRegressor.score to get two R2 scores, one for the training and one for the test datasets. If these two numbers differ significantly, you are probably suffering from overfitting.
How do these scores compare with the performance you got for the NN regressor?
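For instance, a minimal sketch (using the train/test split defined in Exercise #2):

print("R2 (train): %.3f" % gbr0.score(train_data, train_target))
print("R2 (test):  %.3f" % gbr0.score(test_data, test_target))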
(b) (2 points) We will now do a proper optimisation by using a grid search of the regressor parameters and cross-validation in order to identify the optimal regressor. Use the syntax below to feed the regressor with sets of various parameter values.

NB: this can result in a slow optimisation (e.g. up to 1-2 hours, or more). Play with a small number of values for the parameter grid first, then slowly work your way towards an optimal point. Make sure the machine you are running on can handle n_jobs > 1 before starting the optimisation.
param_grid_ = {'n_estimators': [100],
               'learning_rate': [0.1, 0.05],
               'max_depth': [5, 10],
               'min_samples_leaf': [50, 100],
               }

njobs_ = 8  # jobs to run in parallel

np.random.seed(Answer_to_all_questions)

gbr = GradientBoostingRegressor()
classifier = GridSearchCV(estimator=gbr, cv=kfold, param_grid=param_grid_,
                          n_jobs=njobs_, verbose=1)
classifier.fit(train_data, train_target.ravel())
print("Best estimator:")
print(classifier.best_estimator_)
Use function nn_utils.plot_learning_curve [8] to plot the performance of the optimised regressor (classifier.best_estimator_) for the training and the test datasets.
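If nn_utils is not at hand, a similar plot can be produced directly with scikit-learn; a sketch, reusing kfold and njobs_ from above:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

train_sizes, train_scores, test_scores = learning_curve(
    classifier.best_estimator_, train_data, train_target.ravel(),
    cv=kfold, scoring='r2', train_sizes=np.linspace(0.1, 1.0, 5), n_jobs=njobs_)

plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', label='training score')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-', label='cross-validation score')
plt.xlabel('training examples')
plt.ylabel('R2 score')
plt.legend()
plt.show()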
(c) (2 points) Create a new GBRT regressor with the parameters as determined by classifier.best_estimator_. Run method cross_val_score on the test dataset with cross-validation, and determine the R2 score with its standard deviation.

Use method fit on the training dataset, and predict on the test dataset. Get the array of predicted muon-energy values using test_data. Compare it with test_target by creating a “predicted” (y-axis) vs. “actual” (x-axis) scatter plot of muon energy values.

Use attribute feature_importances_ of the optimised regressor to get an insight as to which input features have the greatest impact on the predictive power of the algorithm. Order the input features according to their weight. Create a bar chart (matplotlib.pyplot.bar) with the ranked weights of the 5 input features.
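A possible sketch of part (c), assuming train_data is a pandas DataFrame whose columns are the 5 input features of Table 1:

import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score

# new regressor with the optimal parameters found by the grid search
best_gbr = GradientBoostingRegressor(**classifier.best_estimator_.get_params())

# cross-validated R2 score on the test dataset
scores = cross_val_score(best_gbr, test_data, test_target.ravel(), cv=kfold, scoring='r2')
print("R2 = %.2f +/- %.2f" % (scores.mean(), scores.std()))

# fit on the training dataset, predict on the test dataset
best_gbr.fit(train_data, train_target.ravel())
predicted = best_gbr.predict(test_data)
plt.scatter(test_target, predicted, s=1)
plt.xlabel('actual muon energy')
plt.ylabel('predicted muon energy')
plt.show()

# input features ranked by their importance
importances = best_gbr.feature_importances_
order = importances.argsort()[::-1]
names = [train_data.columns[i] for i in order]
plt.bar(names, importances[order])
plt.ylabel('feature importance')
plt.show()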
8 Summary
In today’s lecture we have seen a new type of machine-learning algorithm, Gradient Boosted Regression Trees, which we used to estimate the muon energy in neutrino interactions in water Čerenkov detectors. In several cases GBRT can outperform NN regressors. As is often the case, one needs to carry out a comparison on the particular problem and dataset in order to identify the most performant algorithm.
References
[1] For a brief introduction to gradient boosting algorithms see e.g. the discussion at https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
[2] “Application of machine learning techniques to lepton energy reconstruction in water Čerenkov detectors”, E. Drakopoulou et al., 2018 JINST 13 P04009, DOI: 10.1088/1748-0221/13/04/P04009, arXiv:1710.05668
[3] “TITUS: the Tokai Intermediate Tank for the Unoscillated Spectrum”, C. Andreopoulos et al., arXiv:1606.08114
[4] http://www.hyperk.org/
[5] http://www-sk.icrr.u-tokyo.ac.jp/sk/index-e.html
[6] “Scikit-learn: Machine Learning in Python”, F. Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
[7] Keras, François Chollet et al., https://keras.io
[8] The plotting function was retrieved from the scikit-learn page (March 2019): https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html