Higgs boson analysis using Neural Designer

The data used for this example are simulated data provided by the ATLAS experiment at CERN. Physicists use them to optimize the analysis of the Higgs Boson.

In the early 1960s, particle physics needed a theory able to explain the origin of mass in the universe. In 1964, Peter Higgs theorized the Higgs boson as the elementary particle responsible for the mass of other elementary particles, i.e., a particle able to generate matter as we know it.

In 2012, it was discovered by the ATLAS and CMS experiments, located at the Large Hadron Collider (LHC) at CERN. In these experiments, bunches of protons are accelerated on a circular trajectory in both directions. When the bunches of protons cross in the ATLAS detector, some of the protons collide, producing a proton-proton collision called an event.

Signal process involving a new exotic Higgs boson.

Two colliding protons produce hundreds of new particles, which are detected by the ATLAS detector. From the information of this detector, we can obtain the type, energy, and 3D direction of every new particle.


Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

1. Application type

This is a classification project, since the variable to be predicted is binary: Higgs boson or not.

The LHC produces over ten million collisions per hour, of which approximately 300 result in a Higgs boson. These events are saved to disk, producing about one billion events and three petabytes of raw data per year, which are analyzed to obtain evidence of the Higgs boson.

Among the events, we distinguish between those that produce particles of interest (the Higgs boson), called signal, and the uninteresting ones, called background, which are exotic in everyday terms but have been discovered by previous generations of experiments. The problem that then arises is how to distinguish between these events in such large amounts of data.

Hence, scientists worked together to provide solutions to this problem. On the one hand, they use numerical methods capable of classifying these events, such as boosted decision trees. On the other hand, they perform numerical simulations capable of reproducing the signal events where the Higgs boson appears; these simulations provide numerical values of the variables that can be detected by the ATLAS detector.
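As a rough back-of-the-envelope check of these figures, the short sketch below computes the fraction of collisions that yield a Higgs boson and the average raw data volume per stored event. It assumes decimal units (1 petabyte = 10^15 bytes), and the variable names are only illustrative.

```python
# Figures quoted in the text above.
collisions_per_hour = 10_000_000
higgs_per_hour = 300
events_per_year = 1_000_000_000
raw_bytes_per_year = 3 * 10**15  # three petabytes, assuming 1 PB = 10**15 bytes

# Fraction of collisions that result in a Higgs boson.
signal_fraction = higgs_per_hour / collisions_per_hour

# Average raw data volume per stored event, in megabytes (10**6 bytes).
megabytes_per_event = raw_bytes_per_year / events_per_year / 10**6

print(f"signal fraction: {signal_fraction:.1e}")
print(f"raw data per event: {megabytes_per_event:.1f} MB")
```

With these numbers, only about three collisions in every hundred thousand are signal, which is why an automatic classifier is needed.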

Inverse problem

Now, the problem is that we have a lot of data from the possible favourable cases, but how can we use it to classify the raw data obtained in the detector? This is formulated as an inverse problem, where we know the solution and from it obtain the variables of the problem, as can be seen in the figure above.

With the simulation data, we can create a new model able to classify between both types of events. To create such a model, it is necessary to use machine learning techniques that achieve a good classification rate in order to detect the Higgs boson.

2. Data set

The first step is to prepare the data set, which is the source of information for the classification problem. For that, we need to configure the following concepts:

The data source is the file Higgs.csv. It contains the data for this example in comma-separated values (CSV) format. The number of columns is 28, and the number of rows is 10012.

The database consists of 11 million simulated events produced with the official ATLAS full detector simulator (we have reduced the data set to around 10000). Firstly, proton-proton collisions are simulated based on all the knowledge that has been accumulated on particle physics. Secondly, the resulting particles are tracked through a virtual model of the detector.

Each event is described by 27 different features such as the estimated mass of the Higgs boson candidate (125 GeV) or the missing transverse energy. They will not be used as an input for the analysis.

The next table shows the variables of the data set:

Variable                         Description
lepton pT (GeV)                  The transverse momentum of the lepton, which can be an electron or a tauon.
lepton eta                       The pseudorapidity eta of the lepton.
lepton phi                       The azimuth angle phi of the lepton.
Missing energy magnitude (GeV)   Magnitude of the energy that is not detected in the particle detector.
Missing energy phi               The azimuth angle phi of the energy that is not detected in the particle detector.
jet 1 pT (GeV)                   The transverse momentum of the first jet.
jet 1 eta                        The pseudorapidity eta of the first jet.
jet 1 phi                        The azimuth angle phi of the first jet.
jet 1 b-tag                      Whether the first jet is consistent with b-quarks.
jet 2 pT (GeV)                   The transverse momentum of the second jet.
jet 2 eta                        The pseudorapidity eta of the second jet.
jet 2 phi                        The azimuth angle phi of the second jet.
jet 2 b-tag                      Whether the second jet is consistent with b-quarks.
jet 3 pT (GeV)                   The transverse momentum of the third jet.
jet 3 eta                        The pseudorapidity eta of the third jet.
jet 3 phi                        The azimuth angle phi of the third jet.
jet 3 b-tag                      Whether the third jet is consistent with b-quarks.
jet 4 pT (GeV)                   The transverse momentum of the fourth jet.
jet 4 eta                        The pseudorapidity eta of the fourth jet.
jet 4 phi                        The azimuth angle phi of the fourth jet.
jet 4 b-tag                      Whether the fourth jet is consistent with b-quarks.
M_jj (GeV)                       The invariant mass of two jets.
M_jjj (GeV)                      The invariant mass of three jets.
M_lv (GeV)                       The invariant mass of the lepton and the neutrino.
M_jlv (GeV)                      The invariant mass of a jet, the lepton, and the neutrino.
M_bb (GeV)                       The invariant mass of two b-jets.
M_wbb (GeV)                      The invariant mass of a W boson and two b-jets.
M_wwbb (GeV)                     The invariant mass of two W bosons and two b-jets.
Event                            Signal or background event. Binary variable.

This table describes the variables of the problem for a better understanding of it.

The instances are divided into training, selection, and testing subsets. They represent 60% (6008), 20% (2002), and 20% (2002) of the original instances, respectively, and are split at random.
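The random 60/20/20 split can be sketched in a few lines of plain Python. This is only an illustration and not Neural Designer's own routine: the function name, the fixed seed, and the rounding convention (selection and testing sizes rounded down, with the remainder assigned to training) are assumptions chosen so that the subset sizes match the ones quoted above.

```python
import random

def split_indices(n_instances, selection_ratio=0.2, testing_ratio=0.2, seed=0):
    """Randomly split instance indices into training, selection, and testing subsets."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # seeded shuffle, so the split is reproducible

    n_selection = int(n_instances * selection_ratio)
    n_testing = int(n_instances * testing_ratio)
    n_training = n_instances - n_selection - n_testing  # remainder goes to training

    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]
    return training, selection, testing

training, selection, testing = split_indices(10012)
print(len(training), len(selection), len(testing))  # 6008 2002 2002
```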

We can calculate the distributions of all variables. The next figure is the pie chart for the Higgs boson or background cases.

As we can see, most of the samples are Higgs boson signal. In real data, the background represents the majority of the events.

3. Neural network

The second step is to choose a neural network. For classification problems, it is usually composed of:

The scaling layer contains the statistics on the inputs calculated from the data file and the method for scaling the input variables. Here the minimum and maximum method has been set. Nevertheless, the mean and standard deviation method would produce very similar results.

The number of perceptron layers is 1. This perceptron layer has 28 inputs and 4 neurons.

Finally, we will set the binary probabilistic method for the probabilistic layer as we want the predicted target variable to be binary.
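The two scaling methods mentioned above can be sketched as follows. The function names are illustrative, and the target range [-1, 1] for the minimum and maximum method is taken from the scaling expressions listed in the model deployment section; the lepton pT minimum and maximum are the values that appear there.

```python
def scale_minimum_maximum(value, minimum, maximum):
    """Map a value from [minimum, maximum] linearly onto [-1, 1]."""
    return 2.0 * (value - minimum) / (maximum - minimum) - 1.0

def scale_mean_standard_deviation(value, mean, standard_deviation):
    """Center a value on the mean and express it in standard deviations."""
    return (value - mean) / standard_deviation

# lepton pT statistics as they appear in the deployed expression of this example
lepton_pt_minimum = 0.275000006
lepton_pt_maximum = 6.699999809

print(scale_minimum_maximum(lepton_pt_minimum, lepton_pt_minimum, lepton_pt_maximum))  # -1.0
print(scale_minimum_maximum(lepton_pt_maximum, lepton_pt_minimum, lepton_pt_maximum))  # 1.0
```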

The next figure is a graphical representation of this classification neural network:

Here, the yellow circles represent scaling neurons, the blue circles perceptron neurons, and the red circles probabilistic neurons. The number of inputs is 28, and the number of outputs is 1.
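To make the structure concrete, the following is a minimal sketch of how such a network turns already-scaled inputs into a probability. The hyperbolic tangent and logistic activations follow the deployed expression at the end of this example; the function names and the tiny two-input, two-neuron network with made-up weights are hypothetical and serve only to keep the example short.

```python
import math

def perceptron_layer(inputs, biases, weights):
    """Hyperbolic tangent layer: one output per neuron."""
    return [
        math.tanh(bias + sum(w * x for w, x in zip(neuron_weights, inputs)))
        for bias, neuron_weights in zip(biases, weights)
    ]

def probabilistic_layer(inputs, bias, weights):
    """Logistic output: probability of the positive class (signal)."""
    combination = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-combination))

def predict_probability(scaled_inputs, hidden_biases, hidden_weights,
                        output_bias, output_weights):
    """Forward pass: perceptron layer followed by the probabilistic layer."""
    hidden_outputs = perceptron_layer(scaled_inputs, hidden_biases, hidden_weights)
    return probabilistic_layer(hidden_outputs, output_bias, output_weights)

# Toy network with hypothetical weights (not the trained model of this example).
scaled_inputs = [0.5, -0.5]
hidden_biases = [0.0, 0.0]
hidden_weights = [[1.0, 1.0], [1.0, -1.0]]
output_bias = 0.0
output_weights = [1.0, 1.0]

probability = predict_probability(scaled_inputs, hidden_biases, hidden_weights,
                                  output_bias, output_weights)
print(f"probability of signal: {probability:.3f}")
```

The network in this example works the same way, only with 28 scaled inputs and 4 hidden neurons.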

4. Training strategy

The third step is to set the training strategy, which is composed of:

The loss index chosen for this application is the weighted squared error with L2 regularization.

The error term fits the neural network to the training instances of the data set. The regularization term makes the model more stable and improves generalization.

The optimization algorithm searches for the neural network parameters which minimize the loss index. The quasi-Newton method is chosen here.
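The loss index described above can be sketched as an error term plus a regularization term. The exact class weights and normalization that Neural Designer applies are not given here, so the weights, the regularization weight, and the function names below are assumptions for illustration only.

```python
def weighted_squared_error(outputs, targets, positives_weight=1.0, negatives_weight=1.0):
    """Sum of squared errors, with a separate weight for each class."""
    total = 0.0
    for output, target in zip(outputs, targets):
        weight = positives_weight if target == 1 else negatives_weight
        total += weight * (output - target) ** 2
    return total

def l2_regularization(parameters, regularization_weight):
    """Penalty proportional to the squared norm of the parameters."""
    return regularization_weight * sum(p * p for p in parameters)

def loss_index(outputs, targets, parameters,
               positives_weight, negatives_weight, regularization_weight):
    """Error term plus regularization term."""
    error = weighted_squared_error(outputs, targets, positives_weight, negatives_weight)
    return error + l2_regularization(parameters, regularization_weight)

# Hypothetical outputs, targets, and parameters, for illustration only.
loss = loss_index(outputs=[0.9, 0.2, 0.6], targets=[1, 0, 0], parameters=[0.5, -0.5],
                  positives_weight=2.0, negatives_weight=1.0, regularization_weight=0.01)
print(f"loss index: {loss:.3f}")
```

The optimization algorithm then adjusts the parameters to make this quantity as small as possible.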

The following chart shows how the training and selection errors decrease with the epochs during the training process.

The final values are training error = 0.837 WSE (blue), and selection error = 0.880 WSE (orange).

5. Model selection

The objective of model selection is to find the network architecture with the best generalization properties, that is, the one that minimizes the error on the selection instances of the data set.

More specifically, we want to find a neural network with a selection error of less than 0.880 WSE, which is the value that we have achieved so far.

Order selection algorithms train several network architectures with different numbers of neurons and select the one with the smallest selection error.

The incremental order method starts with a small number of neurons and increases the complexity at each iteration.
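The idea behind the incremental order method can be sketched as a simple loop. This is not Neural Designer's implementation: `train_and_evaluate` is a hypothetical callback that trains a network with the given number of neurons and returns its selection error, the toy error curve is made up, and stopping criteria of the real tool are not modeled.

```python
def incremental_order(train_and_evaluate, minimum_neurons=1, maximum_neurons=10, step=1):
    """Try increasing numbers of neurons and keep the one with the smallest selection error."""
    best_neurons, best_error = None, float("inf")
    for neurons in range(minimum_neurons, maximum_neurons + 1, step):
        selection_error = train_and_evaluate(neurons)
        if selection_error < best_error:
            best_neurons, best_error = neurons, selection_error
    return best_neurons, best_error

def toy_selection_error(neurons):
    """Hypothetical stand-in for training: the error is smallest at 4 neurons."""
    return (neurons - 4) ** 2 / 100 + 0.8

best_neurons, best_error = incremental_order(toy_selection_error)
print(best_neurons, best_error)
```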

6. Testing analysis

The last step is to test the generalization performance of the trained neural network.

The objective of the testing analysis is to validate the generalization performance of the trained neural network. To validate a classification technique, we need to compare the values provided by this technique to the observed values. We can use the ROC curve as it is the standard testing method for binary classification projects.

In this case, the area under the ROC curve is 0.720.
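One way to compute the area under the ROC curve is to use its probabilistic interpretation: it equals the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event. The sketch below implements that definition directly; the function name and the small list of targets and scores are made up for illustration and are not the data of this example.

```python
def area_under_roc_curve(targets, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for t, s in zip(targets, scores) if t == 1]
    negatives = [s for t, s in zip(targets, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute the AUC.")

    correctly_ranked = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                correctly_ranked += 1.0
            elif positive_score == negative_score:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))

# Hypothetical targets (1 = signal, 0 = background) and model outputs.
targets = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
auc = area_under_roc_curve(targets, scores)
print(f"AUC = {auc:.3f}")
```

A value of 0.5 corresponds to random guessing and a value of 1 to a perfect classifier.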

In the confusion matrix, the rows represent the targets (or real values) and the columns the corresponding outputs (or predictive values). The diagonal cells show the cases that are correctly classified, and the off-diagonal cells show the misclassified cases.

                              Predicted positive (Higgs boson)   Predicted negative (background)
Real positive (Higgs boson)   753 (37.6%)                        307 (15.3%)
Real negative (background)    355 (17.7%)                        587 (29.3%)

The model correctly predicts 1340 instances (66.9%) and misclassifies 662 (33.1%).

The binary classification tests shown in the next table provide us with some useful information about the performance of the model:

Test                      Description                                              Value
Classification accuracy   Ratio of instances correctly classified.                 67%
Error rate                Ratio of instances misclassified.                        33%
Sensitivity               Portion of real positives that are predicted positive.   71%
Specificity               Portion of real negatives that are predicted negative.   62%


As we can see, the classification accuracy, which is the proportion of instances that the model can correctly classify, is 0.669 (66.9%). The error rate, which is the ratio of instances misclassified, is 0.331 (33.1%).
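These values follow directly from the confusion matrix. The short sketch below recomputes them from the four counts reported above; the variable names are only illustrative.

```python
# Counts taken from the confusion matrix above.
true_positives = 753    # signal predicted as signal
false_negatives = 307   # signal predicted as background
false_positives = 355   # background predicted as signal
true_negatives = 587    # background predicted as background

total = true_positives + false_negatives + false_positives + true_negatives

accuracy = (true_positives + true_negatives) / total
error_rate = (false_positives + false_negatives) / total
sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"accuracy:    {accuracy:.1%}")     # 66.9%
print(f"error rate:  {error_rate:.1%}")   # 33.1%
print(f"sensitivity: {sensitivity:.1%}")  # 71.0%
print(f"specificity: {specificity:.1%}")  # 62.3%
```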

7. Model deployment

Once the model has been tested, Neural Designer allows us to obtain the mathematical expression of the trained deep architecture with which more than four million events per second can be analyzed.

The mathematical expression of the trained neural network is listed below.

scaled_lepton_pT = lepton_pT*(1+1)/(6.699999809-(0.275000006))-0.275000006*(1+1)/(6.699999809-0.275000006)-1;
scaled_lepton_eta = lepton_eta*(1+1)/(2.430000067-(-2.430000067))+2.430000067*(1+1)/(2.430000067+2.430000067)-1;
scaled_lepton_phi = lepton_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_missing_energy_magnitude = missing_energy_magnitude*(1+1)/(5.820000172-(0.01240000036))-0.01240000036*(1+1)/(5.820000172-0.01240000036)-1;
scaled_missing_energy_phi = missing_energy_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_jet_1_pT = jet_1_pT*(1+1)/(7.059999943-(0.1589999944))-0.1589999944*(1+1)/(7.059999943-0.1589999944)-1;
scaled_jet_1_eta = jet_1_eta*(1+1)/(2.970000029-(-2.940000057))+2.940000057*(1+1)/(2.970000029+2.940000057)-1;
scaled_jet_1_phi = jet_1_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_jet_1_b_tag = jet_1_b_tag*(1+1)/(2.170000076-(0))-0*(1+1)/(2.170000076-0)-1;
scaled_jet_2_pT_1 = jet_2_pT_1*(1+1)/(5.190000057-(0.1899999976))-0.1899999976*(1+1)/(5.190000057-0.1899999976)-1;
scaled_jet_2_eta = jet_2_eta*(1+1)/(2.910000086-(-2.910000086))+2.910000086*(1+1)/(2.910000086+2.910000086)-1;
scaled_jet_2_phi = jet_2_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_jet_2_b_tag = jet_2_b_tag*(1+1)/(2.210000038-(0))-0*(1+1)/(2.210000038-0)-1;
scaled_jet_2_pT_2 = jet_2_pT_2*(1+1)/(6.519999981-(0.2639999986))-0.2639999986*(1+1)/(6.519999981-0.2639999986)-1;
scaled_jet_3_eta = jet_3_eta*(1+1)/(2.730000019-(-2.730000019))+2.730000019*(1+1)/(2.730000019+2.730000019)-1;
scaled_jet_3_phi = jet_3_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_jet_3_b_tag = jet_3_b_tag*(1+1)/(2.549999952-(0))-0*(1+1)/(2.549999952-0)-1;
scaled_jet_4_pT = jet_4_pT*(1+1)/(6.070000172-(0.3650000095))-0.3650000095*(1+1)/(6.070000172-0.3650000095)-1;
scaled_jet_4_eta = jet_4_eta*(1+1)/(2.5-(-2.5))+2.5*(1+1)/(2.5+2.5)-1;
scaled_jet_phi = jet_phi*(1+1)/(1.74000001-(-1.74000001))+1.74000001*(1+1)/(1.74000001+1.74000001)-1;
scaled_jet_4_b_tag = jet_4_b_tag*(1+1)/(3.099999905-(0))-0*(1+1)/(3.099999905-0)-1;
scaled_M_jj = M_jj*(1+1)/(13.10000038-(0.1720000058))-0.1720000058*(1+1)/(13.10000038-0.1720000058)-1;
scaled_M_jjj = M_jjj*(1+1)/(7.389999866-(0.3420000076))-0.3420000076*(1+1)/(7.389999866-0.3420000076)-1;
scaled_M_lv = M_lv*(1+1)/(3.680000067-(0.4609999955))-0.4609999955*(1+1)/(3.680000067-0.4609999955)-1;
scaled_M_jlv = M_jlv*(1+1)/(6.579999924-(0.3840000033))-0.3840000033*(1+1)/(6.579999924-0.3840000033)-1;
scaled_M_bb = M_bb*(1+1)/(8.260000229-(0.08100000024))-0.08100000024*(1+1)/(8.260000229-0.08100000024)-1;
scaled_M_wbb = M_wbb*(1+1)/(4.75-(0.3889999986))-0.3889999986*(1+1)/(4.75-0.3889999986)-1;
scaled_M_wwbb = M_wwbb*(1+1)/(4.320000172-(0.4449999928))-0.4449999928*(1+1)/(4.320000172-0.4449999928)-1;

perceptron_layer_0_output_0 = tanh[ -1.37262 + (scaled_lepton_pT*-0.258746)+ (scaled_lepton_eta*0.0362805)+ (scaled_lepton_phi*-0.0832154)+ (scaled_missing_energy_magnitude*0.0609962)+ (scaled_missing_energy_phi*0.0118553)+ (scaled_jet_1_pT*-0.273796)+ (scaled_jet_1_eta*0.0535145)+ (scaled_jet_1_phi*0.0226441)+ (scaled_jet_1_b_tag*-0.0632788)+ (scaled_jet_2_pT_1*-0.112196)+ (scaled_jet_2_eta*0.0415722)+ (scaled_jet_2_phi*0.112501)+ (scaled_jet_2_b_tag*-0.0129353)+ (scaled_jet_2_pT_2*-0.358131)+ (scaled_jet_3_eta*0.0191431)+ (scaled_jet_3_phi*-0.064177)+ (scaled_jet_3_b_tag*0.0189284)+ (scaled_jet_4_pT*0.157812)+ (scaled_jet_4_eta*-0.113475)+ (scaled_jet_phi*0.113226)+ (scaled_jet_4_b_tag*-0.0288421)+ (scaled_M_jj*0.928491)+ (scaled_M_jjj*1.56142)+ (scaled_M_lv*0.168347)+ (scaled_M_jlv*0.280415)+ (scaled_M_bb*-2.39201)+ (scaled_M_wbb*-1.74089)+ (scaled_M_wwbb*-1.1966) ];
perceptron_layer_0_output_1 = tanh[ 0.0302904 + (scaled_lepton_pT*0.931957)+ (scaled_lepton_eta*-0.107117)+ (scaled_lepton_phi*0.123916)+ (scaled_missing_energy_magnitude*0.829294)+ (scaled_missing_energy_phi*0.16022)+ (scaled_jet_1_pT*0.136368)+ (scaled_jet_1_eta*0.660784)+ (scaled_jet_1_phi*0.185285)+ (scaled_jet_1_b_tag*0.397167)+ (scaled_jet_2_pT_1*-0.412377)+ (scaled_jet_2_eta*-0.263695)+ (scaled_jet_2_phi*-0.0766151)+ (scaled_jet_2_b_tag*-0.0693654)+ (scaled_jet_2_pT_2*-0.603738)+ (scaled_jet_3_eta*-0.363455)+ (scaled_jet_3_phi*-0.223806)+ (scaled_jet_3_b_tag*0.0319366)+ (scaled_jet_4_pT*-0.696683)+ (scaled_jet_4_eta*-0.35523)+ (scaled_jet_phi*-0.0530527)+ (scaled_jet_4_b_tag*-0.0545712)+ (scaled_M_jj*0.0526469)+ (scaled_M_jjj*-0.267477)+ (scaled_M_lv*-0.355338)+ (scaled_M_jlv*-0.269982)+ (scaled_M_bb*0.0216565)+ (scaled_M_wbb*-0.837428)+ (scaled_M_wwbb*1.08876) ];
perceptron_layer_0_output_2 = tanh[ -0.63821 + (scaled_lepton_pT*0.812243)+ (scaled_lepton_eta*0.0253617)+ (scaled_lepton_phi*-0.104747)+ (scaled_missing_energy_magnitude*0.182205)+ (scaled_missing_energy_phi*-0.325033)+ (scaled_jet_1_pT*-0.82733)+ (scaled_jet_1_eta*-0.802168)+ (scaled_jet_1_phi*-0.314119)+ (scaled_jet_1_b_tag*-0.107867)+ (scaled_jet_2_pT_1*-0.982512)+ (scaled_jet_2_eta*0.217818)+ (scaled_jet_2_phi*0.161883)+ (scaled_jet_2_b_tag*0.0279769)+ (scaled_jet_2_pT_2*-0.283825)+ (scaled_jet_3_eta*0.186862)+ (scaled_jet_3_phi*0.0681052)+ (scaled_jet_3_b_tag*-0.090589)+ (scaled_jet_4_pT*0.167979)+ (scaled_jet_4_eta*0.00966222)+ (scaled_jet_phi*-7.93182e-05)+ (scaled_jet_4_b_tag*-0.0425335)+ (scaled_M_jj*0.232803)+ (scaled_M_jjj*0.123681)+ (scaled_M_lv*-0.129535)+ (scaled_M_jlv*0.4562)+ (scaled_M_bb*-0.144573)+ (scaled_M_wbb*-1.02725)+ (scaled_M_wwbb*0.318511) ];
perceptron_layer_0_output_3 = tanh[ 0.448874 + (scaled_lepton_pT*1.25716)+ (scaled_lepton_eta*-0.112071)+ (scaled_lepton_phi*0.0693822)+ (scaled_missing_energy_magnitude*0.200465)+ (scaled_missing_energy_phi*-0.179459)+ (scaled_jet_1_pT*0.983861)+ (scaled_jet_1_eta*-0.259419)+ (scaled_jet_1_phi*-0.139325)+ (scaled_jet_1_b_tag*0.224549)+ (scaled_jet_2_pT_1*-0.820891)+ (scaled_jet_2_eta*0.0750885)+ (scaled_jet_2_phi*0.0114138)+ (scaled_jet_2_b_tag*-0.0692787)+ (scaled_jet_2_pT_2*-0.105241)+ (scaled_jet_3_eta*-0.12762)+ (scaled_jet_3_phi*-0.0696983)+ (scaled_jet_3_b_tag*-0.139342)+ (scaled_jet_4_pT*-0.3681)+ (scaled_jet_4_eta*-0.140406)+ (scaled_jet_phi*-0.135149)+ (scaled_jet_4_b_tag*-0.109267)+ (scaled_M_jj*-0.867477)+ (scaled_M_jjj*-0.309968)+ (scaled_M_lv*-0.0646885)+ (scaled_M_jlv*0.667318)+ (scaled_M_bb*-0.464598)+ (scaled_M_wbb*1.94239)+ (scaled_M_wwbb*-0.978041) ];

probabilistic_layer_combinations_0 = -0.685385 +2.9533*perceptron_layer_0_output_0 -2.104*perceptron_layer_0_output_1 -2.1637*perceptron_layer_0_output_2 +2.63699*perceptron_layer_0_output_3;

Event = 1.0/(1.0 + exp(-probabilistic_layer_combinations_0));

As we already mentioned, this application has been solved with the professional predictive analytics solution Neural Designer.
