
HSE, Yandex and CERN Researchers Work Together on Machine Learning in High Energy Physics

On August 30, 2015, the Summer School on Machine Learning in High Energy Physics wrapped up this year’s session. The school, which was held at the St. Petersburg Academic University, was organized by HSE in cooperation with the Yandex School of Data Analysis (SDA) and the Yandex Data Factory (YDF). The school continues the cooperation between Yandex and CERN, in which YDF and SDA researchers work together with experimental physicists on current problems in physics. Many of these tasks call for machine learning approaches, which allow for greater accuracy and efficiency in these studies.


Andrey Ustyuzhanin,
Head of the Laboratory of Methods for Big Data Analysis,
Director of the Summer School on Machine Learning in High Energy Physics

All of the school’s participants (about 50 people) were divided into two tracks, introductory and advanced. The introductory track focused on the principles of machine learning algorithms (decision trees, linear models, and neural networks), model evaluation, and the use of classification for testing physical hypotheses; participants in this track also discussed the comparison of multidimensional distributions by means of machine learning, as well as overfitting. The advanced track focused on more advanced algorithms (feature selection methods, ensemble methods, learning sample manipulations, genetic algorithms, hill climbing, rotation forest, dimensionality reduction, PCA, SVD, nonlinear methods, and deep learning approaches) and on the application of these algorithms to specific physical problems.

In addition to machine learning classes, the school included several overview lectures on practical aspects of applying machine learning in CERN experiments. Staff from the LHCb and CMS experiments spoke about optimizing the online filtering of events through the use of machine learning, predicting the properties of new particles, the discovery of the Higgs boson, and the search for nonstandard physical processes in experimental data. The online filtering of events at the early stages of event processing and the reconstruction of event structure using deep learning approaches in the LHCb experiment are the result of joint work carried out by CERN and HSE researchers.

Particular attention was paid to practical tasks. Seminars included a practical introduction to algorithms and tools that participants can use in further research. In addition to the seminars, a Kaggle competition was organized based on data from the simulator of the COMET experiment, which is being built in Japan. The aim of this experiment is to discover a brand new physical process that manifests itself in the neutrinoless conversion of a muon to an electron. Discovery of such a conversion would change our knowledge of particle physics, since it contradicts the current Standard Model. A postdoctoral fellow at Imperial College London and COMET participant spent two months this spring on an internship at SDA, applying machine learning to the search for particle tracks of a certain type (the shape of the tracks makes it possible to judge which process has taken place). As a result of this joint research, the efficiency of the algorithms was increased from 83% to 99.9%. The competition was a perfect way to stimulate practical work: on the last day, the participants were contending for first place until the final seconds.

Some statistics

Participants

 

Participant profile: physicists 65%, computer scientists 30%, other 5%


Participants' academic degree: undergraduate student 8 (30.8%), PhD student 12 (46.2%), PhD 4 (15.4%), other 2 (7.7%)

The school materials are available in a public repository.

This is the first time the school has been held, but feedback from the participants has been encouraging:

Would you recommend this school to your friends and colleagues? Answer on a scale from 1 (never) to 5 (yes, of course!):



An interesting result of the school is the solution of the competition problem. The problem was taken from a real track recognition task in the COMET experiment, which Ewen Gillies, a postdoctoral fellow at Imperial College London, worked on during his internship at the HSE Laboratory of Methods for Big Data Analysis under the supervision of Alex Rogozhnikov. We simplified the real problem for the school and provided several useful hints, but the school participants’ results are comparable in quality with the practical results. Congratulations to Sergey Korolev and Dmitry Petrov, who took first place in our competition!