Two papers were accepted to the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2021):
“On the Embeddings of Variables in Recurrent Neural Networks for Source Code” by Nadezhda Chirkova;
“A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code” by Nadezhda Chirkova and Sergey Troshin.
The final versions of the papers and the source code will be released soon. The research was carried out using the computational resources of the HSE Supercomputer Modeling Unit.
Both papers aim to improve the quality of deep learning models for source code by exploiting the specific properties of variables and identifiers. The first paper proposes a recurrent architecture that explicitly models the semantic meaning of each variable in a program. The second paper proposes a simple method for preprocessing rarely used identifiers so that a neural network (in particular, the Transformer architecture) can better recognize patterns in the program. The proposed methods were shown to significantly improve the quality of code completion and variable misuse detection.
A paper by the laboratory's research assistants Dmitry Molchanov and Arsenii Ashukha and the laboratory head Dmitry Vetrov has been accepted to the International Conference on Machine Learning (ICML 2017). In this research, a state-of-the-art result in the sparsification of deep neural networks was achieved by applying a Bayesian framework to deep learning.