First semester 2019/2020

19/12/2019 Artificial intelligence in biomedicine: how to earn a decent salary, publish in Nature journals and stay ahead of the West without leaving Russia

Alex Zhavoronkov, CEO of Insilico Medicine, bioinformatician and machine learning specialist

Alex Zhavoronkov headed the East European office of ATI Technologies from 2002 to 2006, then became interested in biomedicine, received a master's degree in bioinformatics from Johns Hopkins University (2008) and defended his Ph.D. thesis at the physics department of Moscow State University (2011). Since 2012, he has led several projects in the field of bioinformatics and digital medicine.

The best-known project is Insilico, founded in 2014. The company applies artificial intelligence to biomedical problems related to the search for new biomarkers and disease targets. Insilico is a pioneer in generative chemistry (the use of generative neural network architectures to create molecular structures with specified properties). Over the five years of the company's existence, the team has published more than 70 papers in highly rated peer-reviewed journals, including:

In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development (Nature Communications, 2016)
Bifunctional immune checkpoint-targeted antibody-ligand traps that simultaneously disable TGFβ enhance the efficacy of cancer immunotherapy (Nature Communications, 2018)
Deep learning enables rapid identification of potent DDR1 kinase inhibitors (Nature Biotechnology, 2019)

Pokrovsky Boulevard, 11, M202

18:10-19:30

11/12/2019 Machine vision in the analysis of histological images

Ilya Galkin, bioinformatics analyst at BostonGene

Technologies for detecting and classifying objects in images and videos are developing very rapidly. From smartphone apps to self-driving cars, they are increasingly used in many areas of life. BostonGene employees will talk about how to use machine learning algorithms for cell recognition and what prospects this opens up for the diagnosis and treatment of cancer.

Pokrovsky Boulevard, 11, G110

18:10-19:30

05/12/2019 Analysis of the relationship between the secondary organization of DNA, epigenetic markers and structural mutations in oncology

Anna Kurilovich, Research Lab Assistant

Changes in the expression of oncogenes and tumor suppressor genes, which lead to the development and progression of cancer, occur as a result of mutational events. The talk is devoted to the analysis of the spatial correlation between structural somatic mutations in cancer patients across nine tissues and the spatial organization of the genome. We tried two approaches to this problem. The first approach searches for regions where DNA strand breaks occur in a number of patients with cancers of various tissues (clusters of mutations) and characterizes the spatial environment of such regions. The second approach uses machine learning methods to search for spatial correlations between structural mutations in all patients with a common diagnosis and spatial factors: secondary DNA structures and epigenetic markers.

Pokrovsky Boulevard, 11, M202

18:10-19:30

03/12/2019 Geometric deep learning for functional protein design

Michael Bronstein, Professor, Chair of Machine Learning and Pattern Recognition, Imperial College London / Head of Graph Learning Research, Twitter

Protein-based drugs are becoming some of the most important drugs of the 21st century. The typical mechanism of action of these drugs is a strong protein-protein interaction (PPI) between surfaces with complementary geometry and chemistry. Over the past three decades, large amounts of structural data on PPIs have been collected, creating opportunities for differentiable learning on the surface geometry and chemical properties of natural PPIs. Since the surface of these proteins has a non-Euclidean structure, it is a natural fit for geometric deep learning, a novel class of machine learning techniques generalizing successful neural architectures to manifolds and graphs. In the talk, I will show how geometric deep learning methods can be used to address various problems in functional protein design such as interface site prediction, pocket classification, and search for surface motifs. I will present results of our ongoing work with Bruno Correia, Pablo Gainza-Cirauqui, and others from the EPFL Lab of Protein Design and Immunoengineering.

Location: room 319, Bolshoy Tryokhsvyatitelsky Pereulok, 3 (Kitai-Gorod Station)

Seminar working language – English

Video recording of the seminar

21/11/2019 Autoregressive generative models for the generation problem

Seminar following the internship at Harvard University, USA

Irina Ponamareva, 4th-year student, Faculty of Computer Science (FCS)

Existing approaches to generating protein sequences include models based on sequence alignments and alignment-free models. Alignment-based models are fed sets of aligned sequences and trained to predict the symbol at each position. Such models (for example, HMMs) have a significant drawback: the set of aligned sequences required for training cannot always be obtained, since the sequences under study can be highly variable both in amino acid composition and in length. Alignment-based models do not work for such sequences, but alignment-free models can be useful. Alignment-free models include models that use a latent sequence representation (autoencoders) and autoregressive models. Irina will talk about how, during her internship, she tried to adapt XLNet, a popular autoregressive architecture, to this task, and will also cover other similar approaches.
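To illustrate what "autoregressive" and "alignment-free" mean here, the following is a minimal sketch, not the model from the talk: a first-order (bigram) model stands in for a large architecture such as XLNet. It factorizes the probability of a sequence into per-position conditionals and accepts sequences of any length without an alignment. The training sequences, function names and the add-one smoothing are assumptions made for illustration only.

```python
import math
from collections import defaultdict

START, END = "^", "$"
# The 20 standard amino acids plus an end-of-sequence symbol.
ALPHABET = list("ACDEFGHIKLMNPQRSTVWY") + [END]

def train_bigram(sequences):
    """Count how often each symbol follows the previous one (no alignment needed)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        prev = START
        for symbol in list(seq) + [END]:
            counts[prev][symbol] += 1
            prev = symbol
    return counts

def log_likelihood(counts, seq):
    """log p(seq) = sum over t of log p(x_t | x_{t-1}), with add-one smoothing."""
    total = 0.0
    prev = START
    for symbol in list(seq) + [END]:
        row = counts.get(prev, {})
        numerator = row.get(symbol, 0) + 1
        denominator = sum(row.values()) + len(ALPHABET)
        total += math.log(numerator / denominator)
        prev = symbol
    return total

# Hypothetical toy sequences of different lengths (not real proteins).
training = ["MKV", "MKVL", "MA"]
model = train_bigram(training)
seen_like = log_likelihood(model, "MKV")
unseen_like = log_likelihood(model, "WWW")
```

In this sketch a sequence resembling the training data (`seen_like`) scores higher than an unrelated one (`unseen_like`). Real autoregressive architectures condition on far more context than the previous residue.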

Pokrovsky Boulevard, 11, G120

18:10-19:30

12/11/2019 Modeling the pharmacogenetics of rivaroxaban using machine learning methods

Alex Shein, Lab Research Assistant

Classification models for patient outcomes based on genetic and clinical diagnostic parameters will be considered: logistic regression, support vector machines and random forests. Feature importance analysis will be discussed. We will also consider machine learning models for predicting rivaroxaban concentration: a linear model with L1 regularization, support vector machines and a random forest.

Pokrovsky Boulevard, 11, S332

18:30-19:50

08/11/2019 Studying the impact of genetic variability on chromatin architecture in humans

Olga Pushkareva, 2nd year student of the master's program "Data Analysis in Biology and Medicine"

One of the open problems in computational biology is assessing the genetic contribution to the development of complex phenotypic traits. Genome-wide association studies have shown that the majority of disease variants fall into gene regulatory sequences. However, this is not always the case: for example, some noncoding variants can result in regulatory variation. Moreover, small population studies have shown that only a tiny part of this variation is related to genetics.

The talk will be based on two studies that aim to characterize chromatin variability in human lymphoblastoid cell lines (Waszak SM, et al. Cell, 2015 and Kumasaka N, et al. Nature Genetics, 2019) and on my current work applying these two models to ATAC-seq data from human adipose stromal cells.

Seminar working language – English

Pokrovsky Boulevard, 11, R307

18:10-19:30

01/11/2019 The Attention Mechanism (Transformer): A Potential for Bioinformatic Problems

Jin Seungmin, PhD student (HSE)

The transformer, a popular state-of-the-art deep neural network, can outperform well-known RNN models on sequence data. The model was introduced because RNN-based architectures are hard to parallelize and have difficulty learning long-range dependencies within input and output sequences. The transformer takes all these dependencies into account using special networks that can directly access the input space. The core idea behind the transformer model is self-attention: the ability to attend to different positions of the input sequence to compute a representation of that sequence. The transformer stacks self-attention layers built from scaled dot-product attention and multi-head attention. In the presentation, I will introduce the core idea of the model and present its pros and cons using the example of LA traffic jam analysis. Potential bioinformatics applications will also be discussed.
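As a minimal sketch of the scaled dot-product attention mentioned above, the example below computes softmax(QK^T / sqrt(d_k))V in plain Python for a toy sequence. It is an illustration under simplifying assumptions: the learned query, key and value projections and the multi-head split of a real transformer are omitted, and the token vectors are invented.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of vectors; one output per query."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Self-attention: queries, keys and values all come from the same toy sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `weights` sums to 1 and says how strongly one position attends to every other position. Because every position is compared with every other directly, long-range dependencies do not have to pass through a recurrent state.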

Seminar working language – English

Pokrovsky Boulevard, 11, R506

18:10-19:30

25/10/2019 Z-DNA recognition and mistakes generated by hybrid deep learning models

Nazar Baknazarov, Lab Research Assistant

Regions of Z-DNA, the left-handed form of DNA, have been found in the genomes of different species. There is experimental evidence that Z-DNA plays a role in transcription, chromatin remodeling, and recombination. The association of epigenetic factors with Z-DNA sites remains poorly understood. The aim of this work is to identify Z-DNA sites in the human genome associated with epigenetic markers using machine learning (ML) models. The effectiveness of convolutional, fully connected and recurrent neural networks (CNN, FC, RNN) is compared with that of baseline machine learning models. Convolutional networks were shown to improve prediction quality, and adding recurrent layers to the convolutional ones improves model performance even further. The results demonstrate the practical relevance of deep learning methods for bioinformatics tasks.

Pokrovsky Boulevard, 11, R506

18:10-19:30

22/10/2019 Search for promoter enrichment with quadruplexes, associated with histone marks

Arina Nostaeva, Lab Research Assistant

We analyzed a G4-ChIP dataset for the human genome and epigenetic landscapes in two types of tissue: human stem cells and brain tissue. We found that around 80% of quadruplexes linked with histone marks are shared between the two tissues. We performed enrichment analysis and found that promoters with the histone marks H3K4Me1, H3K4Me3, H3K9Ac and H3K27Ac are enriched in quadruplexes, while promoters with H3K27Me3 are depleted of them. When comparing tissues, we observe that for H3K9Ac (active promoter) and H3K4Me1 (active enhancer) the odds ratio doubles when moving from stem cells to brain tissue. For H3K27Me3 we observed the opposite transition: 0.15 in brain versus 0.47 in stem cells (suppression is 3 times more active in stem cells). We discuss possible mechanisms that underlie the observed phenomena.
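For readers unfamiliar with the odds ratio used in this kind of enrichment analysis, here is a minimal sketch of how it could be computed from a 2x2 table of promoters (histone mark present or absent, quadruplex present or absent). The function name and the counts are hypothetical and are not taken from the study. Only the two H3K27Me3 odds ratios at the end come from the abstract.

```python
def odds_ratio(marked_with_g4, marked_without_g4,
               unmarked_with_g4, unmarked_without_g4):
    """Odds ratio for a 2x2 table: histone mark (rows) x quadruplex (columns).

    Values above 1 indicate enrichment of quadruplexes in marked promoters,
    values below 1 indicate depletion.
    """
    if marked_without_g4 == 0 or unmarked_with_g4 == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (marked_with_g4 * unmarked_without_g4) / (
        marked_without_g4 * unmarked_with_g4
    )

# Hypothetical promoter counts, for illustration only.
example = odds_ratio(30, 70, 10, 90)

# Ratio of the two H3K27Me3 odds ratios quoted in the abstract
# (stem cells 0.47, brain 0.15): roughly a threefold difference.
fold_difference = 0.47 / 0.15
```

A real analysis would usually also report a confidence interval or a significance test (for example Fisher's exact test) alongside the odds ratio.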

Pokrovsky Boulevard, 11, T908

13:00-14:00

04/10/2019 Pharmacogenetic predictors of drug safety: working with real data

The first laboratory workshop this academic year is dedicated to pharmacogenetic predictors of drug safety.

Speaker - Dmitry Ivashchenko, psychiatrist, candidate of medical sciences, FSBEI DPO RMANPO of the Ministry of Health of Russia, researcher at the department of personalized medicine at the Research Institute of MPM, associate professor at the department of child psychiatry and psychotherapy.

Dmitry briefly spoke about ongoing pharmacogenetic studies of drug safety. For many drugs, the patient's genotype is a risk factor for complications even at a standard dose. Identifying risk groups makes it possible to calculate the most effective and safe dose individually. Work on the pharmacogenetics of anticoagulants and antiplatelet agents was reviewed.

Pokrovsky Boulevard, 11, D108

19:00-20:00


International Laboratory of Bioinformatics
