First semester 2018/2019

20/12/2018 "Using CNN and RNN to predict the functions of DNA sequences"

Beknazarov Nazar, 4th year student of the Educational program "Applied Mathematics and Computer Science"

Topic - "Using CNN and RNN to predict the functions of DNA sequences"

December 20 (Thu), 16:00-18:00
3 Kochnovsky Proezd, Aud. 505

11/12/2018 Workshop "Analysis of Genomic Data" Part 2

Anton Zaikin, PhD student, Bioinformatics lab
Michal Rosenwald, 1st year undergraduate student, Bioinformatics lab

The following topics will be covered:

Application of machine learning methods for specific tasks;
Python Notebook;
Random Forest, SVM, XGBoost, RNN, CNN.

December 11 (Tue), 16:00-18:00
3 Kochnovsky Proezd, Aud. 511

4/12/18 Workshop "Analysis of Genomic Data" Part 1

Maria Poptsova, Head of Bioinformatics lab, Associate Professor BDIRS
Anton Zaikin, PhD student, Bioinformatics lab

The following topics will be covered:

basics of working in a Linux environment;
where to download the genome;
where to download genome annotation by genes, transposons and other functional elements;
how to find annotation intersections;
how to download NGS data from ENCODE and the Roadmap Epigenomics project;
how to assemble a data set for machine learning.
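
As an illustration of the "annotation intersections" item above, here is a minimal sketch that finds the overlapping parts of two annotation lists. It assumes BED-style half-open intervals given as (chromosome, start, end) tuples; the function name and the example coordinates are hypothetical, and the brute-force comparison is chosen for clarity rather than speed.

```python
from typing import List, Tuple

# (chromosome, start, end) with a BED-style half-open range [start, end)
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping parts of every pair of intervals from a and b."""
    result = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty overlap
                result.append((chrom_a, start, end))
    return sorted(result)


# Hypothetical example annotations (not real genome coordinates)
genes = [("chr1", 100, 500), ("chr2", 0, 50)]
repeats = [("chr1", 400, 600), ("chr1", 50, 120), ("chr3", 0, 10)]
overlaps = intersect(genes, repeats)
print(overlaps)
```

Comparing every pair of intervals is quadratic in the number of elements, so for whole-genome annotations a sorted sweep or a dedicated tool would normally be preferred.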

December 4 (Tue), 16:00-18:00
3 Kochnovsky Proezd, Aud. 402

20/11/18 "Analysis of the structures of protein complexes with nucleic acids using the NPIDB database and its services"

Sergey Spirin, Leading Researcher at the Bioinformatics lab, Associate Professor at the Faculty of Computer Science.

Topic - "Analysis of the structures of protein complexes with nucleic acids using the NPIDB database and its services"

NPIDB (Nucleic Acids - Protein Interaction Data Base, http://npidb.belozersky.msu.ru/) is a project that aims to present information in a form convenient for comparative analysis of the structures of DNA-protein and RNA-protein complexes. The source data (the structures themselves) are taken from the PDB and then processed and reorganized. First, contacts between atoms of the nucleic acid and atoms of the protein are identified. In addition, the complexes are grouped into families according to the protein family to which the protein part of each complex belongs. The talk will cover the organization of the database, its web interface, its software and planned development.
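
To make the idea of identifying contacts between nucleic acid and protein atoms concrete, here is a minimal sketch based on a simple distance cutoff. The criteria actually used by NPIDB are not stated in this announcement; the cutoff value, the function name, the atom labels and the coordinates below are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# (atom label, (x, y, z) coordinates in angstroms)
Atom = Tuple[str, Tuple[float, float, float]]


def find_contacts(na_atoms: List[Atom], protein_atoms: List[Atom],
                  cutoff: float = 4.5) -> List[Tuple[str, str]]:
    """List pairs of atoms lying within `cutoff` angstroms of each other.

    The default cutoff is an assumed illustrative value, not the NPIDB rule.
    """
    contacts = []
    for na_label, na_xyz in na_atoms:
        for pr_label, pr_xyz in protein_atoms:
            if math.dist(na_xyz, pr_xyz) <= cutoff:
                contacts.append((na_label, pr_label))
    return contacts


# Hypothetical atoms with made-up coordinates
na_atoms = [("DA1:N7", (0.0, 0.0, 0.0)), ("DT2:O4", (10.0, 0.0, 0.0))]
protein_atoms = [("ARG5:NH1", (3.0, 0.0, 0.0)), ("LYS9:NZ", (20.0, 0.0, 0.0))]
contacts = find_contacts(na_atoms, protein_atoms)
print(contacts)
```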

November 20 (Tue), 16:00-17:50

3 Kochnovsky Proezd, Aud. 314

13/11/2018 "The role of secondary DNA structures in the functioning of the genome"

Maria Poptsova, Head of Bioinformatics lab, Associate Professor BDIRS

Topic - "The role of secondary DNA structures in the functioning of the genome"

It is now known that secondary structures of DNA (cruciform structures, quadruplexes, triplexes, A-form and Z-form DNA) play an important role in a variety of genome functioning processes, including transcription, translation and chromatin organization. Although a multitude of experimental data is available for annotating the genome with various genomic elements, high-throughput experimental methods that can detect secondary DNA structures and determine their functional role are still under development. However, the first genome-wide experiments and computational genome annotation suggest that DNA has tremendous potential to form secondary structures, and experimental data on individual DNA structures indicate their regulatory role. In my talk, I will describe the tasks and projects of the DNA punctuation research direction of the Bioinformatics lab, including the search for characteristic patterns in the co-location of secondary DNA structures and experimentally confirmed functional elements of the genome, as well as the construction of machine learning models for annotating the genome with the detected secondary DNA structures.

November 13 (Tue), 16:00-17:45
3 Kochnovsky Proezd, Aud. 304

08/11/2018 "The task of studying the three-dimensional structure of chromatin using recurrent neural networks"

Michal Rosenwald, 1st year undergraduate student, Bioinformatics lab

In recent years, the range of applications of machine learning methods has expanded significantly, and their use in molecular biology is particularly notable. Advances in technology now make it possible to generate large amounts of epigenetic data quickly. The Hi-C technology made it possible to obtain data on interactions within the genome, which revealed many of the principles of chromatin folding, including the identification of topologically associating domains (TADs) in the genome. Several studies have confirmed a correlation between chromatin structure and epigenetic marks.
Michal Rosenwald's research focuses on using machine learning methods to predict the three-dimensional structure of chromatin from epigenetic ChIP-seq data (chromatin marks). Linear models with three types of regularization and recurrent neural network architectures are used. The models were trained and their performance was evaluated. The best results on the weighted mean squared error (wMSE) metric were obtained with neural networks.
The most informative epigenetic marks were identified, which makes it possible to evaluate their significance for the formation of the three-dimensional chromatin structure.
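
For readers unfamiliar with the metric, here is a minimal sketch of a weighted mean squared error. The weighting scheme used in the study is not given in this abstract, so the normalization by the sum of weights, the function name and the example values are assumptions.

```python
from typing import Sequence


def weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Squared errors averaged with per-sample weights (one common definition)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_errors = sum(
        w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights)
    )
    return weighted_errors / total_weight


# Hypothetical values: the middle sample counts twice as much as the others
example = weighted_mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [1.0, 2.0, 1.0])
print(example)
```

With all weights equal, this reduces to the ordinary mean squared error.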

November 8 (Thu), 16:00-17:45
3 Kochnovsky Proezd, Aud. 306

29/10/2018 "Search for association patterns between functional elements of the genome"

Otabek Matkarimov - 1st year undergraduate student, Bioinformatics lab

Denis Polivoda - 1st year undergraduate student, Bioinformatics lab

The problem of finding relationships between various functional annotations of the genome, both experimental and theoretical, remains relevant. Existing pattern search programs have significant limitations: most run only on Unix systems, have no graphical user interface, and are difficult to use. We have developed a browser-based program with a graphical user interface that runs on any operating system. It accepts two genomic annotation files in the .bed format, visualizes the distribution of functional elements as densities along chromosomes, and searches for association patterns between the two genomic elements under study. The patterns found are visualized, and their locations are given as a list. The program is designed to solve a wide class of bioinformatics problems that involve searching for association patterns between various functional annotations of the genome.
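
The density view described above can be approximated by counting annotated elements in fixed-size bins along each chromosome. The sketch below is not the program presented in the talk; the function name, the bin size and the choice to bin each element by its start coordinate are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def bed_density(lines: Iterable[str], bin_size: int) -> Counter:
    """Count BED records per (chromosome, bin index), binning by start position."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start = fields[0], int(fields[1])
        counts[(chrom, start // bin_size)] += 1
    return counts


# Hypothetical BED records: chromosome, start, end
records = [
    "track name=example",
    "chr1\t100\t200",
    "chr1\t950\t1100",
    "chr1\t1500\t1600",
    "chr2\t10\t20",
]
density = bed_density(records, bin_size=1000)
print(sorted(density.items()))
```

Binning by start coordinate is the simplest choice; an element that spans a bin boundary is counted only once, in the bin where it begins.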

October 29 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 304

08/10/2018 "Features of the form of DNA in the zone of contact with DNA-recognizing proteins"

Alexandr Scherbakov - 2nd year undergraduate student, Bioinformatics lab

Parameters of “DNA shape”, which are read by DNA-recognizing proteins, can enhance the specificity of their interaction with DNA. However, how strong is the effect of these parameters? Can we say that related DNA-recognizing proteins tend to interact with DNA that has similar “shape” parameters, and can we use these parameters to judge the affinity of proteins? Using data from the Nucleic acid – Protein Interaction DataBase (NPIDB) and tools of mathematical statistics, we will try to shed light on these questions.

Nikolai Butenko - 2nd year undergraduate student, Bioinformatics lab

A program for multiple structural alignment of DNA-protein complexes and its application to identifying conserved features of DNA binding by proteins from widespread families of DNA-recognizing proteins

October 08 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 304

01/10/2018 "Multiplex enhancer-reporter assays uncover unsophisticated logic of TP53 cis-regulatory modules"

Transcription factors regulate their target genes by binding to regulatory regions in the genome. Although the binding preferences of TP53 are known, it remains unclear what distinguishes functional enhancers from nonfunctional binding. In addition, the genome is scattered with recognition sequences that remain unoccupied. Using multiplex enhancer-reporter assays coupled with machine learning methods we discovered that functional enhancers could be discriminated from nonfunctional binding events by the occurrence of a single TP53 canonical motif. By combining machine learning with a meta-analysis of TP53 ChIP-seq data sets, we identified a core set of more than 1000 responsive enhancers in the human genome. This TP53 cistrome is invariably used between cell types and experimental conditions, whereas differences among experiments can be attributed to indirect nonfunctional binding events. Our data suggest that TP53 enhancers represent a class of unsophisticated cell-autonomous enhancers containing a single TP53 binding site, distinct from complex developmental enhancers that integrate signals from multiple transcription factors.

October 01 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 510

24/09/2018 "Machine-Learning Models to Recognize Patterns of Nucleosome and DNA Structures Positioning"

Non-B DNA structures have a great potential to form and influence various genomic processes including transcription. One of the mechanisms of transcription regulation is nucleosome positioning. Even though only B-DNA can be wrapped around a nucleosome, non-B DNA structures can compete with a nucleosome for a genomic location. Here we used permanganate/S1 nuclease footprinting data on non-B DNA structures, such as Z-DNA, H-DNA, G-quadruplexes and stress-induced duplex destabilization (SIDD) sites, together with MNase-seq data on nucleosome positioning in the mouse genome. We found three types of patterns of nucleosome positioning around non-B DNA structures: a structure is surrounded by nucleosomes on both sides, on one side, or lies in a nucleosome-free region. Machine learning models based on random forest and XGBoost algorithms were constructed to recognize DNA regions of 1 kb length containing a particular pattern of nucleosome positioning for four types of DNA structures (Z-DNA, H-DNA, G-quadruplexes and SIDD sites) based on statistics of di- and tri-nucleotides. The best performance (94% accuracy) was reached for G-quadruplexes, while for other types of structures the accuracy was under 70%. We conclude that 1 kb regions containing G-quadruplexes have distinct compositional properties, and this fact points to preferential locations of such patterns in the genome and requires further investigation. For other DNA structures, region composition is not a sufficient predictive factor and one should take into account other physical and structural DNA properties to improve nucleosome-DNA-structure pattern recognition.
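
The compositional features mentioned above (statistics of di- and tri-nucleotides) could be computed along the following lines. The abstract does not say exactly which statistics were used, so this sketch assumes relative frequencies over overlapping windows; the function names and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer, counted over overlapping windows."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are left out of the total
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def composition_features(seq: str) -> Dict[str, float]:
    """Combine the 16 dinucleotide and 64 trinucleotide frequencies."""
    features = kmer_frequencies(seq, 2)
    features.update(kmer_frequencies(seq, 3))
    return features


# Hypothetical short sequence; a real region would be about 1 kb long
features = composition_features("ACGTACGT")
print(len(features), features["AC"], features["ACG"])
```

A feature vector of this kind, computed for each region, could then be passed to a classifier such as a random forest.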

September 24 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 510

17/09/2018 Article discussion "Folded DNA in Action"

Article Discussion "Folded DNA in Action: Hairpin Formation and Biological Functions in Prokaryotes." (Bikard D, Loot C, Baharoglu Z, Mazel D. Microbiol Mol Biol Rev. 2010 Dec; 74(4):570-88.)

Abstract: Structured forms of DNA with intrastrand pairing are generated in several cellular processes and are involved in biological functions. These structures may arise on single-stranded DNA (ssDNA) produced during replication, bacterial conjugation, natural transformation, or viral infections. Furthermore, negatively supercoiled DNA can extrude inverted repeats as hairpins in structures called cruciforms. Whether they are on ssDNA or as cruciforms, hairpins can modify the access of proteins to DNA, and in some cases, they can be directly recognized by proteins. Folded DNAs have been found to play an important role in replication, transcription regulation, and recognition of the origins of transfer in conjugative elements. More recently, they were shown to be used as recombination sites. Many of these functions are found on mobile genetic elements likely to be single stranded, including viruses, plasmids, transposons, and integrons, thus giving some clues as to the manner in which they might have evolved. We review here, with special focus on prokaryotes, the functions in which DNA secondary structures play a role and the cellular processes giving rise to them. Finally, we attempt to shed light on the selective pressures leading to the acquisition of functions for DNA secondary structures.

Article (PDF, 2.94 MB)

September 17 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 510

10/09/2018 "Interns' reports"

Anton Zaikin, PhD student, Bioinformatics lab

"Stem loop structures in bacteria promoters"

Ekaterina Tikhonova, 2nd year undergraduate student, Bioinformatics lab

"Pangenomes of bacteria of the genus Bacillus"

September 10 (Mon), 16:00-17:45
3 Kochnovsky Proezd, Aud. 510


International Laboratory of Bioinformatics