Publications
Only a small proportion of patients with cancer show lasting responses to immune checkpoint blockade (ICB)-based monotherapies. The RNA-editing enzyme ADAR1 is an emerging determinant of resistance to ICB therapy and prevents ICB responsiveness by repressing immunogenic double-stranded RNAs (dsRNAs), such as those arising from the dysregulated expression of endogenous retroviral elements (EREs) [1,2,3,4]. These dsRNAs trigger an interferon-dependent antitumour response by activating A-form dsRNA (A-RNA)-sensing proteins such as MDA-5 and PKR [5]. Here we show that ADAR1 also prevents the accrual of endogenous Z-form dsRNA elements (Z-RNAs), which were enriched in the 3′ untranslated regions of interferon-stimulated mRNAs. Depletion or mutation of ADAR1 resulted in Z-RNA accumulation and activation of the Z-RNA sensor ZBP1, which culminated in RIPK3-mediated necroptosis. As no clinically viable ADAR1 inhibitors currently exist, we searched for a compound that can override the requirement for ADAR1 inhibition and directly activate ZBP1. We identified a small molecule, the curaxin CBL0137, which potently activates ZBP1 by triggering Z-DNA formation in cells. CBL0137 induced ZBP1-dependent necroptosis in cancer-associated fibroblasts and reversed ICB unresponsiveness in mouse models of melanoma. Collectively, these results demonstrate that ADAR1 represses endogenous Z-RNAs and identify ZBP1-mediated necroptosis as a new determinant of tumour immunogenicity masked by ADAR1. Therapeutic activation of ZBP1-induced necroptosis provides a readily translatable avenue for rekindling the immune responsiveness of ICB-resistant human cancers.
Today, hundreds of prokaryotic species are able to synthesize chlorophyll and cobalamin (vitamin B12). An important step in the biosynthesis of these coenzymes is the insertion of a metal ion into a porphyrin ring. Specifically, the Mg-chelatase ChlIDH and the aerobic Co-chelatase CobNST are used in the chlorophyll and vitamin B12 pathways, respectively. The corresponding subunits of these enzymes share a common evolutionary origin. Recently, we identified a highly conserved frameshifting signal in the chlD gene. This unusual regulatory mechanism allows production of both the small and the medium chelatase subunits from the same gene. Moreover, the chlD gene appeared early in evolution and could have been the starting point in the development of the chlorophyll and B12 pathways. Here, we studied the possible coevolution of these two pathways through analysis of the chelatase genes. To do so, we developed a specialized Web database with comprehensive information about more than 1200 prokaryotic genomes. Further analysis allowed us to split the coevolution of the chlorophyll and B12 pathways into eight distinct stages.
Understanding the mechanisms of cancer breakpoint mutagenesis is a difficult task, and predictive models of cancer breakpoint formation have so far failed to achieve even moderate predictive power. Here we take advantage of a machine learning approach that can gather important features from big data and quantify the contribution of different factors. We performed a comprehensive analysis of almost 630,000 cancer breakpoints and quantified the contribution of genomic and epigenomic features: non-B DNA structures, chromatin organization, transcription factor binding sites and epigenetic markers. The results showed that transcription and formation of non-B DNA structures are two major processes responsible for cancer genome fragility. Epigenetic factors, such as chromatin organization in TADs, open/closed regions, DNA methylation and histone marks, are less informative but do make a contribution. As a general trend, individual features inside the groups show a relatively high contribution of G-quadruplexes and repeats and of the CTCF, GABPA, RXRA, SP1, MAX and NR2F2 transcription factors. Overall, the cancer breakpoint landscape can be represented by well-predicted hotspots and poorly predicted individual breakpoints scattered across genomes. We demonstrated that hotspot mutagenesis has genomic and epigenomic factors, and that not all individual cancer breakpoints are random noise: some have a definite mutation signature. In addition, we found a long-range action of some features on breakpoint mutagenesis. By combining omics data and cancer-specific individual feature importance, and by adding distant features to local ones, predictive models for cancer breakpoint formation achieved 70–90% ROC AUC for different cancer types; however, precision remained low at 2% and recall did not exceed 50%.
On the one hand, the power of the models strongly correlates with the size of the available cancer breakpoint and epigenomic data; on the other hand, finding strong determinants of cancer breakpoint formation remains a challenge. The strength of the predictive signals of each group and of each feature inside a group can be converted into cancer-specific breakpoint mutation signatures. Overall, our results add to the understanding of cancer genome rearrangement processes.
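The combination of a 70–90% ROC AUC with a precision of about 2% is what one would expect when positives are rare. A minimal Python sketch on made-up toy data shows how the three metrics can diverge. Nothing here comes from the study: the class sizes, the scores and the 0.7 threshold are illustrative assumptions, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks
    a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when every score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy, heavily imbalanced data (assumption): 10 positives among 1000 windows.
labels = [1] * 10 + [0] * 990
# Half of the positives score high; 190 of the 990 negatives also score high.
scores = [0.9] * 5 + [0.5] * 5 + [0.9] * 190 + [0.1] * 800

auc = roc_auc(labels, scores)
precision, recall = precision_recall(labels, scores, threshold=0.7)
print(f"ROC AUC={auc:.3f} precision={precision:.3f} recall={recall:.3f}")
```

On this toy input the ROC AUC is about 0.86, while precision is about 2.6% at 50% recall, because the few true positives above the threshold are outnumbered by false positives drawn from the much larger negative class.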
Introduction: Accurate detection of GATA1 mutations is highly significant in patients with acute myeloid leukemia (AML) and trisomy 21, as it allows optimization of the clinical protocol. This study aimed at (a) an enhanced search for GATA1 mutations; and (b) characterization of the molecular landscapes of such conditions.
Methods: The DNA samples from 44 patients with newly diagnosed de novo AML with trisomy 21 were examined by fragment analysis and Sanger sequencing of the GATA1 exon 2, complemented by targeted high-throughput sequencing (HTS).
Results: Acquired GATA1 mutations were identified in 43 cases (98%). Additional mutations in the genes of JAK/STAT signaling, cohesin complex, and RAS pathway activation were revealed by HTS in 48%, 36%, and 16% of the cases, respectively.
Conclusions: The GATA1 mutations were reliably determined by fragment analysis and/or Sanger sequencing of a single PCR amplicon. For patients with extremely low blast counts and/or rare variants, rapid screening with simple molecular approaches must be complemented with HTS. The JAK/STAT and RAS pathway-activating mutations may offer an additional option for targeted therapy with kinase inhibitors.
Nucleosomes are elementary building blocks of chromatin in eukaryotes. They tightly wrap ∼147 DNA base pairs around an octamer of histone proteins. How nucleosome structural dynamics affect genome functioning is not completely clear. Here we report all-atom molecular dynamics simulations of nucleosome core particles at a timescale of 15 microseconds. At this timescale, functional modes of nucleosome dynamics such as spontaneous nucleosomal DNA breathing, unwrapping, twisting, and sliding were observed. We identified atomistic mechanisms of these processes by analyzing the accompanying structural rearrangements of the histone octamer and histone-DNA contacts. Octamer dynamics and plasticity were found to enable DNA unwrapping and sliding. Through multi-scale modeling, we showed that nucleosomal DNA dynamics contribute to significant conformational variability of the chromatin fiber at the supranucleosomal level. Our study further supports mechanistic coupling between fine details of histone dynamics and chromatin functioning, and provides a framework for understanding the effects of various chromatin modifications.
At the cellular level, cancer is a disease of both the genome and the epigenome, and the interplay between genetic mutations and epigenetic states may occur at the level of elementary chromatin units, the nucleosomes. They are formed by a segment of DNA wrapped around an octamer of histone proteins. In this review, we survey various mechanisms of cancer etiology and progression mediated by histones and nucleosomes. In particular, we discuss the effects of mutations in histones and of changes in their expression and splicing on epigenetic dysregulation and carcinogenesis. The links between cancer phenotypes and differential expression of histone variants and isoforms are summarized. Finally, we discuss the geometric and steric effects of DNA compaction in nucleosomes on DNA mutation rate and on interactions with transcription factors, including pioneer transcription factors, as well as the prospects of genome and epigenome editing in cancer cells.
Cancer genomes are susceptible to multiple rearrangements by deletion, insertion, and translocation of genomic regions. Recently, the problem of finding determinants of breakpoint formation was approached with machine learning methods; however, unlike cancer point mutations, breakpoint prediction appeared to be a more difficult task, and various machine learning models did not achieve high prediction power, often only slightly exceeding the threshold of random guessing. This raised the question of whether breakpoints are random noise in cancer mutagenesis or whether determinants of structural mutagenesis exist. In the present study, we investigated randomness in cancer breakpoint genome distributions through the power of machine learning models to predict breakpoint hot spots. We divided all cancer types into three groups by degree of randomness in their breakpoint formation. We tested different density thresholds and explored the bias in hot spot definition. We also compared prediction of hot spots versus individual breakpoints. We found that hot spots are considerably better predicted than individual breakpoints; however, some individual breakpoints can also be predicted with satisfactory power, and thus it is not appropriate to filter them out of analyses. We demonstrated that positive-unlabeled learning can provide insights into the insufficiency of cancer data sets, which is not always reflected by data set size. Overall, the present results support the view that the cancer breakpoint landscape can be represented by predictable dense breakpoint regions and scattered individual breakpoints, which are not all random noise: some are generated by detectable mechanisms.
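Hot spot definitions based on a density threshold can be illustrated with a short Python sketch. The fixed window size, the top-fraction rule and the function names below are assumptions made for illustration only; the abstract does not specify how the study defined its thresholds.

```python
from collections import Counter


def window_counts(positions, window_size):
    """Count breakpoints per fixed-size window, keyed by window index."""
    return Counter(pos // window_size for pos in positions)


def label_hotspots(counts, n_windows, top_fraction):
    """Label the densest windows as hot spots.

    The threshold is the breakpoint count of the k-th densest window,
    where k = top_fraction * n_windows (at least 1). Windows tied with
    the threshold are included, and empty windows are never hot spots.
    """
    densities = [counts.get(i, 0) for i in range(n_windows)]
    k = max(1, int(n_windows * top_fraction))
    threshold = max(sorted(densities, reverse=True)[k - 1], 1)
    return [d >= threshold for d in densities], threshold


# Toy breakpoint coordinates on a 100 bp "chromosome" (made-up data).
positions = [5, 12, 15, 18, 31, 33, 34, 36, 38, 77]
counts = window_counts(positions, window_size=10)

for fraction in (0.1, 0.2, 0.5):
    flags, threshold = label_hotspots(counts, n_windows=10, top_fraction=fraction)
    hot = [i for i, is_hot in enumerate(flags) if is_hot]
    print(f"top {fraction:.0%}: threshold={threshold}, hot spot windows={hot}")
```

Lowering the threshold turns more windows into hot spots, so the positive class that a model is asked to predict changes with the definition. This is one way to see why the choice of threshold can bias the results.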
It is now difficult to believe that a biological function for the left-handed Z-DNA and Z-RNA conformations was once controversial. The papers in this Special Issue, "Z-DNA and Z-RNA: from Physical Structure to Biological Function", are based on presentations at the ABZ2021 meeting that was held virtually on 19 May 2021 and provide evidence for several biological functions of these structures. The first of its kind, this international conference gathered over 200 scientists from many disciplines to specifically address progress in research involving Z-DNA and Z-RNA. These high-energy left-handed conformers of B-DNA and A-RNA are associated with biological functions and disease outcomes, as evidenced from both mouse and human genetic studies. These alternative structures, referred to as "flipons", form under physiological conditions, regulate type I interferon responses and induce necroptosis during viral infection. They can also stimulate genetic instability, resulting in adaptive evolution and diseases such as cancer. The meeting featured cutting-edge science that was, for the most part, unpublished. We plan for the ABZ meeting to reconvene in 2022.
Magnesium chelatase ChlIDH and cobalt chelatase CobNST enzymes are required for biosynthesis of (bacterio)chlorophyll and cobalamin (vitamin B12), respectively. Each enzyme consists of large, medium, and small subunits. Structural and primary sequence similarities indicate a common evolutionary origin of the corresponding subunits. It has been reported earlier that some vitamin B12-synthesizing organisms use an unusual cobalt chelatase enzyme consisting of a large cobalt chelatase subunit (cobN) along with the medium (chlD) and small (chlI) subunits of magnesium chelatase. In an attempt to understand the nature of this phenomenon, we analyzed >1,200 diverse genomes of cobalamin and/or chlorophyll producing prokaryotes. We found that, surprisingly, genomes of many cobalamin producers contained cobN and chlD genes only; a small subunit gene was absent. Furthermore, we discovered a diverse group of chlD genes with functional programed ribosomal frameshifting signals. Given the high similarity between the small subunit and the N-terminal part of the medium subunit, we proposed that programed translational frameshifting may allow chlD mRNA to produce both subunits. Indeed, in genomes where genes for small subunits were absent, we observed statistically significant enrichment of programed frameshifting signals in chlD genes. Interestingly, the details of the frameshifting mechanisms producing small and medium subunits from a single chlD gene could be specific to prokaryotic taxa. Overall, this programed frameshifting phenomenon was observed to be highly conserved and present in both bacteria and archaea. We would like to thank Mark Borodovsky, Pavel Baranov and John Atkins for their important contributions to the original discovery of the PRF in the chelatase gene. This work was supported in part by the RSF grant № 20-74-00128 to IA.
Genome rearrangement is a hallmark of all cancers. Cancer breakpoint prediction has proved to be a difficult task, and various machine learning models did not achieve high prediction power. We investigated the power of machine learning models to predict breakpoint hotspots selected with different density thresholds and also compared prediction of hotspots versus individual breakpoints. We found that hotspots are considerably better predicted than individual breakpoints. When choosing a selection criterion, the test ROC AUC alone is not enough to choose the best model; the lift of recall and the lift of precision should also be taken into consideration. Investigation of the lift of recall and lift of precision showed that it is impossible to select one hotspot selection criterion for all cancer types, but that there are three to four distinct groups of cancers with similar properties. Overall, the presented results point to the need to choose different hotspot selection criteria for different types of cancer.
Computational methods to predict Z-DNA regions are in high demand to understand the functional role of Z-DNA. The previous state-of-the-art method, Z-Hunt, is based on statistical mechanical and energy considerations about the B- to Z-DNA transition using sequence information. Z-DNA ChIP-seq experiment results showed little overlap with Z-Hunt predictions, implying that sequence information alone is not sufficient to explain the emergence of Z-DNA at different genomic locations. Adding epigenetic and other functional genomic mark-ups to the DNA sequence level can help reveal functional Z-DNA sites. Here we take advantage of the deep learning approach, which can analyze and extract information from large volumes of molecular biology data. We developed a machine learning approach, DeepZ, that aggregates information from genome-wide maps of epigenetic markers, transcription factor and RNA polymerase binding sites, and chromosome accessibility maps. With the developed model we not only verify the experimental Z-DNA predictions, but also generate a whole-genome annotation, introducing new possible Z-DNA regions that have not yet been found in experiments and may be of interest to researchers from various fields.
The aim of this study was to determine the prevalence of CYP2C9, VKORC1, CYP2C19, ABCB1, CYP2D6 and SLCO1B1 gene polymorphisms among residents of the Volga region (Chuvash and Mari) and the northern Caucasus (Kabardins and Ossetians). Materials & methods: The study involved 845 apparently healthy volunteers of both sexes from four ethnic groups living in the Russian Federation: 238 from the Chuvash ethnic group, 206 Mari, 157 Kabardins and 244 Ossetians. Results: Significant differences were identified in the allele frequencies of CYP2C9, VKORC1, CYP2C19, ABCB1, CYP2D6 and SLCO1B1 gene polymorphisms between the Chuvash and Kabardins, Chuvash and Ossetians, Mari and Kabardins, and Mari and Ossetians.
Background
Eukaryotic protein-coding genes consist of exons and introns. Exon–intron borders are conserved between species, and thus their changes might be observed only over quite long evolutionary distances. One of the rarest types of change, in which an intron relocates over a short distance, is called "intron sliding", but the reality of this event has long been debated. The main idea behind a search for intron sliding is to use the most accurate genome annotations and genome sequences, as well as high-quality transcriptome data. We applied them in a search for sliding introns in mammals in order to widen knowledge about the presence or absence of this phenomenon in this group.
Results
We did not find any significant evidence of intron sliding in the primate group (human, chimpanzee, rhesus macaque, crab-eating macaque, green monkey, marmoset). Only one possible intron sliding event supported by a set of high-quality transcriptomes was observed, between the human and sheep orthologs of the EIF1AX gene. We also checked a list of previously reported intron sliding events in mammals and showed that they are most likely artifacts of genome annotation: they do not appear in subsequent annotation versions and are not supported by transcriptomic data.
Conclusions
We assume that intron sliding is indeed a very rare evolutionary event if it exists at all. Every case of intron sliding needs a lot of supportive data for detection and confirmation.
Over the last 20 years, whole-genome sequencing of cancer genomes has supported the phenomenon of cancer mutation heterogeneity for both point and structural variants. Alongside the whole-genome sequencing projects, many next-generation sequencing experiments, including ChIP-seq for histone modifications and transcription factors, DNase-seq, MeDIP-Seq, Hi-C, and others, were collected for thousands of cancer genomes. Machine learning became an efficient method of predictive modeling because machine learning algorithms are able to consider multiple factors and their interactions and to rank them in order of importance. Machine learning models that predict cancer point mutations at the 1 Mb scale, using the state of the chromatin, epigenetic factors and non-B DNA structures as predictors, achieved good predictive power. However, predicting cancer breakpoints appeared to be a more difficult task than predicting point mutations. Machine learning models that were successfully used to predict cancer point mutations could not achieve high performance in predicting cancer breakpoints with the same features. Nevertheless, the available models demonstrate that aggregating information from omics experiments increases model prediction power. Here we review state-of-the-art machine learning approaches to predict cancer breakpoints and discuss the current understanding of the determinants of cancer breakpoint formation.
BACKGROUND: Chlamydia are ancient intracellular pathogens with reduced, though strikingly conserved, genomes. Despite their parasitic lifestyle and isolated intracellular environment, these bacteria have managed to avoid the accumulation of deleterious mutations and the subsequent genome degradation characteristic of many parasitic bacteria. RESULTS: We report a pan-genomic analysis of sixteen species from the genus Chlamydia, including identification and functional annotation of orthologous genes and characterization of gene gains, losses, and rearrangements. We demonstrate the overall genome stability of these bacteria, as indicated by a large fraction of common genes with conserved genomic locations. On the other hand, extreme evolvability is confined to several paralogous gene families, such as polymorphic membrane proteins and phospholipase D, and is likely caused by pressure from the host immune system. CONCLUSIONS: This combination of a large, conserved core genome and a small, evolvable periphery likely reflects the balance between the selective pressure towards genome reduction and the need to adapt in order to escape host immunity.
Aromatic compounds are a common carbon and energy source for many microorganisms, some of which can even degrade toxic chloroaromatic xenobiotics. This comparative study of aromatic metabolism in 32 Betaproteobacteria species describes the links between several transcription factors (TFs) that control benzoate (BenR, BenM, BoxR, BzdR), catechol (CatR, CatM, BenM), chlorocatechol (ClcR), methylcatechol (MmlR), 2,4-dichlorophenoxyacetate (TfdR, TfdS), phenol (AphS, AphR, AphT), biphenyl (BphS), and toluene (TbuT) metabolism. We characterize the complexity and variability in the organization of aromatic metabolism operons and the structure of regulatory networks that may differ even between closely related species. Generally, the upper parts of pathways, rare pathway variants, and degradative pathways of exotic and complex, in particular, xenobiotic compounds are often controlled by a single TF, while the regulation of more common and/or central parts of the aromatic metabolism may vary widely and often involves several TFs with shared and/or dual, or cascade regulation. The most frequent and at the same time variable connections exist between AphS, AphR, AphT, and BenR. We have identified a novel LysR-family TF that regulates the metabolism of catechol (or some catechol derivative) and either substitutes CatR(M)/BenM, or shares functions with it. We have also predicted several new members of aromatic metabolism regulons, in particular, some COGs regulated by several different TFs.
DNA secondary structures are important functional elements that may influence cellular processes. One of their possible functions is regulation of nucleosome positioning. Here MNase-seq and ssDNA-seq data were used to define patterns of positional relationship of DNA structures such as Z-DNA, H-DNA and G-quadruplexes with nucleosomes. Three types of patterns were found: a structure is surrounded by nucleosomes on both sides, on one side, or lies in a nucleosome-free region. Machine-learning models based on the Random forest algorithm and XGBoost were trained to recognize DNA regions of 500 bp length containing a pattern of nucleosome positioning for three types of DNA structures (Z-DNA, H-DNA and G-quadruplexes) based on DNA sequence compositional properties. The best performance (more than 86% for ROC-AUC, accuracy, recall and precision scores) was reached for G-quadruplexes. The 500 bp regions containing G-quadruplexes have distinct compositional properties and point to the preferential locations of the defined patterns, whose regulatory functions require further investigation. For other DNA structures, region composition is a less powerful predictive factor, and one should take into account other physical and structural DNA properties to improve nucleosome-DNA-structure pattern recognition.
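As a rough illustration of what "sequence compositional properties" could look like as model input, the following Python sketch turns a DNA region into a fixed-length feature vector. The abstract does not say which properties were used, so the choice of GC content plus k-mer frequencies, the value k=2 and all function names are assumptions, not the authors' feature set.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_frequencies(seq, k=2):
    """Relative frequency of every possible k-mer over the ACGT alphabet.

    Overlapping k-mers are counted; k-mers containing other symbols
    (for example N) are skipped.
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def composition_features(seq, k=2):
    """Feature vector: GC content followed by k-mer frequencies in sorted order."""
    freqs = kmer_frequencies(seq, k)
    return [gc_content(seq)] + [freqs[kmer] for kmer in sorted(freqs)]


# Toy G-rich region (made-up sequence, far shorter than 500 bp).
region = "GGGTTAGGGTTAGGGTTAGGG"
features = composition_features(region, k=2)
print(len(features), round(features[0], 3))
```

Vectors of this kind, computed for each 500 bp region, could then be passed to a tree-based classifier such as a random forest or a gradient-boosting model. The training step is omitted here because it depends on labelled data that the abstract does not provide.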