
Seminars

The Lab holds invited talks on NLP, Recommender Systems, Data Mining, and related topics twice a month. For more details, see the Russian page.

The seminar "Automatic Processing and Analysis of Texts" is dedicated to various text processing tasks (tokenization, segmentation recovery, part-of-speech tagging, and syntactic parsing) and to the analysis of textual information (information extraction, construction and use of knowledge graphs, question-answering systems, text classification, etc.).

Online Seminar "Entropy Approach in Topic Modeling"
Date: November 5, 2020
Speakers: Sergey Koltsov, Leading Researcher, Laboratory of Social and Cognitive Informatics; Associate Professor, Department of Mathematics;
Vera Ignatenko, Researcher, Laboratory of Social and Cognitive Informatics; Associate Professor, Department of Mathematics.
Annotation: The talk considers how deformed entropies (Renyi, Tsallis, and Sharma-Mittal entropies) can be used to analyze the behavior of a number of topic models (TM). It describes an approach, based on ideas from statistical physics, to analyzing how a TM depends on the number of topics. Within this framework, a collection of documents and words is treated as a mesoscopic information system whose state is described by deformed entropies and whose behavior is determined by the number of clusters/topics. Topic modeling is then viewed as a procedure for ordering the information system. From this standpoint, the problem of choosing the optimal number of topics can be reduced to finding the minimum of the free energy, or the minimum of the nonequilibrium Renyi/Tsallis entropy, while semantic stability can be assessed using the Sharma-Mittal entropy.

The talk will also show how hyperparameter tuning of topic models can be organized in entropy terms, both by grid search over hyperparameters and by renormalization procedures. Renormalization of topic models can significantly speed up the entropy approach computationally, which is essential when working with big data. The talk will further consider applying the entropy approach to hierarchical topic models and discuss the limitations of this approach. In addition, it will present computation results for such topic models as PLSA, VLDA (Blei), LDA (Gibbs sampling), GLDA (Gibbs sampling), and BigARTM; results of applying renormalization procedures; and results for several hierarchical topic models (hPAM, hLDA, hARTM).
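To make the entropy idea concrete, here is a minimal sketch of computing the Renyi entropy of order q for topic-word distributions and tracking it across different numbers of topics. This is a generic illustration, not the exact formulation from the talk (which uses density-of-states quantities and free energy); the random Dirichlet matrices stand in for a trained topic model's word-topic matrix, and all function names are illustrative.

```python
import numpy as np

def renyi_entropy(p, q=2.0):
    """Renyi entropy of order q for a probability distribution p.
    Reduces to the Shannon entropy in the limit q -> 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1.0) < 1e-9:
        return float(-np.sum(p * np.log(p)))  # Shannon limit
    return float(np.log(np.sum(p ** q)) / (1.0 - q))

rng = np.random.default_rng(0)

def random_phi(n_words, n_topics):
    """Toy stand-in for a trained word-topic matrix phi (W x T):
    each column is a distribution over the vocabulary."""
    return rng.dirichlet(np.full(n_words, 0.1), size=n_topics).T

# Mean Renyi entropy of the topic-word distributions as a crude
# "order" statistic: lower entropy means more peaked, more ordered topics.
for T in (2, 5, 10, 20):
    phi = random_phi(n_words=1000, n_topics=T)
    s = np.mean([renyi_entropy(phi[:, t], q=2.0) for t in range(T)])
    print(f"T={T:3d}  mean Renyi entropy (q=2): {s:.3f}")
```

In the approach described in the talk, a curve of this kind (computed from the actual model's quantities) is scanned over T, and the number of topics at the entropy minimum is taken as optimal.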

Online Seminar "Machine Reading Comprehension and Russian Language"

Date: September 17, 2020
Speaker: Pavel Efimov earned his Master's degree in Computer Science at Saint Petersburg State University. He is now a PhD student at ITMO University.
Annotation: First, I will briefly survey machine reading comprehension (RC) and its flavors, as well as the methods and datasets used to tackle the task. Then I will focus on RC datasets for non-English languages, paying special attention to a Russian RC dataset, the Sberbank Question Answering Dataset (SberQuAD). SberQuAD has been widely used since its inception in 2017, but until recently it had not been properly described and analyzed in the literature. In my presentation, I will provide a thorough analysis of SberQuAD and report several baselines.
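RC baselines of the kind mentioned above are usually scored with SQuAD-style exact match and token-level F1. The sketch below implements these two standard metrics in a generic form; it is not necessarily the exact evaluation script used for SberQuAD, and the normalization rules are a simplified assumption.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace.
    Python 3's \\w matches Cyrillic letters, so this works for Russian too."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    """Harmonic mean of token precision and recall over the bag of tokens."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("В 1997 году.", "в 1997 году"))  # 1.0
print(token_f1("в 1997", "в 1997 году"))           # 0.8
```

Per-question scores are averaged over the dataset; when several gold answers are provided, the maximum score per question is typically taken.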

Online Seminar "RussianSuperGLUE"

Date: September 3, 2020
Presenter: Alena Fenogenova, Chief Specialist, NLP R&D, CDS office, Sberbank
Annotation: This talk presents RussianSuperGLUE, a large benchmark for evaluating Russian language models.


 
