The Lab holds invited talks on NLP, recommender systems, data mining, and related topics twice a month. For more details, see the Russian page.
The seminar "Automatic Processing and Analysis of Texts" is dedicated to various text processing tasks (tokenization, segmentation, part-of-speech tagging, and syntactic parsing) and to the analysis of textual information (information extraction, construction and use of knowledge graphs, construction of question-answering systems, text classification, etc.).
Seminar "Neural entity linking using graph embeddings"
Date: December 23, 2019
Speaker: Özge Sevgili Ergüven (Language Technology Group, University of Hamburg)
Abstract: Entity Disambiguation (ED) is the task of linking an ambiguous entity mention to a corresponding entry in a knowledge base. Current methods have mostly focused on unstructured text data to learn representations of entities; however, there is structured information in the knowledge base itself that should be useful for disambiguating entities. In this work, we propose a method that uses graph embeddings to integrate structured information from the knowledge base with unstructured information from text-based representations. Our experiments confirm that graph embeddings trained on a graph of hyperlinks between Wikipedia articles improve the performance of a simple feed-forward neural ED model and of a state-of-the-art neural ED system.
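The idea of combining text-based and graph-based entity representations can be illustrated with a minimal sketch. It is not the speaker's implementation: the function names, vector sizes, toy vectors, and the randomly initialised (untrained) weights are all assumptions made for illustration. The sketch only shows how a context vector, a candidate's text embedding, and its graph embedding might be concatenated and passed to a small feed-forward scorer.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_features(context_vec, entity_text_vec, entity_graph_vec):
    """Concatenate the mention context with both views of a candidate entity."""
    return (
        l2_normalize(context_vec)
        + l2_normalize(entity_text_vec)
        + l2_normalize(entity_graph_vec)
    )


def init_scorer(input_dim, hidden_dim, seed=0):
    """Placeholder weights: random and UNTRAINED, for illustration only."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def score(features, params):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    logit = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy vectors; real ones would come from trained embedding models.
context = [0.2, 0.7, 0.1]          # representation of the mention's context
candidate_text = [0.3, 0.6, 0.2]   # text-based embedding of a candidate entity
candidate_graph = [0.9, 0.1]       # graph embedding of the same candidate

features = build_features(context, candidate_text, candidate_graph)
params = init_scorer(len(features), hidden_dim=4)
probability = score(features, params)
```

With untrained weights the score carries no meaning. In a real system the scorer would be trained on labelled mention-entity pairs, and the candidate with the highest score would be chosen.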
Joint seminar of the Research and Educational Laboratory of Models and Methods of Computational Pragmatics and the Speech & Language Laboratory of Huawei
Date: December 18, 2019
Two talks were given at the seminar:
1. Pavel Braslavsky (HSE St. Petersburg / UrFU / JetBrains Research) gave an overview talk on the automatic analysis and generation of humor.
2. Mikhail Kudinov (Huawei Research) presented a brief overview of speech generation technologies.
Seminar "Methods of using structured knowledge sources in automatic text processing tasks"
Date: November 16, 2019
Speakers: Mikhail Galkin (Fraunhofer IAIS, Dresden), Andrey But (Huawei Noah’s Ark Lab), Dmitry Puzyrev (trainee researcher at the MMVP Laboratory)
Abstract: Mikhail Galkin gave an overview talk on question-answering systems that use knowledge bases. Dmitry Puzyrev presented his research on applying hyperbolic vector representations of words to the task of determining the compositionality of noun phrases. Andrey But gave a retrospective of the recently concluded EMNLP conference.
Seminar "Discourse Analysis in Automatic Processing Tasks"
Date: November 14, 2019
Speaker: Elena Chistova
Abstract: Many NLP tasks require text analysis beyond a single sentence. One of the most widely used theories for describing the discourse structure of a text is Rhetorical Structure Theory (RST). In RST, a text is represented as a tree whose components are text segments linked by relations (development, reason, background, etc.). The talk presented the results of experiments on building a discourse analyzer based on the RuRSTreebank corpus, which contains annotated Russian-language texts of several genres.
As part of the seminar, a round table was held on the use of discourse analysis in automatic text processing tasks. The participants of the round table were researchers from the National Research University Higher School of Economics and the Federal Institute of Management of the Russian Academy of Sciences.
Seminar "Evolution of the semantics of words over time and distributional methods"
Date: October 24, 2019
Speaker: Andrey Kutuzov, University of Oslo
Abstract: Distributional semantic vector models (word embeddings) have proven effective at detecting diachronic semantic shifts. A competition on this topic was organized as part of SemEval-2020, and most participants were expected to use distributional approaches in one way or another. The speaker briefly reviewed the results achieved in this area: he presented publicly available, manually labeled test sets for Russian, as well as well-performing algorithms for detecting semantic shifts using diachronic embeddings. Some common mistakes in using distributional models were also discussed. The seminar ended with a joint discussion of the tasks proposed by the SemEval-2020 organizers.
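One simple way to quantify a semantic shift with embeddings from two time periods is to compare a word's nearest neighbours in each period. The sketch below is illustrative only and is not necessarily one of the algorithms discussed in the talk: the function names and the two-dimensional toy vectors are invented. Because each word is compared only with other words from the same period, the two embedding spaces do not need to be aligned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, k):
    """Return the set of the k words most similar to `word` in one period."""
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def neighbor_shift(word, old_vectors, new_vectors, k=2):
    """Jaccard distance between neighbour sets: 0 = same, 1 = disjoint."""
    # Restrict both periods to the shared vocabulary so the sets are comparable.
    shared = set(old_vectors) & set(new_vectors)
    old = {w: old_vectors[w] for w in shared}
    new = {w: new_vectors[w] for w in shared}
    before = nearest_neighbors(word, old, k)
    after = nearest_neighbors(word, new, k)
    return 1.0 - len(before & after) / len(before | after)


# Invented toy embeddings: "cell" moves from the "prison" region
# to the "phone" region between the two periods.
period_old = {
    "cell": [1.0, 0.1],
    "prison": [0.9, 0.2],
    "room": [0.8, 0.3],
    "phone": [0.1, 1.0],
    "mobile": [0.2, 0.9],
}
period_new = dict(period_old, cell=[0.15, 0.95])

cell_shift = neighbor_shift("cell", period_old, period_new)
prison_shift = neighbor_shift("prison", period_old, period_new)
```

In this toy example the neighbour sets of "cell" in the two periods are disjoint, so its score is the maximum, 1.0. Note that with such a tiny vocabulary even a stable word like "prison" receives a non-zero score, because one of its neighbours moved away. This kind of artefact is one reason such scores are usually computed over large vocabularies and compared relative to other words.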
Seminar "Competition for solving school tests in the Russian language and a baseline solution for it"
Date: October 3, 2019
Speaker: Valentin Malykh (researcher at Huawei Noah's Ark Lab)
Abstract: The talk covers the ongoing competition at https://contest.ai-journey.ru: what makes it difficult and why it is interesting. It also considers a baseline solution that would score a C on the real exam. If you are interested, perhaps you will be able to build an excellent solution.
Seminar "Determination of the sentiment of aspect categories in the Russian language"
Date: September 19, 2019
Speakers: Ilya Sochenkov (head of department, Federal Research Center Institute of Management of the Russian Academy of Sciences), Philip Furaev and Nikita Borovkov (Skoltech-SUAP students).
Abstract: The talk presented a method for automatically labeling a Yandex Market dataset (using reviews of mobile phones as an example) with the polarities of aspect categories, using ratings of the words most commonly used to describe product categories. A category here refers to one of the most characteristic properties of a product; for mobile phones, for example, these are the screen, the battery, and so on. The talk also discussed the use of several machine learning models for this task and presented a comparative evaluation of their quality.