The Lab holds invited talks on NLP, recommender systems, data mining, and related topics twice a month. For more details, see the Russian page.
The seminar "Automatic Processing and Analysis of Texts" is dedicated to various text processing tasks (tokenization, segmentation recovery, part-of-speech tagging, and syntactic parsing) and to the analysis of textual information (information extraction, construction and use of knowledge graphs, construction of question-answering systems, text classification, etc.).
Online seminar "Modeling and Predicting Helpfulness of Online Reviews"
Date: February 2, 2023
Speaker: Muhammad Shahid Iqbal Malik, Research Fellow, Faculty of Computer Science / Laboratory for Models and Methods of Computational Pragmatics.
Abstract: Consumers prefer to read online reviews before making buying decisions. However, for some products the sheer volume of reviews makes it hard to find the best ones. The helpfulness of a review is an effective signal for dealing with this information overload and supports consumers in their decision-making. In this talk, I will share my contributions to developing predictive models of product review helpfulness. I address helpfulness from two perspectives: as a binary classification problem and as a regression problem. Several newly proposed review content, reviewer, and product features are investigated. Specifically, state-of-the-art semantic representations, contextual word embeddings, and language models are explored. Popular machine learning, ensemble, and deep learning models are also used to build effective frameworks. Our work could be of value to the research community concerned with identifying what makes a review helpful or not, as it uncovers the importance of new indicators that shed light on the empirical relationship between these variables and review helpfulness. Our work also has important implications for marketing professionals and retail platforms, which can use our results to optimize their customer feedback systems, improve reviewer guidelines, and include more useful product reviews.
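The two problem framings mentioned in the abstract can be illustrated with a minimal sketch of how the prediction targets might be built. The abstract does not say how helpfulness was measured, so the vote-ratio definition, the `Review` fields, the 0.6 threshold, and the minimum vote count below are all illustrative assumptions, not the speaker's method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    # Hypothetical record layout; real review datasets differ.
    text: str
    helpful_votes: int
    total_votes: int


def helpfulness_score(review: Review, min_votes: int = 5) -> Optional[float]:
    """Regression target: share of votes that marked the review helpful.

    Returns None when the review has too few votes to give a stable ratio.
    """
    if review.total_votes < max(min_votes, 1):
        return None
    return review.helpful_votes / review.total_votes


def helpfulness_label(review: Review, threshold: float = 0.6,
                      min_votes: int = 5) -> Optional[int]:
    """Binary target: 1 if the helpfulness ratio reaches the threshold, else 0."""
    score = helpfulness_score(review, min_votes)
    if score is None:
        return None
    return int(score >= threshold)
```

A regression model would be fit on `helpfulness_score`, a classifier on `helpfulness_label`; reviews for which either function returns `None` would be left out of training.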
Seminar "Prediction of structural objects based on formal contexts and natural language data"
Date: March 30, 2023
Speaker: Ramil Yarullin, fourth-year graduate student and lecturer at the Department of Big Data and Information Retrieval of the Faculty of Computer Science. The research discussed in the talk was carried out as part of his graduate studies, his work at Yandex, and the Yandex scientific and educational laboratory at the Faculty of Computer Science.
Abstract: The talk will consider several problems of constructing and predicting structural objects from data, ranging from formal contexts to natural language text. The first part covers theoretical work on constructing an approximate probabilistic basis of implications for formal contexts. The second part focuses on natural language text and an approach to text classification with overlapping classes that generates a sequence of class labels. In particular, we will consider the problem setting with an existing hierarchical class structure and discuss a method that combines the standard BERT architecture with sequential label prediction. In the third part, we will move on to answering numerical questions over text and tables, where answering requires the sequential application of discrete operations such as counting, comparing numbers, sorting, and evaluating arithmetic expressions. We will present a new neural network model that currently shows the best results on this task.
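For readers unfamiliar with formal contexts, the following minimal sketch shows what an implication over a formal context is and one simple way to measure how nearly it holds. It is not the construction presented in the talk: the confidence measure used here (the share of supporting objects that satisfy the conclusion) is a stand-in assumption, and the toy context and all function names are hypothetical.

```python
from typing import Dict, Set

# A formal context maps each object to the set of attributes it has.
Context = Dict[str, Set[str]]


def extent(context: Context, attrs: Set[str]) -> Set[str]:
    """Objects that have every attribute in attrs."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def implication_holds(context: Context, premise: Set[str],
                      conclusion: Set[str]) -> bool:
    """premise -> conclusion holds if every object with the premise
    attributes also has all the conclusion attributes."""
    return all(conclusion <= context[obj] for obj in extent(context, premise))


def implication_confidence(context: Context, premise: Set[str],
                           conclusion: Set[str]) -> float:
    """Share of objects satisfying the premise that also satisfy the conclusion."""
    support = extent(context, premise)
    if not support:
        return 1.0  # vacuously true: no object has the premise attributes
    satisfied = sum(1 for obj in support if conclusion <= context[obj])
    return satisfied / len(support)


# Toy context, for illustration only.
animals: Context = {
    "duck": {"flies", "swims", "lays_eggs"},
    "eagle": {"flies", "lays_eggs"},
    "trout": {"swims", "lays_eggs"},
    "bat": {"flies"},
}
```

In this toy context, `{"swims"} -> {"lays_eggs"}` holds exactly, while `{"flies"} -> {"lays_eggs"}` fails for one of the three flying objects and so holds only approximately.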
Preliminary defense of Vladislav Mikhailov's Candidate of Sciences dissertation
On April 26, 2023 at 15:00, at a meeting of the Department of Big Data and Information Retrieval of the Faculty of Computer Science, a preliminary defense of Vladislav Nikolaevich Mikhailov’s thesis took place on the topic: “Benchmark testing of language models on problems of understanding natural language” (scientific specialty 1.2.1 Artificial intelligence and machine learning).
Scientific supervisor: Ekaterina Leonidovna Artyomova, Ph.D., Postdoctoral Researcher, LMU Munich
— Ekaterina Loginova, Doctor of Business Economics, Ghent University
— Maxim Panov, Ph.D. Sc., Senior Researcher, Technology Innovation Institute, Abu Dhabi, UAE
Seminar “You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models”
Date: October 11, 2023
Speaker: Alexander Baranov, graduate student of the Department of Big Data and Information Retrieval of the Faculty of Computer Science.
Abstract: In this research presentation, we investigate automatic humor detection, an important area within conversational AI. Although there are multiple English humor datasets, little is known about how well models trained on them generalize or behave in real-world settings. Our analysis includes examining existing datasets, training RoBERTa-based classifiers on each one, and carrying out extensive cross-dataset testing. Additionally, we test large language models (LLMs) and evaluate their effectiveness in detecting humor.
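The cross-dataset testing protocol described in the abstract can be sketched as follows: train one model per dataset, then score each model on the test split of every dataset, including its own. This is only an outline of the protocol under stated assumptions. The majority-class baseline stands in for the RoBERTa-based classifiers, and the data layout and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]        # (text, label), with label 1 = humorous
Model = Callable[[str], int]     # maps a text to a predicted label


def train_majority(train: List[Example]) -> Model:
    """Placeholder trainer: always predicts the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority


def accuracy(model: Model, test: List[Example]) -> float:
    """Share of test examples the model labels correctly."""
    return sum(model(text) == label for text, label in test) / len(test)


def cross_dataset_matrix(
    datasets: Dict[str, Dict[str, List[Example]]],
    train_fn: Callable[[List[Example]], Model],
    metric_fn: Callable[[Model, List[Example]], float],
) -> Dict[Tuple[str, str], float]:
    """Train on each dataset's 'train' split, evaluate on every 'test' split.

    Returns scores keyed by (training dataset, evaluation dataset).
    """
    results = {}
    for source, source_splits in datasets.items():
        model = train_fn(source_splits["train"])
        for target, target_splits in datasets.items():
            results[(source, target)] = metric_fn(model, target_splits["test"])
    return results
```

The diagonal entries `(d, d)` give in-domain performance, while the off-diagonal entries show how well a model trained on one dataset transfers to another.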
Seminar "Application of machine learning methods to incomplete and noisy data"
Date: October 17 at 18:00
Speaker: Arseny Sotsky, graduate student of the Faculty of Mathematics.
Abstract: We continue the series of short presentations by graduate students. Arseny Sotsky, a first-year graduate student of the Faculty of Mathematics, will give a talk on the application of machine learning methods to incomplete and noisy data. Using a medical database as an example, the talk examines various ways of processing and analyzing data in which many values are missing. On a related topic, Arseny also co-authored work at the Hydrometeorological Center and published a preprint (https://arxiv.org/abs/2306.14318) on the use of ensemble Kalman filters to predict local spectra of non-stationary random processes on a sphere from incomplete data.
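The abstract does not name the specific techniques covered, so the sketch below shows only one common baseline for incomplete data, chosen here as an assumption: fill the missing values of a numeric column with the mean of the observed ones, and keep an indicator of which entries were missing so that a downstream model can still use that information. The function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple


def impute_column(values: List[Optional[float]]) -> Tuple[List[float], List[int]]:
    """Mean-impute a numeric column.

    Returns the filled column and a mask with 1 where the value was missing.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    missing_mask = [int(v is None) for v in values]
    return imputed, missing_mask
```

Mean imputation shrinks the variance of the column and ignores relationships between features, which is why the missingness mask is returned alongside the filled values.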
The international workshop on Data and Computation for Materials Science and Innovation (DACOMSIN)
Place: HSE Moscow. Within the framework of DAMDID 2023.
About the workshop:
Materials data collection and systematization, which initially relied on traditional means of data publication, later acquired the status of a discipline in its own right and accelerated greatly with the advent of computers. The proliferation of Big Data in materials characterization has added to the need for scalable and interoperable data infrastructures for materials research and innovation.
Extensive bibliographic and material properties databases laid the foundation for machine-assisted data harvesting, data analysis, and data repurposing. Designing new materials with predefined functional properties and matching in silico models with experimental data have become a reality across the globe and have inspired several national materials genome initiatives.
Progress in information technology has made it possible not only to use computers for data management and data analysis, but also to treat them as a viable tool for experimentation on par with physical and chemical experiments. Powerful software platforms and high-quality simulated data are now first-class citizens in many research and innovation settings.
The domains of materials data infrastructures, materials data analysis, and materials in silico experiments have built thriving communities that hold regular gatherings and have dedicated professional bodies for ongoing discussion. However, there is no common forum whose primary purpose is multilateral and mutually beneficial discussion across all three communities.
The DACOMSIN workshop aims to address this communication gap and bring together professionals from across research and innovation to share their experience of and perspectives on using information technology and computer science for materials data management, analysis, and simulation.
Information technology and computer science, which underpin each of the three pillars of the workshop, should also be able to serve as a universal glue that brings these areas closer together and supports a seamless transition from materials research to pilot innovative applications and eventually to scalable industrial deployments.