AI Research Centre

Creation of neural network models and datasets motivated by linguistic theory

Project completed

Project Relevance

Scaling up the number of parameters in pre-trained language models does not bring us closer to solving Natural Language Understanding, because such models substitute the manipulation of word forms for understanding. The gap between a language model and a language system can be stated as follows: a language system is a relation between forms and their meanings, whereas the training data used by language models contain only forms, without meanings.

Project goal

The goal is to create training datasets with expert linguistic annotation, focusing on the areas most problematic for modern language models: discourse cohesion, differences between speech act types, and deep syntactic structure, which underlies the variation among expressions that share the same semantics.

Advantages of the proposed solution:

  • Possibility to incorporate linguistic information into current neural network architectures

    The datasets being developed will contain linguistic information that defines significant components of the communicative situation, narrative structure, and language variation. This information is obvious to humans but has so far been practically impossible to reproduce in artificial models.

  • New fundamental and applied results in the field of automatic natural language processing

    The resulting neural network models can be used to improve conversational and generative chatbots, to analyse complex narrative structures automatically, and to search for paraphrases and syntactic synonyms.

Significance of the project results:

Interest in the convergence of linguistics and automatic natural language analysis, which has recently been especially strong in both the linguistic and NLP communities.

The emergence of next-level solutions that bridge the gap between artificial and natural intelligence in the field of natural language.

The project was implemented jointly with a partner

Project team

Anastasiya A. Bonch-Osmolovskaya