Our Lab held the RuREBus shared task
Named entity recognition (NER) is a well-studied task with plenty of annotated data, on which state-of-the-art (SOTA) models achieve high quality. At the same time, achieving comparably good results in business cases is often difficult: documents and entities are domain-specific, and the text is either written in clerical language (e.g., business documents) or, conversely, contains colloquial language (e.g., chatbot dialogs). In addition, it may be useful to extract not only entities but also the relations between them, and there is less annotated data for this task.
We present the RuREBus (Russian Relation Extraction for Business) corpus: strategic planning documents of the Ministry of Economic Development of the Russian Federation, annotated with entities and relations. More information about the corpus and the types of entities and relations can be found in the repository of the competition, which we held on RuREBus data at the Dialog 2020 conference.
We also conducted research on the resulting corpus; the results are presented in our article “So, what is the plan? Mining Strategic Planning Documents” for the Digital Transformation and Global Society conference (DTGS 2020).
For more details on the shared task, see the paper:
Ivanin, Vitaly; Artemova, Ekaterina; Batura, Tatiana; Ivanov, Vladimir; Sarkisyan, Veronika; Tutubalina, Elena; Smurov, Ivan. “RuREBus-2020 Shared Task: Russian Relation Extraction for Business”. Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialog” [Komp’iuternaia Lingvistika i Intellektual’nye Tehnologii: Trudy Mezhdunarodnoj Konferentsii “Dialog”], 2020, Moscow, Russia.