Online seminar "Deep Active Learning: Reducing Annotation Effort for Automatic Sequence Tagging of Clinical and Biomedical Texts"
On May 13, 2020, the Laboratory for Models and Methods of Computational Pragmatics held its fifth online seminar.
Active learning is a technique that helps to minimize the annotation budget required to create a labeled dataset while maximizing the performance of a model trained on that dataset. It has been shown that active learning can be successfully applied to sequence tagging tasks in text processing in conjunction with deep learning models, even when only a limited amount of labeled data is available. Recent advances in transfer learning for natural language processing, based on deep pre-trained models such as ELMo and BERT, offer a much better ability to generalize from small annotated datasets than their shallow counterparts. Combining deep pre-trained models with active learning yields a powerful approach to dealing with annotation scarcity. In this talk, we will present recent experimental results of deep active learning on clinical and biomedical data in English and Russian. We will consider state-of-the-art sequence tagging models in combination with several active learning strategies. Alongside NER and other sequence labeling tasks, we will discuss the application of active learning to the task of finding heart disease risk factors in electronic health records (EHRs), which is part of a biomedical research project on automated ischemic stroke prediction.
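The abstract does not name the specific query strategies covered in the talk. As a rough illustration of the general idea only, the sketch below shows one common strategy, uncertainty sampling with a length-normalized confidence score: the tagger scores every unlabeled sentence, and the least confident ones are sent to annotators first. The function names, the example probabilities, and the `budget` parameter are all assumptions made for this illustration and are not taken from the presentation.

```python
import math
from typing import List, Sequence


def sequence_uncertainty(token_probs: Sequence[float]) -> float:
    """Return 1 minus the geometric mean of per-token confidences.

    token_probs holds, for each token, the model's probability of the
    tag it predicted. Normalizing by length keeps long sentences from
    looking uncertain merely because they contain more tokens.
    """
    if not token_probs:
        return 0.0
    # Clip to avoid log(0) for degenerate probabilities.
    log_probs = [math.log(max(p, 1e-12)) for p in token_probs]
    mean_log_prob = sum(log_probs) / len(log_probs)
    return 1.0 - math.exp(mean_log_prob)


def select_for_annotation(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` most uncertain unlabeled sentences."""
    scores = [sequence_uncertainty(p) for p in pool_probs]
    ranked = sorted(range(len(pool_probs)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical tagger confidences for three unlabeled sentences.
pool = [
    [0.99, 0.98, 0.97],  # model is confident
    [0.55, 0.60, 0.90],  # model is unsure
    [0.80, 0.85],        # somewhere in between
]
queried = select_for_annotation(pool, budget=2)
print(queried)  # [1, 2]
```

In a full active learning loop, the selected sentences would be labeled by an annotator, added to the training set, and the tagger retrained before the next round of selection.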
Presentation: 13_05_artem_shelmanov_active_learning_NLP (PDF, 3.34 MB)