Laboratory for Theoretical Foundations of Artificial Intelligence Models

The Laboratory for Theoretical Foundations of Artificial Intelligence Models conducts research and carries out applied development in the most in-demand and promising areas of artificial intelligence. The laboratory's staff regularly publish papers in prestigious scientific journals and in the proceedings of leading international conferences, and have experience collaborating with major IT companies.

HDI&TFAIM Lab Seminar: Randomised Scalable Cross-Entropy Loss for Extreme Classification Problem and Sequential Recommender Systems in particular

The event has ended
This Thursday, December 12, at 14:40, Gleb Mezentsev (Skoltech, AIRI) will give a talk.

Scalability plays a crucial role in bringing modern recommender systems to production. Even lightweight architectures may suffer from high computational overhead due to intermediate calculations, which limits their practicality in real-world applications. In particular, training with the full Cross-Entropy (CE) loss often yields state-of-the-art recommendation quality, but it consumes excessive GPU memory when the item catalog is large.
We introduce a novel Scalable Cross-Entropy (SCE) loss function for the sequential learning setup. It approximates the CE loss for datasets with large catalogs, improving both time efficiency and memory usage without compromising recommendation quality. Unlike traditional negative sampling methods, our approach uses a selective, GPU-efficient computation strategy that focuses on the most informative elements of the catalog, particularly those most likely to be false positives. This is achieved by approximating the softmax distribution over a subset of the model outputs through GPU-efficient approximate maximum inner product search. Experimental results on multiple datasets show that SCE reduces peak memory usage by a factor of up to 100 compared to the alternatives while matching or even exceeding their metric values. The proposed approach also opens new perspectives for large-scale developments in other domains, such as large language models.
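
The idea of approximating the softmax over a subset of the model outputs can be illustrated with a small sketch. This is not the speaker's SCE implementation. It is a simplified stand-in that assumes dot-product logits and replaces the approximate maximum inner product search with an exact top-k search in plain Python. All names (`full_ce_loss`, `subset_ce_loss`, `hidden`, `item_emb`, `k`) are hypothetical. Because the exact search still scores every item, the sketch shows only the form of the loss, not the memory savings, which would come from never materializing the full logit matrix on the GPU.

```python
import heapq
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_ce_loss(hidden, item_emb, target):
    """Exact cross-entropy over the whole catalog for one prediction."""
    logits = [dot(hidden, e) for e in item_emb]
    return log_sum_exp(logits) - logits[target]


def subset_ce_loss(hidden, item_emb, target, k):
    """Cross-entropy over the target plus the k highest-scoring other items.

    Assumption: exact top-k search stands in for the approximate maximum
    inner product search mentioned in the abstract.
    """
    scored = [(dot(hidden, e), i) for i, e in enumerate(item_emb) if i != target]
    hardest = heapq.nlargest(k, scored)  # likeliest false positives
    target_logit = dot(hidden, item_emb[target])
    logits = [target_logit] + [score for score, _ in hardest]
    return log_sum_exp(logits) - target_logit


# Toy data: random item embeddings and one random hidden state.
random.seed(0)
n_items, dim = 1000, 16
item_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
hidden = [random.gauss(0, 1) for _ in range(dim)]
target = 42

full = full_ce_loss(hidden, item_emb, target)
approx = subset_ce_loss(hidden, item_emb, target, k=100)
print(f"full CE: {full:.4f}, subset CE (k=100): {approx:.4f}")
```

In this sketch the subset loss never exceeds the full loss, because the normalizer sums over fewer items, and it equals the full loss when `k` covers the whole catalog. Keeping the highest-scoring items retains the terms that dominate the softmax normalizer, which is one way to read the abstract's emphasis on likely false positives.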

Room D109. For any questions, please contact E. G. Alyamovskaya (ealyamovskaya@hse.ru) or K. M. Zelenova (kzelenova@hse.ru).