Faculty of Computer Science
Contacts

109028, Moscow,
11 Pokrovsky Boulevard

Phone: +7 (495) 531-00-00 *27254

Email: computerscience@hse.ru

Administration
First Deputy Dean Tamara Voznesenskaya
Deputy Dean for Research and International Relations Sergei Obiedkov
Deputy Dean for Methodical and Educational Work Ilya Samonenko
Deputy Dean for Development, Finance and Administration Irina Plisetskaya
Article
A randomized coordinate descent method with volume sampling

Rodomanov A., Kropotov D.

SIAM Journal on Optimization. 2020. Vol. 30. No. 3. P. 1878-1904.

Article
ML-assisted versatile approach to Calorimeter R&D

Boldyrev A., Derkach D., Ratnikov F. et al.

Journal of Instrumentation. 2020. Vol. 15. P. 1-7.

Article
An accelerated directional derivative method for smooth stochastic convex optimization

Dvurechensky P., Gorbunov E., Gasnikov A.

European Journal of Operational Research. 2021. Vol. 290. No. 2. P. 601-621.

Article
On pattern setups and pattern multistructures

Kuznetsov S., Kaytoue M., Belfodil A.

International Journal of General Systems. 2020. Vol. 49. P. 271-285.

Book chapter
Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Kaledin M., Moulines E., Naumov A. et al.

In bk.: Proceedings of Machine Learning Research. Vol. 125: Proceedings of Thirty Third Conference on Learning Theory. 2020. P. 2144-2203.

Colloquium "Positional Embedding in Transformer-based Models"

September 28, 18:10

Speaker: Tatiana Likhomanenko (Apple)

Title: Positional Embedding in Transformer-based Models

Abstract:

Transformers have been shown to be highly effective on problems involving sequential modeling, such as machine translation (MT) and natural language processing (NLP). Following this success, the Transformer architecture attracted immediate interest in other domains: automatic speech recognition (ASR), music generation, object detection, and finally image recognition and video understanding. Two major components of the Transformer are the attention mechanism and the positional encoding. Without the latter, vanilla attention Transformers are invariant to permutations of the input tokens (making "cat eats fish" and "fish eats cat" identical to the model). In this talk we will discuss different approaches to encoding positional information and their pros and cons: absolute and relative, fixed and learnable, 1D and multidimensional, additive and multiplicative, continuous and augmented positional embeddings. We will also focus on how well different positional embeddings generalize to unseen positions, in both interpolation and extrapolation tasks.
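
As a complement to the abstract, here is a minimal sketch of the fixed absolute sinusoidal encoding from "Attention Is All You Need" (Vaswani et al., 2017), the simplest of the positional-embedding families mentioned above. It is written in plain NumPy rather than any particular framework; the function and variable names are illustrative, and the final lines show the additive variant, in which the encoding is summed with the token embeddings so that the model input is no longer permutation-invariant.

```python
import numpy as np

def sinusoidal_positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    """Fixed absolute positional encoding (Vaswani et al., 2017):

    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(num_positions)[:, None]           # shape (num_positions, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)  # broadcasts to (num_positions, d_model // 2)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

# Additive variant: the encoding is summed with the (here randomly generated,
# purely illustrative) token embeddings before the first attention layer.
seq_len, d_model = 16, 64
token_embeddings = np.random.randn(seq_len, d_model)
inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```

Because this encoding is computed rather than learned, it can be evaluated at positions never seen during training, which is exactly the extrapolation setting the talk examines.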

Venue: Zoom