Workshop "Understanding How LLMs Work: An Introduction to Interpretability", AIRI
Alexey Dontsov
AIRI
Elena Tutubalina
AIRI
We will cover the foundations of neural network interpretability, focusing on Sparse Autoencoders (SAEs), a simple but powerful technique that has brought us closer to understanding the inner workings of large language models. We will explore why self-attention works when viewed through the lens of information flow, what circuits are and how models leverage them to solve problems, and how these insights are advancing our understanding of LLM internals.
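For intuition, the core idea behind an SAE can be sketched in a few lines of PyTorch: learn an overcomplete, sparse code for a model's activations and reconstruct them from it. The sketch below is illustrative only; the dimensions (d_model=512, n_features=2048), the L1 coefficient, and the random stand-in activations are hypothetical placeholders, not the workshop's actual materials.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over LLM activations.

    Encodes d_model-dim activations into an overcomplete n_features-dim
    code with a ReLU nonlinearity, then reconstructs the input. An L1
    penalty on the code pushes most features to zero, which is what
    makes the learned features candidates for interpretation.
    """

    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x: torch.Tensor):
        f = F.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)       # reconstruction of the input
        return x_hat, f

# One toy training step: reconstruction loss plus L1 sparsity penalty.
sae = SparseAutoencoder(d_model=512, n_features=2048)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

acts = torch.randn(64, 512)           # stand-in for captured LLM activations
x_hat, f = sae(acts)
loss = F.mse_loss(x_hat, acts) + l1_coeff * f.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

In practice the input would be activations captured from a fixed layer of a trained language model rather than random tensors, and individual dimensions of the sparse code f are then inspected as candidate features.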