HDI&TFAIM Lab Seminar: "Inverse Entropy-regularized RL"
In this talk, we consider entropy-regularized reinforcement learning (RL) and address the inverse statistical problem of recovering the reward function from a sample of expert behavior generated by the optimal policy. We propose an estimator for the rewards and study its convergence.
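To make the inverse problem concrete, here is a minimal tabular sketch in Python. It is not the estimator proposed in the talk. It assumes a finite MDP with known transitions `P`, discount `gamma` and entropy temperature `lam`, i.i.d. expert state-action pairs, and a simple plug-in estimator (empirical policy, then reward `lam * log pi_hat`). All function names are hypothetical. The sketch relies on two standard facts. First, the soft-optimal policy satisfies lam * log pi(a|s) = Q(s,a) - V(s), so the reward lam * log pi reproduces pi exactly. Second, potential-based shaping leaves pi unchanged, which means rewards are identifiable from pi only up to such terms.

```python
# Hypothetical tabular sketch (NOT the talk's estimator): assumes a known
# transition kernel P, discount gamma, entropy temperature lam, and i.i.d. data.
import math
import random
from collections import Counter


def soft_value_iteration(r, P, gamma, lam, iters=2000):
    """Forward problem: soft-optimal policy pi[s][a] for reward r[s][a],
    transitions P[s][a][s'], discount gamma and temperature lam."""
    S, A = len(r), len(r[0])
    V = [0.0] * S
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        # Soft Bellman backup: Q(s,a) = r(s,a) + gamma * E[V(s')]
        Q = [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        # V(s) = lam * log sum_a exp(Q(s,a) / lam), computed stably
        V = []
        for s in range(S):
            m = max(Q[s])
            V.append(m + lam * math.log(sum(math.exp((q - m) / lam) for q in Q[s])))
    # Boltzmann policy pi(a|s) = exp((Q(s,a) - V(s)) / lam)
    return [[math.exp((Q[s][a] - V[s]) / lam) for a in range(A)] for s in range(S)]


def reward_from_policy(pi, lam):
    """Inverse step: lam * log pi(a|s) reproduces pi exactly (the member of
    the equivalence class whose soft value is V = 0)."""
    return [[lam * math.log(p) for p in row] for row in pi]


def shape(r, P, gamma, phi):
    """Potential-based shaping r + gamma * E[phi(s')] - phi(s); it leaves the
    soft-optimal policy unchanged, so pi alone cannot pin down r."""
    S, A = len(r), len(r[0])
    return [[r[s][a] + gamma * sum(P[s][a][t] * phi[t] for t in range(S)) - phi[s]
             for a in range(A)] for s in range(S)]


def sample_expert(pi, n, rng):
    """Hypothetical data model: uniform i.i.d. states, actions drawn from pi."""
    data = []
    for _ in range(n):
        s = rng.randrange(len(pi))
        a = rng.choices(range(len(pi[s])), weights=pi[s])[0]
        data.append((s, a))
    return data


def estimate_policy(data, S, A, alpha=1.0):
    """Additively smoothed empirical policy, so log(pi_hat) is always finite."""
    sa = Counter(data)
    s_counts = Counter(s for s, _ in data)
    return [[(sa[(s, a)] + alpha) / (s_counts[s] + alpha * A) for a in range(A)]
            for s in range(S)]
```

For example, `reward_from_policy(estimate_policy(sample_expert(pi, 20000, random.Random(0)), S, A), lam)` gives a plug-in reward whose soft-optimal policy is the smoothed empirical policy. Because of the shaping invariance illustrated by `shape`, this reward need not match the reward that generated the data, even with infinitely many samples.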