Reinforcement learning (RL) is one of the main paradigms of machine learning, alongside supervised and unsupervised learning and the less common self-supervised and semi-supervised learning. RL focuses on learning through interaction: an agent is given a set of actions it can take, observes the state of its environment, and receives a reward signal that scores the outcome of each action. The agent learns by trial and error, adjusting its behavior to maximize the cumulative reward it collects.
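To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. Everything in it is an illustrative assumption rather than something taken from the text: the function name `run_bandit`, the reward means, the noise level, and the exploration rate `epsilon` are all hypothetical choices.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error (epsilon-greedy).

    true_means is a hypothetical environment: the expected reward of each
    action, which the agent never sees directly.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # the agent's current value estimate per action
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment returns a noisy reward for the chosen action.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental average: move the estimate toward the observed reward.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated values:", estimates)
print("best action found:", best_action)
```

The agent is never told which action is best. It only sees the rewards of the actions it tries, so with these assumed settings its estimate for the third action should end up the highest simply because that action pays off more on average.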
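From a data-efficiency perspective, several techniques have been proposed. In the online setting, the agent learns while it interacts with the environment; a replay buffer makes better use of that interaction by storing past experience as transitions in memory so that each one can be reused for many updates. In recent years, off-policy actor-critic algorithms have gained prominence because they can learn from data collected by a different policy. Taken to the limit, offline RL learns entirely from a fixed, limited dataset, without any further interaction with the environment.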
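The replay buffer mentioned above, a memory of stored transitions, can be sketched in a few lines. This is a minimal illustration under assumptions, not a reference implementation: the names `Transition` and `ReplayBuffer`, the field layout, and uniform random sampling are hypothetical choices made for the example.

```python
import random
from collections import deque, namedtuple

# One step of experience. The field names are an assumed, commonly used layout.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement so a batch contains distinct transitions.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Usage: store five transitions in a buffer that holds only three.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), [tr.state for tr in buffer.memory])
```

Because sampled transitions may have been generated by an older version of the policy, learning from such a buffer is off-policy. In the offline setting, the same structure would be filled once from a fixed dataset and never extended.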