Abstract

Proposes a hierarchical reinforcement learning method that learns atomic actions via imitation learning and fine-tunes them via reinforcement learning, simplifying the long-horizon policy learning problem. How? A novel "data-relabeling algorithm" for learning goal-conditioned hierarchical policies.
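The data-relabeling idea can be sketched roughly as follows: slide over an unsegmented demonstration and treat every near-future state as a goal, yielding goal-conditioned (state, goal, action) training tuples without manual skill segmentation. This is a minimal sketch, not the paper's exact algorithm; `relabel_demo` and the `window` size are illustrative assumptions.

```python
# Hedged sketch of goal relabeling over an unsegmented demonstration.
# `window` (how far ahead a state may serve as a goal) is an assumed hyperparameter.
def relabel_demo(states, actions, window=10):
    """Turn one unsegmented demo into goal-conditioned (state, goal, action) tuples.

    Every state up to `window` steps in the future is treated as a reachable
    goal, so no manual skill segmentation of the demonstration is needed.
    """
    data = []
    for t in range(len(actions)):
        # Any state within `window` steps ahead can serve as the goal for step t.
        for g in range(t + 1, min(t + window, len(states))):
            data.append((states[t], states[g], actions[t]))
    return data
```

The resulting tuples can supervise both a low-level goal-reaching policy and, with longer windows, a high-level goal-setting policy.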

No access to task-specific rewards or segmentation; instead, it leverages unstructured and unsegmented demonstrations for imitation learning.

Why is it good?

Recent RL methods are constrained to relatively simple short-horizon skills, so HRL is used instead. However, HRL struggles with exploration (h-DQN), skill segmentation (Options), and reward definition (Diversity Is All You Need). This work simplifies the problem by utilizing extra supervision in the form of unstructured human demonstrations.

Core Technology

Verification Method

Any Argument or Idea?

Next Paper to read

Details