IEEE Access (Jan 2024)

Ricci Planner: Zero-Shot Transfer for Goal-Conditioned Reinforcement Learning via Geometric Flow

  • Wongeun Song,
  • Jungwoo Lee

DOI
https://doi.org/10.1109/ACCESS.2024.3361478
Journal volume & issue
Vol. 12
pp. 24027 – 24038

Abstract


The long-horizon problem has been a persistent challenge in reinforcement learning, leading to the exploration of various solutions. Planning methods have emerged as a prominent approach, generating intermediate plans from the current state to the goal state. However, these methods often assume additional interaction with the environment or the availability of offline data. In real-world scenarios, such information may be unavailable, and only time-independent random observations of the environment can be provided. To overcome this limitation, we propose the Ricci planner, a novel algorithm capable of generating a plan from the current location to a desired goal using only a limited number of time-independent random samples from the observation space. Drawing inspiration from the observation that the most efficient path is the one with minimum length, we transform the problem of finding an efficient path into a shortest-path problem and formulate it as an optimization problem on a path space. However, the length functional on the path space is highly non-convex and multi-modal, which produces numerous local optima and makes the problem exceedingly challenging to solve. To address this issue, we employ the Ricci flow to continuously transform the target manifold into a simpler manifold. We first identify the shortest path on the simpler manifold and then convert it into the shortest path on the desired manifold by applying the inverse process of the Ricci flow. We conduct an experimental comparison with graph-based shortest-path-finding methods, assessing both the quality of the generated plan itself and its effectiveness when applied to an agent, and observe improved results from both perspectives compared to the baseline.
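The abstract does not state the underlying formulas; as a point of reference only, the two standard objects it alludes to are the Riemannian length functional minimized over the path space and the Ricci flow equation used to deform the metric (the paper's exact discretization and inversion scheme are not given here):

$$ L[\gamma] = \int_0^1 \sqrt{g_{\gamma(t)}\big(\dot{\gamma}(t),\, \dot{\gamma}(t)\big)}\; dt, \qquad \frac{\partial g(\tau)}{\partial \tau} = -2\,\mathrm{Ric}\big(g(\tau)\big). $$

Read this way, the planner would minimize $L[\gamma]$ under a metric simplified by evolving $g$ along the flow, then map the resulting shortest path back to the original manifold by reversing the deformation, as described qualitatively in the abstract.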

Keywords