In urban warfare and rescue scenarios, localizing soldiers in staircase environments presents substantial challenges. To address this problem, this study introduces the “Target Localization in Staircase Environments-Proximal Policy Optimization” module, which combines object detection with a reinforcement learning algorithm. Using a bespoke dataset, “Soldier-Staircase for Tracked Robots”, and design standards tailored to swing-arm tracked robots, the module meets the requirements for autonomous, precise localization in staircase environments. A series of experiments conducted in simulated environments verifies the module’s efficacy in autonomously and accurately localizing soldier targets in staircase environments, laying the groundwork for further research into the application of reinforcement learning to autonomous robot control.