This study investigates the energy efficiency of device-to-device (D2D) communications in heterogeneous networks. To minimize the total transmit power, an approach based on Q-learning with an adaptive ε-greedy strategy is proposed to optimize the association of each user equipment (UE) with a base station (BS) or an access point (AP). The adaptive ε-greedy strategy balances exploration and exploitation, enabling effective optimization. Simulation results indicate that, in the single-cell scenario, the proposed method achieves performance close to that of the optimal solution.
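The abstract does not specify the state, action, reward, or adaptive ε rule, so the following is only a toy sketch of the general idea under stated assumptions, not the authors' formulation. It assumes a hypothetical power table `POWER` (transmit power each UE needs when served by the BS or the AP, interference ignored), a hypothetical AP capacity `AP_CAPACITY`, a sequential state `(UE index, AP load)`, a reward equal to the negative transmit power, and a hypothetical progress-based ε schedule that shrinks ε quickly after an episode improves the best total power found so far and slowly otherwise. The learned association is compared against exhaustive search, mirroring the comparison with the optimal solution.

```python
import itertools
import random
from collections import defaultdict

# Toy model (hypothetical numbers, not from the study): POWER[i][j] is the
# transmit power (mW) UE i needs to meet its rate target when served by node j
# (0 = BS, 1 = AP). Interference is ignored for simplicity.
POWER = [
    [8.0, 2.0],
    [6.0, 3.0],
    [5.0, 4.5],
    [9.0, 1.0],
]
AP_CAPACITY = 2  # hypothetical: the AP can serve at most two UEs
N_UE = len(POWER)
BS, AP = 0, 1


def valid_actions(ap_load):
    """BS is always allowed; the AP only while it has spare capacity."""
    return [BS, AP] if ap_load < AP_CAPACITY else [BS]


def step(state, action):
    """Assign UE state[0]; return (next_state, reward = -transmit power)."""
    ue, ap_load = state
    return (ue + 1, ap_load + int(action == AP)), -POWER[ue][action]


def greedy(q, state):
    """Action with the highest Q-value among the valid ones."""
    return max(valid_actions(state[1]), key=lambda a: q[(state, a)])


def train(episodes=3000, alpha=0.5, eps_start=1.0, eps_min=0.05, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # zeros are optimistic because every reward is negative
    eps, best = eps_start, float("inf")
    for _ in range(episodes):
        state, total = (0, 0), 0.0
        while state[0] < N_UE:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.choice(valid_actions(state[1]))
            else:
                action = greedy(q, state)
            nxt, reward = step(state, action)
            total -= reward  # accumulate transmit power for this episode
            # Undiscounted episodic Q-learning target; terminal value is 0.
            future = 0.0 if nxt[0] == N_UE else max(
                q[(nxt, b)] for b in valid_actions(nxt[1]))
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
        # Hypothetical adaptive rule: shrink epsilon quickly after an episode
        # that lowers the best total power found so far, slowly otherwise.
        if total < best:
            best = total
            eps = max(eps_min, eps * 0.9)
        else:
            eps = max(eps_min, eps * 0.995)
    return q


def greedy_assignment(q):
    """Follow the learned greedy policy and return (assignment, total power)."""
    state, assignment, total = (0, 0), [], 0.0
    while state[0] < N_UE:
        action = greedy(q, state)
        assignment.append(action)
        state, reward = step(state, action)
        total -= reward
    return assignment, total


def brute_force():
    """Exhaustive search over all feasible assignments (reference optimum)."""
    return min(
        sum(POWER[i][a] for i, a in enumerate(assign))
        for assign in itertools.product([BS, AP], repeat=N_UE)
        if sum(assign) <= AP_CAPACITY
    )


if __name__ == "__main__":
    assignment, total = greedy_assignment(train())
    print(assignment, total, brute_force())
```

With these illustrative numbers, a first-come choice would place UE 0 and UE 1 on the AP (19 mW in total), whereas exhaustive search routes UE 0 and UE 3 through the AP (14 mW). The learned greedy policy should match the exhaustive-search result here because the Q-learning looks ahead over the remaining UEs; larger or interference-coupled instances would need a richer state and would not be guaranteed to reach the optimum.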