Human and vehicle pose estimation has progressed significantly in recent years, with the performance of these methods evolving from poor to remarkable in just a couple of years. Much of this improvement has come from increasingly complex architectures. In this paper, we explore the applicability of a simple baseline method that adds a few deconvolutional layers to a backbone network to estimate heat maps corresponding to the vehicle keypoints, an approach that has proven very effective for human pose estimation. We evaluate the method on the PASCAL3D+ dataset, where it achieves state-of-the-art results. In addition, we conduct a set of experiments to study current shortcomings in vehicle keypoint labelling that adversely affect performance. Finally, we present a new strategy for defining vehicle keypoints and validate it on our customized dataset with extended keypoints.
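As a rough illustration of the heat-map formulation, the sketch below traces how a few deconvolutional (transposed convolution) layers upsample a backbone feature map, and how a keypoint location can be read from a heat map by taking its maximum. The backbone stride of 32, the three layers, and the kernel/stride/padding values of 4/2/1 are assumptions chosen for illustration, not settings stated in this abstract; the function names are hypothetical.

```python
def deconv_output_size(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Spatial size after one transposed convolution (standard formula, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def heatmap_size(input_size, backbone_stride=32, num_deconv=3):
    """Heat-map side length for a given input side length.

    Assumes (for illustration only) a backbone that downsamples by
    `backbone_stride`, followed by `num_deconv` layers that each double the size.
    """
    size = input_size // backbone_stride
    for _ in range(num_deconv):
        size = deconv_output_size(size)
    return size


def decode_keypoint(heatmap):
    """Return (x, y, score) of the highest-scoring cell in a 2D heat map."""
    best = (0, 0, heatmap[0][0])
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


if __name__ == "__main__":
    # A 256 x 192 input yields a 64 x 48 heat map under the assumed settings.
    print(heatmap_size(256), heatmap_size(192))
    # Toy 2 x 2 heat map: the peak lies at column 0, row 1.
    print(decode_keypoint([[0.1, 0.2], [0.9, 0.3]]))
```

Under these assumed settings, each keypoint is predicted as one heat-map channel at one quarter of the input resolution, and the decoded peak is scaled back to image coordinates by the same factor.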