Target tracking based on Siamese networks has reached state-of-the-art performance, but it is still limited in semantic feature extraction. In this paper, we propose a novel method for distinguishing positive and negative samples. Using a deep neural network as the backbone, we fuse feature maps from different layers and feed them to a region proposal network (RPN). In addition, we add a term to the loss function that enables self-adjustment and learns more discriminative embedding features for target objects with similar semantics. In the tracking stage, we treat tracking as one-shot detection: the template from the first frame is fixed as the tracking weights and used to track the target in subsequent frames. Our method achieves strong performance on several benchmark datasets, including OTB2015, VOT2016, VOT2018, and VOT2019.
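
The one-shot detection step can be illustrated with a minimal sketch, under simplifying assumptions that are ours and not taken from the paper: features are single-channel 2-D grids, the first-frame template is used directly as a correlation kernel, and the RPN classification and regression branches are omitted. The helper names `cross_correlate` and `peak_location` and the toy feature values are hypothetical.

```python
def cross_correlate(search, kernel):
    """Slide `kernel` over `search` (both 2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    kh, kw = len(kernel), len(kernel[0])
    response = []
    for i in range(sh - kh + 1):
        row = []
        for j in range(sw - kw + 1):
            # Inner product between the kernel and the window at (i, j).
            score = sum(
                search[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(score)
        response.append(row)
    return response


def peak_location(response):
    """Return the (row, col) index of the highest response."""
    best = max(
        (value, i, j)
        for i, row in enumerate(response)
        for j, value in enumerate(row)
    )
    return best[1], best[2]


# Template features extracted once from the first frame and then kept fixed
# (toy values, assumed for illustration).
template = [[1.0, 0.0],
            [0.0, 1.0]]

# Hypothetical search-region features from two subsequent frames;
# the target pattern moves one cell down and to the right.
frames = [
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]],
]

# The same fixed template is correlated with every later frame.
locations = [peak_location(cross_correlate(f, template)) for f in frames]
print(locations)
```

In this toy setting the response peak follows the target from one frame to the next without the template being updated, which is the sense in which the first frame acts as fixed tracking weights.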