IEEE Access (Jan 2019)
Improving Order Execution Cost Estimation by Semisupervised Learning
Abstract
Order execution cost analysis is one of the most important problems in financial investment. Many previous works model it as a cost classification or regression task. However, because real orders are scarce, the performance of those models is unsatisfactory. Moreover, existing approaches do not exploit the unlimited simulated orders that market simulators can generate. In this paper, we propose an order execution cost estimation approach that uses both limited real orders and unlimited simulated orders. The approach 1) employs exploratory data analysis to uncover the patterns and relationships in the raw data and to select appropriate features for model training; 2) trains supervised models on labeled (real) orders as baselines for estimating order execution cost; and 3) trains three semisupervised learning (SSL) models on both labeled and simulated (unlabeled) orders to improve estimation performance, where a) the Semisupervised Support Vector Machine (S3VM) finds a low-density separation across labeled and unlabeled orders, b) Tri-Training draws bootstrap samples from the labeled orders to obtain three different labeled training sets, so that the three resulting classifiers disagree and can label unlabeled orders for one another, and c) the Label Propagation (LP) model propagates the execution cost labels of labeled orders to unlabeled ones over a graph and adjusts the labels based on local and global consistency. Experiments are conducted on real and simulated order datasets. The results show that the SSL models outperform the baselines: S3VM optimized by Adam, Random Forest (RF) based Tri-Training, and Radial Basis Function (RBF) based LP all exploit the information in unlabeled orders to substantially improve classification F1 scores.
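To make step c) concrete, the following is a minimal NumPy sketch of label propagation in its standard local-and-global-consistency form. It builds an RBF affinity graph, normalizes it symmetrically as S = D^(-1/2) W D^(-1/2), and solves F = (I - alpha*S)^(-1) Y, where Y one-hot encodes the labeled orders and has zero rows for unlabeled ones. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names, the values of alpha and sigma, and the toy two-cluster feature matrix are all hypothetical stand-ins for real order features and cost-class labels.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """RBF (Gaussian) affinity matrix with a zero diagonal (no self-loops)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def label_propagation_lgc(X, y, n_classes, alpha=0.99, sigma=1.0):
    """Local-and-global-consistency propagation.

    y holds integer class labels for labeled samples and -1 for unlabeled
    ones (hypothetical convention). Returns a predicted class per sample.
    """
    n = X.shape[0]
    W = rbf_affinity(X, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Initial label matrix: one-hot rows for labeled samples, zeros otherwise.
    Y = np.zeros((n, n_classes))
    labeled = y >= 0
    Y[labeled, y[labeled]] = 1.0
    # Closed-form fixed point of F <- alpha * S F + (1 - alpha) * Y (up to scale).
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# Hypothetical toy data: two well-separated clusters, one labeled point each.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = label_propagation_lgc(X, y, n_classes=2)
print(pred.tolist())  # [0, 0, 0, 1, 1, 1]
```

The single label in each cluster spreads to its unlabeled neighbours because the RBF weights between the two clusters are vanishingly small. The closed-form solve is convenient for small graphs; for large numbers of simulated orders, the iterative update or a sparse k-nearest-neighbour graph would usually be preferred.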
Keywords