Machine Learning with Applications (Jun 2025)
Bitcoin price direction prediction using on-chain data and feature selection
Abstract
Bitcoin is the largest cryptocurrency by trading volume and market capitalization. Many studies have sought to characterize Bitcoin’s speculative behavior using techniques such as technical analysis, price regression, and direction classification. This work instead uses the relatively nascent technique of on-chain data analysis, with the goal of evaluating how well Bitcoin’s on-chain data predicts future price direction. First, a classification of on-chain data features is proposed to help the reader understand their relevance. To address the curse of dimensionality, the feature selection algorithms L1-regularized regression and Boruta, along with the dimensionality reduction algorithm Principal Component Analysis (PCA), are utilized. The research then explores advanced neural networks for next-day price direction prediction, including the Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) and the Temporal Convolutional Network (TCN). Neural network models and trading strategies are then compared based on their return statistics. A comparative analysis of feature selection, learning model performance, and trading strategy performance is also conducted. The results show that the Boruta feature selection algorithm combined with the CNN-LSTM model outperforms the other combinations, with a prediction accuracy of 82.03% over the testing period. In addition, the on-chain features within the category, realized value, and unrealized value classifications have higher predictive power for next-day price direction. Finally, in trade simulations, the CNN-LSTM model with a Long-Short strategy achieved an annualized return of 1682.7% and a Sharpe Ratio of 6.47.
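The return statistics named above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy data, the zero risk-free rate, and the 365-day annualization factor (chosen here because Bitcoin trades every day) are all assumptions made for illustration. It shows how next-day direction predictions could be turned into Long-Short strategy returns, and how direction accuracy, annualized return, and Sharpe Ratio could then be computed.

```python
import math
from statistics import mean, stdev


def long_short_returns(daily_returns, predicted_up):
    """Per-day strategy return: +r when long (predicted up), -r when short."""
    return [r if up else -r for r, up in zip(daily_returns, predicted_up)]


def direction_accuracy(daily_returns, predicted_up):
    """Fraction of days on which the predicted direction matched the sign of r."""
    hits = sum((r > 0) == up for r, up in zip(daily_returns, predicted_up))
    return hits / len(daily_returns)


def annualized_return(strategy_returns, periods_per_year=365):
    """Geometric annualized return from per-period simple returns.

    periods_per_year=365 is an assumption, not a value taken from the paper.
    """
    growth = math.prod(1.0 + r for r in strategy_returns)
    return growth ** (periods_per_year / len(strategy_returns)) - 1.0


def sharpe_ratio(strategy_returns, periods_per_year=365, risk_free_per_period=0.0):
    """Annualized Sharpe Ratio: mean excess return over its sample std. dev."""
    excess = [r - risk_free_per_period for r in strategy_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Toy data (hypothetical): realized next-day returns and model predictions.
daily_returns = [0.02, -0.01, 0.03, -0.02]
predicted_up = [True, False, True, True]

strategy = long_short_returns(daily_returns, predicted_up)
print("accuracy:", direction_accuracy(daily_returns, predicted_up))
print("annualized return:", annualized_return(strategy))
print("Sharpe Ratio:", sharpe_ratio(strategy))
```

In this sketch a correct "down" prediction earns a positive return through the short position, which is why a Long-Short strategy can profit in both rising and falling markets. Transaction costs and slippage are ignored.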