Dianxin kexue (Dec 2016)
Application of random forest in big data completion
Abstract
Telecom operators hold large volumes of data, but for a variety of reasons its quality is not ideal: much of it is incomplete or missing altogether. Data mining requires data that meets quality requirements and covers a sufficient sampling proportion. Building on the country's existing log retention system, a template library is designed to verify data integrity. For data that fails to meet the quality requirements, the random forest algorithm is used to find identical or related data, complete the missing values and improve data quality, and the resulting feedback is used to optimize and extend the template library. A data completion subsystem built into the log retention system provides end-to-end data quality assurance, completes and improves both real-time and historical data, and ultimately meets operators' requirements for data processing and mining, improving the quality and value of the data.
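The abstract does not say how the random forest is applied, so the following is only a minimal sketch of one common way to complete a missing field with a forest: train bootstrapped regression trees on the complete records, then average their predictions for the incomplete ones. It is a small hand-rolled forest written for illustration, not the authors' implementation. The field layout (`session_seconds`, `uplink_kb`, `downlink_kb`), the function names, and the parameter values are all hypothetical, and the sketch assumes that only one numeric column has gaps and that at least one record is complete.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, rng):
    """Grow one regression tree; each split looks at a random feature subset."""
    if depth == 0 or len(set(y)) <= 1:
        return mean(y)  # leaf: average of the targets that reached it
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))
    best = None  # (error, feature index, threshold)
    for f in rng.sample(range(n_features), k):
        # Skip the smallest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[f] < t]
            right = [yi for row, yi in zip(X, y) if row[f] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # the sampled features were all constant
        return mean(y)
    _, f, t = best
    lo = [i for i, row in enumerate(X) if row[f] < t]
    hi = [i for i, row in enumerate(X) if row[f] >= t]
    return (f, t,
            build_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, rng),
            build_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def complete_column(records, col, n_trees=25, max_depth=4, seed=0):
    """Return a copy of records with None in column `col` filled by the forest."""
    rng = random.Random(seed)

    def features(record):
        return [v for i, v in enumerate(record) if i != col]

    known = [r for r in records if r[col] is not None]
    X = [features(r) for r in known]
    y = [r[col] for r in known]
    forest = []
    for _ in range(n_trees):
        # Bootstrap: sample the complete records with replacement.
        idx = [rng.randrange(len(known)) for _ in range(len(known))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, rng))
    completed = []
    for r in records:
        r = list(r)  # copy so the input records are left untouched
        if r[col] is None:
            r[col] = mean([predict_tree(tree, features(r)) for tree in forest])
        completed.append(r)
    return completed


# Hypothetical log rows: [session_seconds, uplink_kb, downlink_kb]
logs = [
    [10, 5, 52], [20, 9, 101], [30, 16, 148], [40, 21, 203],
    [50, 24, 251], [60, 31, 297], [35, 18, None],
]
filled = complete_column(logs, col=2)
print(filled[-1])  # downlink_kb is now an estimate built from similar rows
```

Because every leaf stores an average of observed values, a filled value always lies between the smallest and largest observed value of that field. In the setting the abstract describes, a production system would more likely rely on an established random forest library and would also need to handle categorical fields and gaps in several columns, none of which this sketch attempts.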