A Technical Approach on Large Data Distributed Over a Network

Suhasini G; Mamtha Billa; Ashwini P

doi:10.12777/ijse.2.2.34-41

International Journal of Science and Engineering (Dec 2011)

Affiliations

Suhasini G: Ganapathy College of Engineering, Warangal
Mamtha Billa: Ramappa Engineering College, Warangal
Ashwini P: Vaagdevi Engineering College, Warangal

DOI: https://doi.org/10.12777/ijse.2.2.34-41
Journal volume & issue: Vol. 2, no. 2
pp. 34 – 41

Abstract

Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Given a database of records and a set of classes such that each record belongs to one of the classes, the classification problem is to decide the class to which a given record belongs; it also involves generating a model for each class from the given data set. We use supervised classification, in which a training data set of records is available and the class of each record is known. There are many approaches to supervised classification. Decision trees are attractive in a data mining environment because they represent rules, which can readily be expressed in natural language and can even be mapped to database access languages. Classification based on decision trees is now one of the important problems in data mining, with applications in many areas. Database systems have also become highly distributed, and many paradigms are in use. We consider the problem of inducing decision trees in a large network of highly distributed databases. The work is motivated by the existence of distributed databases in healthcare, bioinformatics, and human-computer interaction, and by the view that these databases will soon contain large amounts of high-dimensional data. Current decision tree algorithms would require high communication bandwidth and memory, and they become less efficient and less scalable when executed on such large volumes of data. Approaches are therefore being developed to improve scalability and to analyse data distributed over a network.

Keywords: data mining, decision tree, decision tree induction, distributed data, classification
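As an illustration of why decision tree induction over distributed data need not move raw records, the sketch below chooses a split attribute from per-site class counts only. It is a minimal example under stated assumptions, not the authors' algorithm: the function names (`local_counts`, `information_gain`, `best_split`), the use of information gain as the split criterion, and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(records, attribute, label="class"):
    """Run at one site: count (attribute value, class) pairs in local records."""
    return Counter((r[attribute], r[label]) for r in records)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total) for n in class_counts.values() if n)


def information_gain(counts):
    """Information gain of a split, computed from merged (value, class) counts."""
    by_class = Counter()
    by_value = {}
    for (value, cls), n in counts.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(by_class.values())
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(by_class) - remainder


def best_split(sites, attributes):
    """Coordinator: merge the small count tables from every site, pick the best attribute."""
    gains = {}
    for attribute in attributes:
        merged = Counter()
        for records in sites:
            # Only this count table would cross the network, not the records.
            merged.update(local_counts(records, attribute))
        gains[attribute] = information_gain(merged)
    return max(gains, key=gains.get), gains


# Hypothetical toy data held at two separate sites.
site_a = [
    {"fever": "yes", "cough": "yes", "class": "flu"},
    {"fever": "yes", "cough": "no", "class": "flu"},
]
site_b = [
    {"fever": "no", "cough": "yes", "class": "healthy"},
    {"fever": "no", "cough": "no", "class": "healthy"},
]

best, gains = best_split([site_a, site_b], ["fever", "cough"])
print(best, gains)
```

In this toy case the chosen attribute corresponds to a rule such as "if fever = yes then class = flu", which is the kind of rule the abstract describes as expressible in natural language or in a database query. The communication cost per attribute depends on the number of distinct (value, class) pairs rather than on the number of records at each site.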

Published in International Journal of Science and Engineering

ISSN: 2086-5023 (Print); 2302-5743 (Online)
Publisher: Diponegoro University
Country of publisher: Indonesia
LCC subjects: Technology: Technology (General); Science: Science (General)
Website: http://www.ejournal.undip.ac.id/index.php/ijse/
