A Technical Approach on Large Data Distributed Over a Network

International Journal of Science and Engineering. 2011;2(2):34-41 DOI 10.12777/ijse.2.2.34-41

 

Journal Title: International Journal of Science and Engineering

ISSN: 2086-5023 (Print); 2302-5743 (Online)

Publisher: Diponegoro University

Society/Institution: Department of Chemical Engineering, Diponegoro University

LCC Subject Category: Technology: Technology (General) | Science: Science (General)

Country of publisher: Indonesia

Language of fulltext: English

Full-text formats available: PDF

 

AUTHORS

Suhasini G (Ganapathy College of Engineering, Warangal)
Mamtha Billa (Ramappa Engineering College, Warangal)
Ashwini P (Vaagdevi Engineering College, Warangal)

EDITORIAL INFORMATION

Time From Submission to Publication: 8 weeks

 

ABSTRACT

<p>Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Given a database of records and a set of classes such that each record belongs to one of the classes, the classification problem is to decide the class to which a given record belongs. Classification also involves generating a model for each class from the given data set. We use supervised classification, in which a training data set of records is available and the class of each record is known. There are many approaches to supervised classification. Decision trees are attractive in a data mining environment because they represent rules. Rules can readily be expressed in natural language and can even be mapped to database access languages. Classification based on decision trees is now one of the important problems in data mining, with applications in many areas. Database systems have also become highly distributed, and many paradigms are in use. We consider the problem of inducing decision trees in a large network of highly distributed databases. This problem is motivated by the existence of distributed databases in healthcare, bioinformatics, and human-computer interaction, and by the view that these databases will soon contain large amounts of high-dimensional data. Current decision tree algorithms would require high communication bandwidth and memory, and they become less efficient and less scalable when executed on such large volumes of data. Approaches are therefore being developed to improve scalability and to analyse data distributed over a network.</p>
<p>Keywords: data mining, decision tree, decision tree induction, distributed data, classification</p>
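
As an illustration of why decision tree induction over distributed data need not move the raw records, the following is a minimal sketch in Python. It is not the authors' method, which the abstract does not specify. It assumes horizontally partitioned data (every site holds complete records with the same attributes), categorical attributes, and information gain as the split criterion. The function names, the `label` field, and the toy healthcare-style records are all hypothetical. Each site sends only a small table of (attribute value, class) counts, and a coordinator merges the tables to pick the split attribute for one tree node.

```python
from collections import Counter
from math import log2


def local_counts(records, attribute):
    """Run at one site: count (attribute value, class label) pairs.

    Only this small table leaves the site, never the records themselves.
    The "label" key is an assumed name for the class field.
    """
    return Counter((rec[attribute], rec["label"]) for rec in records)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total)
                for n in class_counts.values() if n)


def information_gain(merged):
    """Run at the coordinator on merged (value, label) -> count tables."""
    by_class = Counter()
    by_value = {}
    for (value, label), n in merged.items():
        by_class[label] += n
        by_value.setdefault(value, Counter())[label] += n
    total = sum(by_class.values())
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(sum(c.values()) / total * entropy(c)
                    for c in by_value.values())
    return entropy(by_class) - remainder


def best_split(sites, attributes):
    """Choose the attribute with the highest gain from per-site counts."""
    gains = {}
    for attr in attributes:
        merged = Counter()
        for records in sites:  # in a real system, one message per site
            merged.update(local_counts(records, attr))
        gains[attr] = information_gain(merged)
    return max(gains, key=gains.get), gains


# Hypothetical toy data held at two separate sites.
site_a = [
    {"fever": "yes", "cough": "yes", "label": "flu"},
    {"fever": "no", "cough": "yes", "label": "cold"},
]
site_b = [
    {"fever": "yes", "cough": "no", "label": "flu"},
    {"fever": "no", "cough": "no", "label": "cold"},
]

best, gains = best_split([site_a, site_b], ["fever", "cough"])
print(best, gains)
```

In this sketch the communication cost per node depends on the number of distinct attribute values and classes rather than on the number of records, which is one way the bandwidth concern raised in the abstract could be reduced. Building a full tree would repeat the same exchange for each new node on the records that reach it.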