TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol 11, No 4: December 2013

Streamed Sampling on Dynamic Data as Support for Classification Model

Silvanie, Astried (Bogor Agricultural University)
Djatna, Taufik (Department of Agro-Industrial Technology, Bogor Agricultural University)
Sukoco, Heru (Department of Computer Science, Bogor Agricultural University)



Article Info

Publish Date
01 Dec 2013

Abstract

Data mining on dynamically changing data faces several problems, such as unknown data size and continually changing skew. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter’s reservoir algorithm is used to retrieve k records from the database and place them in the sample. The sample is used as input for a classification task in data mining. The sample is a backing sample, stored as a table containing the id and priority values. The priority reflects the probability that a record is retained in the sample over time. Kullback-Leibler divergence is applied to measure the similarity between the population and sample distributions. The results show that random samples can be drawn continuously while transactions occur. Kullback-Leibler divergence is a very good measure for keeping the population and sample distributions similar, with values ranging from 0 to 0.0001. The sample is always up to date with new transactions and has similar skewness. For the classification task, the decision tree model improved significantly when changes occurred.
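
The two techniques named in the abstract can be illustrated with a minimal sketch. It assumes the basic reservoir scheme (often called Algorithm R) described in Vitter's paper on reservoir sampling, and a Kullback-Leibler divergence computed over categorical class labels. The function names `reservoir_sample` and `kl_divergence` are hypothetical, and the sketch does not model the backing sample's id and priority table, so it should not be read as the authors' implementation.

```python
import math
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of up to k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = item
    return sample


def kl_divergence(population, sample):
    """D_KL(P || Q) in nats, with P the population and Q the sample label distribution."""
    p_counts = Counter(population)
    q_counts = Counter(sample)
    n_p, n_q = len(population), len(sample)
    total = 0.0
    for label, count in p_counts.items():
        p = count / n_p
        q = q_counts.get(label, 0) / n_q
        if q == 0:
            # A label present in the population but absent from the sample.
            return math.inf
        total += p * math.log(p / q)
    return total


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, only for repeatability of the demo
    labels = ["yes"] * 7000 + ["no"] * 3000  # made-up class labels, not the paper's data
    sample = reservoir_sample(iter(labels), 500, rng)
    print(len(sample), kl_divergence(labels, sample))
```

A divergence close to 0 indicates that the sample's class distribution is close to the population's, which is the property the abstract reports monitoring.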

Copyrights © 2013