Astried Silvanie
Institut Pertanian Bogor

Published: 1 document

Journal: TELKOMNIKA Telecommunication, Computing, Electronics and Control

Streamed Sampling on Dynamic data as Support for Classification Model
Silvanie, Astried; Djatna, Taufik; Sukoco, Heru
TELKOMNIKA Telecommunication, Computing, Electronics and Control Vol 11, No 4: December 2013
Publisher : Universitas Ahmad Dahlan


Abstract

Data mining on dynamically changing data faces several problems, such as an unknown data size and a data skew that is always changing. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter's reservoir algorithm is used to retrieve k records from the database and place them in the sample, which then serves as input for the classification task in data mining. The sample is a backing sample, stored as a table containing an id and a priority for each record; the priority indicates the probability of how long a record is retained in the sample. Kullback-Leibler divergence is applied to measure the similarity between the population and sample distributions. The results show that samples can be taken randomly and continuously as transactions occur. Kullback-Leibler divergence proved a very good measure for maintaining a similar distribution between the population and the sample, with values in the interval from 0 to 0.0001. The sample stays up to date with new transactions and keeps a similar skewness. For the classification task, the decision tree model improved significantly when changes occurred.
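The abstract does not spell out the sampling procedure. As a minimal sketch, assuming Algorithm R (the basic reservoir method presented in Vitter's 1985 paper), the code below keeps a uniform sample of k records from a stream of unknown length. It then measures the Kullback-Leibler divergence between the class-label distribution of the population and that of the sample. The authors' exact reservoir variant and their id/priority backing-sample table are not reproduced here. The function names, the natural-log base, the `eps` guard for labels absent from the sample, and the example labels are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Algorithm R: uniform random sample of k items from a stream of unknown length.

    Hypothetical helper; not the authors' implementation.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)       # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = item     # item i+1 enters with probability k/(i+1)
    return reservoir


def kl_divergence(population: Sequence[Hashable], sample: Sequence[Hashable],
                  eps: float = 1e-12) -> float:
    """D_KL(P || Q) in nats, where P = population label shares, Q = sample label shares.

    eps is an assumed guard so a label missing from the sample does not divide by zero.
    """
    p_counts = Counter(population)
    q_counts = Counter(sample)
    n_p, n_q = len(population), len(sample)
    total = 0.0
    for label, count in p_counts.items():
        p = count / n_p
        q = q_counts.get(label, 0) / n_q
        total += p * math.log(p / max(q, eps))
    return total


if __name__ == "__main__":
    rng = random.Random(42)
    labels = ["yes"] * 700 + ["no"] * 300   # hypothetical class labels, not the paper's data
    sample = reservoir_sample(labels, k=100, rng=rng)
    print(len(sample), kl_divergence(labels, sample))
```

A divergence near 0 means the sample's label proportions track the population's. The abstract reports values between 0 and 0.0001 for the authors' data; this toy example does not attempt to reproduce that figure.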