Lailan Sahrina Hasibuan
Department of Computer Science, Bogor Agricultural University

Published : 3 Documents

Improving DNA Barcode-based Fish Identification System on Imbalanced Data using SMOTE
Kusuma, Wisnu Ananta; Noviana, Nurdevi; Hasibuan, Lailan Sahrina; Nurilmala, Mala
TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol 15, No 3: September 2017
Publisher : Universitas Ahmad Dahlan

DOI: 10.12928/telkomnika.v15i3.5011


Imbalanced data is a common problem in classification and identification. It arises when the number of instances of one class far exceeds that of the others. In previous research, our DNA barcode-based identification system for tuna and mackerel was developed on an imbalanced dataset: the tuna and mackerel samples far outnumbered the samples of other fish. The accuracy of the classification model was therefore probably biased. This research employed the Synthetic Minority Oversampling Technique (SMOTE) to produce a balanced dataset. We used k-mer frequencies from DNA barcode sequences as features and a Support Vector Machine (SVM) as the classification method, with trinucleotides (3-mers) and tetranucleotides (4-mers) as the k-mers. The training dataset was taken from the Barcode of Life Database (BOLD). To evaluate the model, we compared its accuracy with and without SMOTE when classifying DNA barcode sequences obtained from the Department of Aquatic Product Technology, Bogor Agricultural University. At the species level, the accuracy of the model using SMOTE was 7% and 13% higher than that of the non-SMOTE model for trinucleotides (3-mers) and tetranucleotides (4-mers), respectively. SMOTE, as one data balancing technique, is expected to increase the accuracy of DNA barcode-based fish classification systems, particularly at the species level, where identification is difficult.
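
The two ingredients named in this abstract, k-mer frequency features and SMOTE oversampling, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `kmer_frequencies` and `smote`, the neighbor count, and the Euclidean distance are assumptions made for illustration, and the SVM classifier itself is omitted.

```python
import random
from itertools import product


def kmer_frequencies(seq, k):
    """Return the relative frequency of each of the 4**k k-mers over A, C, G, T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        # Sort the remaining minority samples by squared Euclidean distance to x.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic
```

With k = 3 each sequence becomes a 64-dimensional vector and with k = 4 a 256-dimensional one. Because every synthetic point is a convex combination of two frequency vectors, it is itself a valid frequency vector whose entries sum to 1.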

Evaluation of F-Measure and Feature Analysis of C5.0 Implementation on Single Nucleotide Polymorphism Calling
Hasibuan, Lailan Sahrina; Nabila, Sita; Hudachair, Nurul; Istiadi, Muhammad Abrar
Indonesian Journal of Artificial Intelligence and Data Mining Vol 1, No 1 (2018): March
Publisher : UIN Sultan Syarif Kasim Riau


Data in molecular biology has grown rapidly since Next-Generation Sequencing (NGS), the latest technology for high-throughput DNA sequencing, was introduced in 2000. A Single Nucleotide Polymorphism (SNP) is a DNA-based marker that can be used to identify an organism specifically. SNPs are commonly exploited to optimize parent selection when producing high-quality seed in plant breeding. This paper discusses SNP calling on NGS data of cultivated soybean (Glycine max (L.) Merr.) using C5.0, an improved version of the rule-based algorithm C4.5. The evaluation showed that C5.0 outperforms another rule-based algorithm, CART, in terms of F-measure: 0.63 for C5.0 versus 0.58 for CART. In addition, C5.0 is robust to imbalanced training datasets up to a ratio of 1:17, but it suffers on large training datasets. C5.0's performance may be improved by applying bagging or another ensemble technique, just as CART was improved by applying bagging to the final decision. Using appropriate features to represent SNP candidates is also important. Based on the information gain of C5.0, this paper recommends error probability, homopolymer left, mismatch alt, and mean nearby qual as features for SNP calling.
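
The F-measure used for this comparison combines precision and recall, which makes it more informative than plain accuracy when true SNPs are rare among the candidates. The sketch below shows the standard definition. The function name `f_measure` and the toy labels are illustrative assumptions, and the abstract does not say whether a weighting other than beta = 1 was used.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Compute the F-beta score for the given positive class (F1 when beta == 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example: 3 true SNPs among 8 candidates (labels are made up).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
score = f_measure(y_true, y_pred)
```

In the toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F-measure is 2/3 as well.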
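
Spatial Model for Predicting Haze Pollutant Concentrations from Peatland Fires Using Support Vector Regression (original title: Model Spasial untuk Prediksi Konsentrasi Polutan Kabut Asap Kebakaran Lahan Gambut Menggunakan Support Vector Regression)
Agmalaro, Muhammad Asyhar; Sitanggang, Imas Sukaesih; Hasibuan, Lailan Sahrina; Ramadhan, Muhammad Murtadha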
Jurnal Ilmu Komputer dan Agri-Informatika Vol 5, No 2 (2018)
Publisher : Departemen Ilmu Komputer IPB

DOI: 10.29244/jika.5.2.119-127


Haze from peatland fires contains various pollutants such as CO and CO2. These pollutants can harm the health of communities near the fires, causing upper respiratory tract infections (ISPA). This study aims to build a spatial model for predicting the concentrations of the haze pollutants CO and CO2 from the 2015 peatland fires in Sumatra. The spatial model was built using the support vector regression (SVR) algorithm with a radial basis function (RBF) kernel, taking into account pollutant concentrations at several neighboring points. Parameter tuning was performed to obtain the optimal SVR parameter values. The results show that the best spatial model for CO concentration was obtained with gamma = 20, yielding a root mean squared error (RMSE) of 1.174242×10⁻⁸ and a correlation coefficient of 0.5879287. The best spatial model for CO2 concentration was obtained with gamma = 10, yielding an RMSE of 9.843717×10⁻⁸ and a correlation coefficient of 0.6058418. The predictions of the resulting models follow the pattern of the actual pollutant concentrations.
Keywords: CO, CO2, haze, spatial model, support vector regression.
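
The quantities reported in this abstract (the RBF kernel's gamma, the RMSE, and the correlation coefficient) can be made concrete with a short standard-library sketch. The function names are illustrative assumptions, the correlation is assumed to be Pearson's, and the SVR training step itself is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel value exp(-gamma * ||x - z||^2) between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

A larger gamma makes the kernel value fall off faster with distance, so each training point influences a smaller neighborhood. RMSE is expressed in the units of the target, so it has to be read against the magnitude of the concentrations, whereas the correlation coefficient is scale-free.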