Articles

RANDOM FOREST APPROACH FOR SENTIMENT ANALYSIS IN INDONESIAN LANGUAGE Fauzi, M. Ali
Indonesian Journal of Electrical Engineering and Computer Science Vol 12, No 1: October 2018
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v12.i1.pp46-50

Abstract

Sentiment analysis has become very useful since the rise of social media and online review websites, which created the need to analyze the sentiment they contain effectively and efficiently. Sentiment analysis can be treated as a text classification problem with the sentiments as its categories. In this study, we explore the use of Random Forest for sentiment classification in the Indonesian language. We also explore bag-of-words (BOW) features with several term weighting variants: Binary TF, Raw TF, Logarithmic TF, and TF.IDF. The experimental results show that a sentiment analysis system using Random Forest performs well, with an average out-of-bag (OOB) score of 0.829. The results also show that all four term weighting methods give competitive results. Since the score differences are not very significant, we conclude that the choice of term weighting method in this study has no remarkable effect on sentiment analysis using Random Forest.
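
The four term weighting variants named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `term_weights`, the toy documents, and the exact formulas (`1 + log10(tf)` for Logarithmic TF, `tf * log10(N / df)` for TF.IDF) are assumptions based on common textbook definitions.

```python
import math
from collections import Counter


def term_weights(docs):
    """Weight every term of every tokenized document under four schemes.

    The formulas are common textbook variants (an assumption; the
    abstract does not state which exact variants were used).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({
            term: {
                "binary_tf": 1.0,                    # term is present
                "raw_tf": float(count),              # plain occurrence count
                "log_tf": 1.0 + math.log10(count),   # dampened count
                "tf_idf": count * math.log10(n_docs / df[term]),
            }
            for term, count in tf.items()
        })
    return weighted


# Hypothetical toy corpus of two tokenized reviews.
docs = [["bagus", "bagus", "murah"], ["jelek", "murah"]]
weights = term_weights(docs)
print(weights[0]["bagus"])
```

A term that occurs in every document, such as "murah" here, gets a TF.IDF weight of zero under this variant, while the other three schemes still give it a positive weight.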
CYBERBULLYING IDENTIFICATION IN TWITTER USING SUPPORT VECTOR MACHINE AND INFORMATION GAIN BASED FEATURE SELECTION Dwi Purnamasari, Ni Made Gita; Fauzi, M. Ali; Indriati, Indriati; Dewi, Liana Shinta
Indonesian Journal of Electrical Engineering and Computer Science Vol 18, No 3: June 2020
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v18.i3.pp1494-1500

Abstract

Cyberbullying is one of the actions that violate the ITE Law, where the crime is committed on social media applications such as Twitter. Such an action is difficult to detect if no one reports the tweet. Cyberbullying tweet identification aims to classify tweets that contain bullying. Classification is done with the Support Vector Machine (SVM) method, which seeks the hyperplane separating the negative and positive classes. Because this is a text classification task, the more data is used, the more features are produced; this research therefore also uses Information Gain for feature selection, to filter out features that are not relevant to the classification. The system starts with text preprocessing: tokenizing, filtering, stemming, and term weighting. It then performs Information Gain feature selection by calculating the entropy value of each term. After that, it classifies based on the selected terms, and the output of the system is an identification of whether the tweet is bullying or not. The SVM method achieves accuracy 75%, precision 70.27%, recall 86.66%, and F-measure 77.61% with maximum iteration = 20, ? = 0.5, ? = 0.001, ? = 0.000001, and C = 1. The best Information Gain threshold is 90%, with accuracy 76.66%, precision 72.22%, recall 86.66%, and F-measure 78.78%.
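
The entropy-based Information Gain score described in this abstract can be illustrated with a small sketch. It assumes the standard definition (class entropy minus the entropy conditioned on whether a term is present); the function names, toy tweets, and labels are hypothetical and not taken from the paper.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result


def information_gain(docs, labels, term):
    """Class entropy minus entropy conditioned on presence of `term`."""
    with_term = [y for doc, y in zip(docs, labels) if term in doc]
    without_term = [y for doc, y in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Hypothetical tokenized tweets (as sets of terms) with their labels.
docs = [{"bodoh", "kamu"}, {"bodoh", "sekali"},
        {"selamat", "pagi"}, {"terima", "kasih"}]
labels = ["bully", "bully", "normal", "normal"]

ig_bodoh = information_gain(docs, labels, "bodoh")
ig_pagi = information_gain(docs, labels, "pagi")
print(ig_bodoh, ig_pagi)
```

In this toy data "bodoh" separates the two classes perfectly and scores higher than "pagi", so a threshold on the ranked scores would keep it first.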
WORD2VEC MODEL FOR SENTIMENT ANALYSIS OF PRODUCT REVIEWS IN INDONESIAN LANGUAGE Fauzi, M. Ali
International Journal of Electrical and Computer Engineering (IJECE) Vol 9, No 1: February 2019
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v9i1.pp525-530

Abstract

Online product reviews have become a highly valuable source of information for consumers making purchase decisions and for producers improving their products and marketing strategies. However, as the number of available reviews increases, it becomes more and more difficult for people to understand and evaluate the general opinion about a particular product manually. Hence, an automatic approach is preferred. One of the most popular techniques is a machine learning approach such as the Support Vector Machine (SVM). In this study, we explore the use of the Word2Vec model as features in SVM-based sentiment analysis of product reviews in the Indonesian language. The experimental results show that SVM performs well on the sentiment classification task with every feature model used. However, the Word2Vec model has the lowest accuracy (only 0.70) compared to the baseline methods, which include the Bag of Words model using Binary TF, Raw TF, and TF.IDF. This is because only a small dataset was used to train the Word2Vec model. Word2Vec needs a large number of examples to learn word representations and place similar words close together.
INDONESIAN NEWS CLASSIFICATION USING NAÏVE BAYES AND TWO-PHASE FEATURE SELECTION MODEL Fauzi, M. Ali; Arifin, Agus Zainal; Gosaria, Sonny Christiano
Indonesian Journal of Electrical Engineering and Computer Science Vol 8, No 3: December 2017
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijeecs.v8.i3.pp610-615

Abstract

Since the rise of the WWW, the amount of information available online has been growing rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been proven to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the features. The reduced feature set is then selected using MMR-FS in the second phase. The experimental results show that the new method reaches a best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
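
The two-phase idea in this abstract can be sketched roughly as below. This is an illustration under assumptions, not the authors' algorithm: phase 1 keeps the top-scoring features by a relevance score (Information Gain in the paper), and phase 2 applies a generic MMR-style greedy criterion, `lam * relevance - (1 - lam) * max redundancy`. The function name, the parameter `lam`, the relevance values, and the redundancy function are all hypothetical.

```python
def two_phase_select(relevance, redundancy, k_first, k_final, lam=0.7):
    """Two-phase selection: relevance filter, then MMR-style greedy pick.

    relevance:  dict mapping feature -> relevance score
    redundancy: function (feature_a, feature_b) -> similarity score
    """
    # Phase 1: keep only the k_first most relevant features, which
    # shrinks the candidate pool for the costlier second phase.
    candidates = sorted(relevance, key=relevance.get, reverse=True)[:k_first]

    # Phase 2: repeatedly pick the candidate with the best trade-off
    # between relevance and redundancy with already selected features.
    selected = []
    while candidates and len(selected) < k_final:
        def mmr(feature):
            penalty = max((redundancy(feature, s) for s in selected),
                          default=0.0)
            return lam * relevance[feature] - (1.0 - lam) * penalty

        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: "gol" and "goal" are treated as redundant.
relevance = {"gol": 0.9, "goal": 0.85, "pajak": 0.6, "dan": 0.01}


def redundancy(a, b):
    return 1.0 if {a, b} == {"gol", "goal"} else 0.0


chosen = two_phase_select(relevance, redundancy,
                          k_first=3, k_final=2, lam=0.5)
print(chosen)
```

With these toy numbers the second pick skips "goal" despite its high relevance, because it is redundant with the already selected "gol".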
ARABIC BOOK RETRIEVAL USING CLASS AND BOOK INDEX BASED TERM WEIGHTING Fauzi, M. Ali; Arifin, Agus Zainal; Yuniarti, Anny
International Journal of Electrical and Computer Engineering (IJECE) Vol 7, No 6: December 2017
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v7i6.pp3705-3710

Abstract

One of the most common issues in information retrieval is document ranking. A document ranking system collects search terms from the user and retrieves documents in order of relevance. Vector space models based on TF.IDF term weighting are the most common method for this task. This study is concerned with automatic retrieval from a collection of Islamic Fiqh (Law) books. The collection contains many books, each of which has tens to hundreds of pages. Each page of a book is treated as a document to be ranked against the user query. We developed a class-based indexing method called inverse class frequency (ICF) and a book-based indexing method called inverse book frequency (IBF) for this Arabic information retrieval task. These methods were then combined with the previous method to form TF.IDF.ICF.IBF. The term weighting method was also used for feature selection because of the high dimensionality of the feature space. This novel method was tested on a dataset of 13 Arabic Fiqh e-books. The experimental results show that the proposed method has higher precision, recall, and F-measure than the other three methods across the feature selection variations. The best performance was obtained with the best 1000 features, giving a precision of 76%, a recall of 74%, and an F-measure of 75%.
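
A combined TF.IDF.ICF.IBF weight could be computed along the following lines. The abstract does not give the formulas, so this sketch assumes each inverse frequency takes the form `1 + log10(total / containing)`, by analogy with a common IDF variant; the data layout, the function names, and the romanized toy tokens are hypothetical.

```python
import math


def inverse_frequency(total, containing):
    """Assumed IDF-like form, reused for documents, classes, and books."""
    return 1.0 + math.log10(total / containing)


def tf_idf_icf_ibf(term, page, pages):
    """Weight of `term` in `page`, where every page is one document.

    pages: list of dicts with keys 'tokens', 'book', and 'cls'.
    """
    tf = page["tokens"].count(term)
    if tf == 0:
        return 0.0
    hits = [p for p in pages if term in p["tokens"]]
    idf = inverse_frequency(len(pages), len(hits))
    icf = inverse_frequency(len({p["cls"] for p in pages}),
                            len({p["cls"] for p in hits}))
    ibf = inverse_frequency(len({p["book"] for p in pages}),
                            len({p["book"] for p in hits}))
    return tf * idf * icf * ibf


# Hypothetical collection: four pages from two books in two classes.
pages = [
    {"tokens": ["zakat", "zakat", "kitab"], "book": "A", "cls": "ibadah"},
    {"tokens": ["zakat", "kitab"], "book": "A", "cls": "ibadah"},
    {"tokens": ["kitab", "nikah"], "book": "B", "cls": "muamalah"},
    {"tokens": ["kitab"], "book": "B", "cls": "muamalah"},
]

w_zakat = tf_idf_icf_ibf("zakat", pages[0], pages)
w_kitab = tf_idf_icf_ibf("kitab", pages[0], pages)
print(w_zakat, w_kitab)
```

Under these assumptions a term confined to one book and one class ("zakat") is boosted by all three inverse frequencies, while a term spread over every page, book, and class ("kitab") keeps only its raw term frequency.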
TERM WEIGHTING BERBASIS INDEKS BUKU DAN KELAS UNTUK PERANGKINGAN DOKUMEN BERBAHASA ARAB [Book- and Class-Index-Based Term Weighting for Ranking Arabic-Language Documents] Fauzi, M. Ali; Arifin, Agus; Yuniarti, Anny
Lontar Komputer Vol. 5, No. 2 August 2014
Publisher : Research institutions and Community Service, University of Udayana

Abstract

Information retrieval based on a given query is now commonplace in computer systems. One popular method is document ranking using the vector space model based on TF.IDF term weighting values. In this study, there are several Arabic-language books that have tens or even hundreds of pages. Each page of these books is a document that will be ranked based on the user's query. TF.IDF weights terms at the document level only, without considering the book and class indexes that are the parents of the document, so its performance is less than optimal in this case. Therefore, a new term weighting method based on book and class indexes is proposed. This method takes into account how often a term occurs across all books and classes. The methods, called inverse class frequency (ICF) and inverse book frequency (IBF), are combined with the previous method to form TF.IDF.ICF.IBF. The method was tested on a dataset of several Arabic-language e-books. The results show that the proposed method can be applied to ranking Arabic-language documents and performs better than the previous method, with an F-measure of 75%, a precision of 76%, and a recall of 74%.
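Peramalan Produksi Gula Menggunakan Metode Jaringan Syaraf Tiruan Backpropagation Pada PG Candi Baru Sidoarjo [Sugar Production Forecasting Using the Backpropagation Artificial Neural Network Method at PG Candi Baru Sidoarjo] Rachman, Adi Sukarno; Cholissodin, Imam; Fauzi, M. Ali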
Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer Vol 2 No 4 (2018)
Publisher : Fakultas Ilmu Komputer (FILKOM), Universitas Brawijaya

Abstract

Sugar is a staple commodity used routinely by Indonesians. It is widely used in the food and beverage industry and in the food processing and preservation industry. Demand for sugar is rising, driven by the lifestyle of Indonesians, especially in daily life. PG Candi Baru is a sugar factory built in 1832 and a producer of SHS I (Superior Hooft Suiker) sugar, or White Crystal Sugar I (Gula Kristal Putih I, GKP). Since 2004, PG Candi Baru has undertaken large-scale improvements to company performance and has made changes through technological breakthroughs in the on-farm and off-farm areas. This study uses a Backpropagation Artificial Neural Network with a network architecture of 4 input neurons, 3 hidden, and 1 output. Testing the maximum number of iterations gave the lowest MAPE of 17.85% at 800 iterations. Testing the learning rate gave the lowest MAPE of 17.38% at a learning rate of 0.4. With a maximum of 800 iterations and a learning rate of 0.4, the resulting MAPE is 16.98%.
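
MAPE (mean absolute percentage error), the metric reported in this abstract, can be computed as in the sketch below. The production figures are hypothetical and serve only to show the arithmetic.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Actual values must be non-zero because each error is divided by
    the corresponding actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical actual and forecast production values.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 220.0, 400.0]
error = mape(actual, predicted)
print(round(error, 2))
```

Here the three relative errors are 10%, 10%, and 0%, so the MAPE is their mean, about 6.67%.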
TWITTER SENTIMENT ANALYSIS ON 2013 CURRICULUM USING ENSEMBLE FEATURES AND K-NEAREST NEIGHBOR Irfan, M. Rizzo; Fauzi, M. Ali; Tibyani, Tibyani; Mentari, Nurul Dyah
International Journal of Electrical and Computer Engineering (IJECE) Vol 8, No 6: December 2018
Publisher : Institute of Advanced Engineering and Science

DOI: 10.11591/ijece.v8i6.pp5409-5414

Abstract

The 2013 curriculum is a new curriculum in the Indonesian education system, enacted by the government to replace the KTSP curriculum. Its implementation over the last few years has sparked various opinions among students, teachers, and the general public, especially on the social media platform Twitter. In this study, sentiment analysis of the 2013 curriculum is conducted. An ensemble of several feature sets was used: Twitter-specific features, textual features, Part-of-Speech (POS) features, lexicon-based features, and Bag of Words (BOW) features, with sentiment classification performed by the K-Nearest Neighbor method. The experimental results show that the ensemble features give the best sentiment classification performance compared with using individual feature sets alone. The best accuracy using ensemble features is 96%, obtained with k=5.
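
K-Nearest Neighbor classification with k=5 can be sketched as a majority vote among the five closest training vectors. The two-dimensional feature vectors below are hypothetical stand-ins for the ensemble features, and Euclidean distance is an assumption, since the abstract does not name the distance measure.

```python
import math
from collections import Counter


def knn_predict(train, query, k=5):
    """Label `query` by majority vote among its k nearest neighbors.

    train: list of (feature_vector, label) pairs
    """
    # Sort training items by Euclidean distance and keep the closest k.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors with sentiment labels.
train = [
    ((0.9, 0.1), "positive"), ((0.8, 0.2), "positive"),
    ((0.7, 0.3), "positive"), ((0.2, 0.9), "negative"),
    ((0.1, 0.8), "negative"), ((0.3, 0.7), "negative"),
]

label_a = knn_predict(train, (0.75, 0.25), k=5)
label_b = knn_predict(train, (0.2, 0.8), k=5)
print(label_a, label_b)
```

An odd k such as 5 avoids tied votes when there are only two classes.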
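Sistem Temu Kembali Informasi Pasal-Pasal KUHP Menggunakan Metode Cosine Similarity dan Pembobotan Inverse Book Frequency [Information Retrieval System for KUHP Articles Using the Cosine Similarity Method and Inverse Book Frequency Weighting] Sabilal, Billy; Fauzi, M. Ali; Indriati, Indriati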
Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer Vol 3 No 4 (2019)
Publisher : Fakultas Ilmu Komputer (FILKOM), Universitas Brawijaya

Abstract

In Indonesia, the applicable law follows the Criminal Code (Kitab Undang-undang Hukum Pidana, KUHP), which must be obeyed by good citizens and by those who work in the field, such as police, judges, and others involved in court proceedings. The KUHP is a thick book containing 569 articles, so it is inefficient and impractical to carry around, and finding a relevant article means turning the pages one by one manually. Given these conditions, this study develops an application using the cosine similarity method and inverse book frequency weighting. Cosine similarity is used to compute the similarity, or closeness, between an article document and the query. Inverse book frequency weighting is a term weighting that considers the term's distribution across the book collection. The value of each term is assumed to be inversely proportional to the number of books containing it. System performance is shown by tests on 10 query variations (3 one-word queries, 3 two-word queries, 3 three-word queries, and 1 four-word query), with an average precision of 0.5273, recall of 1, and F-measure of 0.6063, while the best average precision@k, 0.6498, was obtained at rank three.
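
Ranking article documents against a query by cosine similarity can be sketched as follows. The term weights are hypothetical placeholders (in the paper they would come from the inverse book frequency weighting), and the document names `pasal_a` and `pasal_b` are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return document names ordered from most to least similar."""
    return sorted(documents,
                  key=lambda name: cosine_similarity(query, documents[name]),
                  reverse=True)


# Hypothetical weighted vectors for a one-word query and two articles.
query = {"pencurian": 1.0}
documents = {
    "pasal_a": {"pencurian": 2.0, "pidana": 1.0},
    "pasal_b": {"penipuan": 1.5, "pidana": 1.0},
}

ranking = rank(query, documents)
score_a = cosine_similarity(query, documents["pasal_a"])
print(ranking, round(score_a, 4))
```

A document that shares no terms with the query scores 0 and falls to the bottom of the ranking.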
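Prediksi Rating Pada Review Produk Kecantikan Menggunakan Metode Semantic Orientation Calculator dan Regresi Linier [Rating Prediction for Beauty Product Reviews Using the Semantic Orientation Calculator Method and Linear Regression] Sapuhtra, Bastian Dolly; Fauzi, M. Ali; Rahayudi, Bayu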
Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer Vol 3 No 5 (2019): J-PTIIK
Publisher : Fakultas Ilmu Komputer (FILKOM), Universitas Brawijaya

Abstract

The large number of beauty product manufacturers yields products that are good and varied. This attracts consumers to use these beauty products. As more consumers use them, manufacturers try various innovations in their products. Innovation can draw on the many comments, suggestions, or reviews that consumers write about various products. Product reviews also help consumers obtain information before buying a product. Many existing reviews are not accompanied by a rating. This makes it difficult for manufacturers to group reviews by sentiment. This study aims to group reviews into sentiments automatically in the form of ratings. A system is built using the Semantic Orientation Calculator and Linear Regression methods. Reviews are split into n-grams (bigrams and trigrams) and into single sentences, with the aim of improving the predictions. The test results are accuracies of 23%, 71%, and 67% for bigrams; 24%, 71%, and 67% for trigrams; and, lowest, 24%, 67%, and 64% for single sentences, under the tolerance 0, tolerance 1, and sentiment review test models respectively. The best results, obtained by splitting sentences into n-grams (bigrams and trigrams), address the research problem reasonably well.
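
The gap between the tolerance 0 and tolerance 1 accuracies can be illustrated with a short sketch. The abstract does not define these test models, so this assumes that a prediction counts as correct under tolerance t when it differs from the actual rating by at most t; the function name and the ratings are hypothetical.

```python
def tolerance_accuracy(actual, predicted, tolerance):
    """Share of predictions within `tolerance` of the actual rating.

    Assumed definition: tolerance 0 requires an exact match, while
    tolerance 1 also accepts predictions that are off by one.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical actual and predicted ratings on a 1-5 scale.
actual = [5, 4, 3, 1, 2]
predicted = [5, 3, 5, 2, 2]

acc_exact = tolerance_accuracy(actual, predicted, tolerance=0)
acc_near = tolerance_accuracy(actual, predicted, tolerance=1)
print(acc_exact, acc_near)
```

Under this reading, a large jump from tolerance 0 to tolerance 1, as in the reported figures, means that most wrong predictions miss the true rating by only one point.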