Articles

SISTEM TEMU KEMBALI DOKUMEN TEKS DENGAN PEMBOBOTAN TF-IDF DAN LCS [Text Document Retrieval System with TF-IDF and LCS Weighting] Saadah, Munjiah Nur; Atmagi, Rigga Widar; Rahayu, Dyah S.; Arifin, Agus Zainal
JUTI: Jurnal Ilmiah Teknologi Informasi Vol 11, No 1, January 2013
Publisher : Informatics, ITS

DOI: 10.12962/j24068535.v11i1.a16

Abstract

A text document retrieval system needs a method that can return documents highly relevant to the user's request. One important stage in text representation is weighting. Using LCS to adjust Tf-Idf weights takes into account the occurrence of the same word order in the query and in the document text. Very long but irrelevant documents cause the resulting weights to fail to represent document relevance. This study proposes an LCS method that assigns word-order weights while taking into account each document's length relative to the average document length in the corpus. The method is able to retrieve text documents effectively. Adding the word-order feature, normalized by the ratio of document length to the documents in the corpus as a whole, yields precision and recall values as good as those of the previous method.
COVERAGE, DIVERSITY, AND COHERENCE OPTIMIZATION FOR MULTI-DOCUMENT SUMMARIZATION Umam, Khoirul; Putro, Fidi Wincoko; Pratamasunu, Gulpi Qorik Oktagalu; Arifin, Agus Zainal; Purwitasari, Diana
Jurnal Ilmu Komputer dan Informasi Vol 8, No 1 (2015): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v8i1.278

Abstract

A good summary of multiple documents on similar topics can help users obtain useful information. A good summary must have extensive coverage, minimum redundancy (high diversity), and smooth connections among sentences (high coherence). Therefore, multi-document summarization that considers the coverage, diversity, and coherence of the summary is needed. In this paper we propose a novel method for multi-document summarization that optimizes the coverage, diversity, and coherence among the summary's sentences simultaneously. It integrates the self-adaptive differential evolution (SaDE) algorithm to solve the optimization problem. A sentence ordering algorithm based on a topical closeness approach is performed in the SaDE iterations to improve coherence among the summary's sentences. Experiments were performed on the Text Analysis Conference (TAC) 2008 data sets. The experimental results showed that the proposed method generates summaries with average coherence and ROUGE scores 29-41.2 times and 46.97-64.71% better, respectively, than any other method that considers only coverage and diversity.
USER EMOTION IDENTIFICATION IN TWITTER USING SPECIFIC FEATURES: HASHTAG, EMOJI, EMOTICON, AND ADJECTIVE TERM Sari, Yuita Arum; Ratnasari, Evy Kamilah; Mutrofin, Siti; Arifin, Agus Zainal
Jurnal Ilmu Komputer dan Informasi Vol 7, No 1 (2014): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v7i1.252

Abstract

Twitter is a social media application that can provide signals for identifying user emotion. Identification of user emotion can be applied to commercial, health, political, and security problems. The difficulty of emotion identification in tweets is that the short, unstructured text messages make it hard to determine the main features. In this paper, we propose a new framework for identifying the tendency of user emotions using specific features, i.e. hashtag, emoji, emoticon, and adjective term. Preprocessing is applied in the first phase, and then user emotions are identified by classification using kNN. The proposed method achieves good results, close to the ground truth, with an accuracy of 92%.
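PENDEKATAN POSITIONAL TEXT GRAPH UNTUK PEMILIHAN KALIMAT REPRESENTATIF CLUSTER PADA PERINGKASAN MULTI-DOKUMEN [A Positional Text Graph Approach for Selecting Cluster-Representative Sentences in Multi-Document Summarization] Suputra, I Putu Gede Hendra; Arifin, Agus Zainal; Yuniarti, Anny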
Jurnal Ilmu Komputer Vol 6 No 2: September 2013
Publisher : Informatics Department, Faculty of Mathematics and Natural Sciences, Udayana University

Abstract

Coverage and saliency are major problems in automatic text summarization. Sentence clustering approaches can provide good coverage of all topics, but the point to consider is the selection of important sentences that can represent each cluster’s topic. The salient sentences selected as constituents of the final summary should have enough information density to convey the important information contained in the cluster. The information density of a sentence can be mined by extracting the sentence information density (SID) feature, which is built from a positional text graph approach applied to every sentence in the cluster. This paper proposes a cluster-representative sentence selection strategy that uses the positional text graph approach in multi-document summarization. Three concepts are used in this paper: (1) sentence clustering based on similarity-based histogram clustering, (2) cluster ordering based on cluster importance, and (3) representative sentence selection based on the sentence information density feature score. The candidate summary sentence is the sentence with the greatest sentence information density feature score in a cluster. Trials were conducted on the DUC 2004 Task 2 dataset. ROUGE-1 was used as the performance metric to compare the SID feature with another method, Local Importance and Global Importance (LIGI). Test results showed that the SID feature outperformed the LIGI method on ROUGE-1, where the greatest average ROUGE-1 value achieved by the SID feature is 0.3915.
Perangkingan Dokumen Berbahasa Arab Menggunakan Latent Semantic Indexing [Ranking Arabic-Language Documents Using Latent Semantic Indexing] Wahib, Aminul; Pasnur, Pasnur; Santika, Putu Praba; Arifin, Agus Zainal
Jurnal Buana Informatika Vol 6, No 2 (2015): Jurnal Buana Informatika Volume 6 Nomor 2 April 2015
Publisher : Universitas Atma Jaya Yogyakarta

DOI: 10.24002/jbi.v6i2.411

Abstract

Various document ranking methods for Information Retrieval applications have been developed and implemented. One very popular method ranks documents using a vector space model based on TF.IDF term weighting. That method weights terms only by their frequency of occurrence in documents, without regard to the semantic relationships between terms. In practice, semantic relationships between terms play an important role in improving the relevance of document search results. This study extends the TF.IDF.ICF.IBF method by adding Latent Semantic Indexing to discover semantic relationships between terms for ranking Arabic-language documents. The dataset is taken from the document collection in the Maktabah Syamilah software. Test results show that the proposed method gives better evaluation scores than the TF.IDF.ICF.IBF method. The f-measure values of the TF.IDF.ICF.IBF.LSI method at cosine similarity thresholds of 0.3, 0.4, and 0.5 are 45%, 51%, and 60%, respectively. However, the proposed method's average computation time is 2 minutes 8 seconds higher than that of the TF.IDF.ICF.IBF method.
TERM WEIGHTING BASED ON INDEX OF GENRE FOR WEB PAGE GENRE CLASSIFICATION Sugiyanto, Sugiyanto; Rozi, Nanang Fakhrur; Putri, Tesa Eranti; Arifin, Agus Zainal
JUTI: Jurnal Ilmiah Teknologi Informasi Vol 12, No 1, January 2014
Publisher : Informatics, ITS

DOI: 10.12962/j24068535.v12i1.a43

Abstract

Automating the identification of web page genre has become an important area in web page classification, as it can be used to improve the quality of web search results and to reduce search time. To index the terms used in classification, the weighting generally selected is document-based TF-IDF. However, this method does not consider genre, even though web page documents have a type of categorization called genre. Given genres, a term that appears often in one genre should be more significant in document indexing than a term that appears frequently in many genres, despite the latter's high TF-IDF value. We propose a new weighting method for web page document indexing called inverse genre frequency (IGF). This method is based on genre, a manual semantic categorization taken from previous research. Experimental results show that term weighting based on index of genre (TF-IGF) performed better than term weighting based on index of document (TF-IDF). The highest accuracy, precision, recall, and F-measure were 78%, 80.2%, 78%, and 77.4%, respectively, when genre-specific keywords were excluded, and 78.9%, 78.7%, 78.9%, and 78.1%, respectively, when they were included.
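The abstract above does not give the IGF formula. As a rough illustration only, the sketch below assumes IGF mirrors the usual IDF form, log(G / gf), where G is the number of genres and gf is the number of genres in which a term occurs. The function names, the whitespace tokenization, and the toy genres are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_genre_frequency(docs_by_genre):
    """Map each term to log(G / gf).

    G is the number of genres and gf is the number of genres in which the
    term occurs at least once. This IDF-like form is an assumption.
    """
    num_genres = len(docs_by_genre)
    genre_freq = Counter()
    for docs in docs_by_genre.values():
        # Count each term once per genre, however often it appears there.
        terms_in_genre = {term for doc in docs for term in doc.lower().split()}
        genre_freq.update(terms_in_genre)
    return {term: math.log(num_genres / gf) for term, gf in genre_freq.items()}


def tf_igf(document, igf):
    """Weight each term of one document by raw term frequency times IGF."""
    tf = Counter(document.lower().split())
    return {term: count * igf.get(term, 0.0) for term, count in tf.items()}


# Toy corpus: three invented genres with a few short "pages" each.
docs_by_genre = {
    "shop": ["buy cart price", "price cart"],
    "news": ["breaking news price"],
    "blog": ["my blog post"],
}
igf = inverse_genre_frequency(docs_by_genre)
weights = tf_igf("cart cart price", igf)
```

Under this assumed form, "cart" (seen in one genre) receives a higher IGF than "price" (seen in two genres), which matches the intuition stated in the abstract.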
INFORMATION RETRIEVAL OF TEXT DOCUMENT WITH WEIGHTING TF-IDF AND LCS Saadah, Munjiah Nur; Atmagi, Rigga Widar; Rahayu, Dyah S.; Arifin, Agus Zainal
Jurnal Ilmu Komputer dan Informasi Vol 6, No 1 (2013): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v6i1.216

Abstract

Information retrieval of text documents requires a method that is able to return a number of documents that are highly relevant to the user's request. One important step in text representation is the weighting process. The use of LCS in Tf-Idf weight adjustment considers the appearance of the same word order in the query and in the text of the document. Very long but irrelevant documents cause the resulting weights to fail to represent the relevance of documents. This research proposes the use of LCS to weight word order while considering each document's length relative to the average length of documents in the corpus. This method is able to return text documents effectively. Adding the word-order feature, normalized by the ratio of the document's length to that of the documents in the corpus overall, yields precision and recall values as good as those of the method of Tasi et al.
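The abstract names the ingredients (word-level LCS between query and document, and normalization by document length relative to the corpus average) but not the exact formula. The sketch below is one possible reading, offered only as an illustration: `word_order_score` and its damping rule are assumptions, not the authors' weighting scheme. Only `lcs_length` is the standard dynamic-programming LCS.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def word_order_score(query, document, avg_doc_len):
    """Hypothetical word-order weight.

    The LCS length is scaled by the query length, then damped when the
    document is longer than the corpus average (assumed normalization).
    """
    q = query.lower().split()
    d = document.lower().split()
    if not q or not d:
        return 0.0
    length_ratio = len(d) / avg_doc_len
    return (lcs_length(q, d) / len(q)) / max(1.0, length_ratio)


score_avg = word_order_score("text retrieval", "text document retrieval system", 4)
score_long = word_order_score("text retrieval", "text document retrieval system", 2)
```

In this sketch a document twice the average length has its word-order score halved, so a long document cannot gain weight simply by containing more of the query's words in order.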
AUTOMATIC DETERMINATION OF SEEDS FOR RANDOM WALKER BY SEEDED WATERSHED TRANSFORM FOR TUNA IMAGE SEGMENTATION Abdullah, Moch Zawaruddin; Qomariah, Dinial Utami Nurul; Farosanti, Lafnidita; Arifin, Agus Zainal
Jurnal Ilmu Komputer dan Informasi Vol 11, No 1 (2018): Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Publisher : Faculty of Computer Science - Universitas Indonesia

DOI: 10.21609/jiki.v11i1.468

Abstract

Tuna fish image classification is an important step in sorting tuna by type and quality based on shape. A tuna image should be well segmented before it enters the classification stage. Tuna images have uneven lighting and complex texture, which results in poor segmentation. This research proposes a method for automatically determining random walker seeds from watershed regions for tuna image segmentation. Random walker is a noise-resistant segmentation method that requires two types of user-defined seeds: seed pixels for the background and seed pixels for the object. We evaluated the proposed method on 30 tuna images using relative foreground area error (RAE), misclassification error (ME), and modified Hausdorff distance (MHD), with values of 4.38%, 1.34% and 1.11%, respectively. This suggests that the seeded random walker method is more effective than existing methods for tuna image segmentation.
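The abstract reports ME and RAE without stating their formulas. The sketch below uses the definitions commonly found in the thresholding and segmentation literature, which is an assumption about what the paper computed. Masks are plain nested lists of 0 (background) and 1 (foreground), and the sample masks are invented.

```python
def misclassification_error(truth, result):
    """Fraction of pixels whose foreground/background label differs from the ground truth."""
    flat_truth = [p for row in truth for p in row]
    flat_result = [p for row in result for p in row]
    if len(flat_truth) != len(flat_result):
        raise ValueError("masks must have the same number of pixels")
    agree = sum(t == r for t, r in zip(flat_truth, flat_result))
    return 1.0 - agree / len(flat_truth)


def relative_foreground_area_error(truth, result):
    """Absolute difference in foreground area, divided by the larger area."""
    area_truth = sum(sum(row) for row in truth)
    area_result = sum(sum(row) for row in result)
    larger = max(area_truth, area_result)
    if larger == 0:
        return 0.0  # both masks are empty, so there is no area error
    return abs(area_truth - area_result) / larger


# Invented 2x3 masks: the result misses one foreground pixel.
truth = [[0, 1, 1],
         [0, 1, 1]]
result = [[0, 0, 1],
          [0, 1, 1]]
me = misclassification_error(truth, result)
rae = relative_foreground_area_error(truth, result)
```

Both measures are 0 for a perfect segmentation and approach 1 as the result diverges from the ground truth, which is why lower percentages in the abstract indicate better segmentation.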
KOMPRESI CITRA PENGINDERAAN JAUH MULTISPEKTRAL BERBASIS CLUSTERING DAN REDUKSI SPEKTRAL [Multispectral Remote Sensing Image Compression Based on Clustering and Spectral Reduction] Arifin, Agus Zainal; Lestriandoko, Nova Hadi
JUTI: Jurnal Ilmiah Teknologi Informasi Vol 2, No 1, January 2003
Publisher : Informatics, ITS

DOI: 10.12962/j24068535.v2i1.a110

Abstract

Image compression is a vital need in multispectral remote sensing applications, because multispectral images require very large storage space. On the other hand, multispectral images have special characteristics that can be exploited to make compression more effective. These characteristics relate to how the images are used in classification. Multispectral image compression therefore need not rely on conventional approaches and can instead exploit these characteristics. This study discusses a multispectral image compression method that integrates clustering, spectral manipulation, and coding. The clustering method used is Improved Split and Merge Clustering (ISMC). Principal Component Analysis (PCA) is used for spectral manipulation. For coding, a lossless data compression method is used, namely Lempel-Ziv-Welch (LZW). The integration of clustering, spectral manipulation, and coding is divided into two combinations: clustering-LZW and PCA-clustering-LZW. Evaluation measures the compression ratio, computation time, number of clusters, and error measures comprising total error, maximum error, and average error. The experiments show that each method has different strengths on each evaluation factor, so users can choose the method that suits their needs. Keywords: clustering, image compression, multispectral image, ISMC, spectral reduction, LZW
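For readers unfamiliar with the coding stage named above, the following is a minimal sketch of textbook LZW over bytes. It is not the paper's implementation: the byte-level alphabet, the unbounded dictionary, and the sample row of cluster labels are assumptions chosen for illustration.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (dictionary grows without limit)."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


# A hypothetical row of cluster labels, stored one label per byte.
row = b"AAAABBBBAAAA"
codes = lzw_compress(row)
restored = lzw_decompress(codes)
```

Long runs of identical labels, as a clustered image tends to produce, shorten the code stream, and the round trip is lossless.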
PENGGUNAAN ANALISA FAKTOR UNTUK KLASIFIKASI CITRA PENGINDERAAN JAUH MULTISPEKTRAL [Using Factor Analysis for Multispectral Remote Sensing Image Classification] Arifin, Agus Zainal; Kurniati, Wiwik Dyah Septiana
JUTI: Jurnal Ilmiah Teknologi Informasi Vol 1, No 1, July 2002
Publisher : Informatics, ITS

DOI: 10.12962/j24068535.v1i1.a91

Abstract

Clustering can proceed either hierarchically (split and merge) or partitionally. A split based on a histogram is easier to perform in one dimension, so a transformation is needed. The commonly used transformation method is Principal Component Analysis (PCA). However, PCA is based only on finding the dimension of maximum variance, which allows classes to overlap, in the sense that some classes cannot be separated. In this study, the transformation method used is Factor Analysis (Canonical Analysis). This method is better than PCA because Factor Analysis transforms and at the same time separates clusters in feature space. The three main processes in this study are split, merge, and partitional K-means clustering. The multispectral image is transformed to one dimension. The one-dimensional histogram is split by selecting curve peaks. Merge combines the clusters produced by the split: adjacent clusters are merged into new clusters. K-means clustering is used to locate the cluster centers (cluster prototypes) and at the same time assign pixels to each cluster. The results are compared with those of a clustering algorithm whose transformation uses PCA. The comparison shows that clustering with a Factor Analysis transformation yields higher heterogeneity between clusters (Tr(SB) increases by between 0.83% and 19.58%). The compactness of each cluster, however, is not always optimal. This is very likely because the number of sample classes was too small and the samples taken in each class were not varied enough. Keywords: Factor Analysis, complete link, K-means clustering, scatter within class, scatter between class