Riyani, Ade
Indonesia Association of Computational Linguistics (INACL)

Published : 1 Documents

Found 1 Documents

Penerapan Cosine Similarity dan Pembobotan TF-IDF untuk Mendeteksi Kemiripan Dokumen Naf'an, Muhammad Zidny; Burhanuddin, Auliya; Riyani, Ade
Jurnal Linguistik Komputasional (JLK) Vol 2 No 1 (2019): Vol. 2, No. 1
Publisher : Indonesia Association of Computational Linguistics (INACL)

Show Abstract | Download Original | Original Source | Check in Google Scholar | Full PDF (476.917 KB) | DOI: 10.26418/jlk.v2i1.17


Plagiarism is the act of taking part or all of one's ideas in the form of documents or texts without including sources of information retrieval. This study aims to detect the similarity of text documents using the cosine similarity algorithm and weighting TF-IDF so that it can be used to determine the value of plagiarism. The document used for comparison of this text is an abstract of Indonesian. The results of the study, namely when stemming the similarity value is higher on average 10% than the stemming process is not done. This study produces a similarity value above 50% for documents with a high degree of similarity. Whereas documents with low similarity levels or no plagiarism produce similarity values ​​below 40%. With the method used in the preprocessing consisting of folding cases, tokenizing, removeal stopwords, and stemming. After the preprocessing process, the next step is to calculate the weighting of TF-IDF and the similarity value using cosine similarity so that it gets a percentage similarity value. Based on the experimental results of the cosine similarity algorithm and weighting TF-IDF, it can produce similarity values ​​from each comparative document