Abdullah, Cep Ubad
Universitas Pendidikan Indonesia

Question Generator System of Sentence Completion in TOEFL Using NLP and K-Nearest Neighbor
Riza, Lala Septem; Pertiwi, Anita Dyah; Rahman, Eka Fitrajaya; Munir, Munir; Abdullah, Cep Ubad
Indonesian Journal of Science and Technology (IJOST), Vol 4, No 2 (2019)
Publisher : Universitas Pendidikan Indonesia

DOI: 10.17509/ijost.v4i2.18202


The Test of English as a Foreign Language (TOEFL) is a form of learning evaluation that requires high-quality questions. Preparing TOEFL questions by conventional means takes a great deal of time, and computer technology can help reduce that effort. This research therefore addresses the problem of generating TOEFL questions of the sentence completion type. The system consists of several stages: (1) collection of input data from foreign news sites with excellent English grammar; (2) preprocessing with Natural Language Processing (NLP); (3) Part of Speech (POS) tagging; (4) question feature extraction; (5) separation and selection of news sentences; (6) determination and collection of values for 7 features; (7) conversion of categorical data values; (8) classification of the target word for the blank position with K-Nearest Neighbor (KNN); (9) determination of heuristic rules from human experts; and (10) selection of answer options, or distractors, based on the heuristic rules. In an experiment on 10 news articles, the system produced 20 questions. Assessment by a human expert rated the generated questions as very good, with a score of 81.93%, and 70% of the blank positions matched those in the historical data of TOEFL questions. It can be concluded that the generated questions have the following characteristics: the quality of the result follows the training data drawn from historical TOEFL questions, and the quality of the distractors is very good because they are derived from the heuristics of human experts.
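
Stages (7) and (8) above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the abstract does not list the 7 features, so the three POS-context features, the training rows, the labels `blank` and `keep`, and the choice of Hamming distance are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical training rows (NOT from the paper): the POS tag of the previous
# word, the candidate word, and the next word, labelled by whether the candidate
# word was the blank in a past question.
TRAIN = [
    (("DT", "NN", "VBZ"), "blank"),
    (("NN", "VBZ", "DT"), "blank"),
    (("PRP", "VBD", "IN"), "blank"),
    (("IN", "DT", "NN"), "keep"),
    (("VBZ", "DT", "JJ"), "keep"),
    (("DT", "JJ", "NN"), "keep"),
]


def build_codebooks(rows):
    """Map each categorical value to an integer code, one codebook per feature."""
    books = [dict() for _ in rows[0]]
    for row in rows:
        for book, value in zip(books, row):
            book.setdefault(value, len(book))
    return books


def encode(row, books):
    """Convert category strings to integer codes (-1 for values never seen)."""
    return tuple(book.get(value, -1) for book, value in zip(books, row))


def hamming(a, b):
    """Count the feature positions where two encoded rows differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, train_x, train_y, k=3):
    """Return the majority label among the k nearest encoded training rows."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: hamming(query, pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


BOOKS = build_codebooks([features for features, _ in TRAIN])
TRAIN_X = [encode(features, BOOKS) for features, _ in TRAIN]
TRAIN_Y = [label for _, label in TRAIN]


def classify(features, k=3):
    """Encode one candidate word's features and classify it with KNN."""
    return knn_predict(encode(features, BOOKS), TRAIN_X, TRAIN_Y, k)


print(classify(("DT", "NN", "VBZ")))
```

Hamming distance is used here because the integer codes are arbitrary labels with no meaningful order; a real system might instead use one-hot encoding with Euclidean distance, and would take its tags from an actual POS tagger rather than hand-written tuples.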