THE EFFECT OF TEXT PREPROCESSING AND ITS COMBINATIONS ON AUTOMATIC DOCUMENT SUMMARIZATION OF INDONESIAN-LANGUAGE TEXT

  • Hadiyatun Najjichah
  • Abdul Syukur
  • Hendro Subagyo
Keywords: Automatic Text Summarization, Stemming, Stopword Removal

Abstract

The volume of information grows along with its digital availability, and much of the information on the internet is in textual form. Information seekers need to be able to get what they require. Automatic text summarization is the process by which a machine applies certain methods to produce a shorter version of a document while preserving its gist. This research examines the influence of text preprocessing steps and their combinations on automatic text summarization for Bahasa Indonesia. The methods used are segmentation, tokenization, stopword removal, stemming, and N-gram. Fourteen combinations of steps were evaluated. The results show that these combinations do influence automatic text summarization. The highest F-measure, 66%, was obtained with the combination Tokenization >> 2-gram >> Summarization, while the lowest, 63%, was obtained with both Tokenization >> 3-gram >> Summarization and Tokenization >> 4-gram >> Summarization.
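As an illustration of one pipeline named in the abstract (Tokenization >> N-gram >> Summarization) and of the F-measure used for evaluation, a minimal Python sketch follows. The abstract does not say how sentences are scored or how the F-measure is computed, so the n-gram frequency scoring in `summarize` and the sentence-overlap definition in `f_measure` are assumptions made for illustration, and all function names are hypothetical. Stopword removal and stemming are omitted.

```python
import re
from collections import Counter


def segment(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase a sentence and extract its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def summarize(text, n=2, k=1):
    """Select the k sentences with the highest average n-gram frequency.

    ASSUMPTION: frequency-based scoring is a stand-in; the paper's actual
    summarization method is not described in the abstract.
    """
    sentences = segment(text)
    grams = [ngrams(tokenize(s), n) for s in sentences]
    freq = Counter(g for sentence_grams in grams for g in sentence_grams)
    scores = [
        sum(freq[g] for g in sg) / len(sg) if sg else 0.0
        for sg in grams
    ]
    # Rank by score (ties broken by position), then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return [sentences[i] for i in sorted(top)]


def f_measure(selected, reference):
    """Harmonic mean of precision and recall over selected sentences.

    ASSUMPTION: evaluation compares system-selected sentences against a
    reference summary's sentences by exact match.
    """
    sel, ref = set(selected), set(reference)
    overlap = len(sel & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(sel)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Calling `summarize(text, n=3)` or `summarize(text, n=4)` gives the 3-gram and 4-gram variants of the same pipeline, and `f_measure` can then compare each output with a reference summary.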

Published
2019-09-16