News Category Classification using Support Vector Machine Algorithm

Nisa Eka Juliana; Faridah Dewi Khansa; Aaz M Hafidz Azis; Rafli Indra Gunawan; Nurul Dwi Cahya

Authors

Nisa Eka Juliana, Informatics Engineering (Teknik Informatika), UIN Sunan Gunung Djati Bandung, Indonesia
Faridah Dewi Khansa, Informatics Engineering (Teknik Informatika), UIN Sunan Gunung Djati Bandung, Indonesia
Aaz M Hafidz Azis, Informatics Engineering (Teknik Informatika), UIN Sunan Gunung Djati Bandung, Indonesia
Rafli Indra Gunawan, Informatics Engineering (Teknik Informatika), UIN Sunan Gunung Djati Bandung, Indonesia
Nurul Dwi Cahya, Informatics Engineering (Teknik Informatika), UIN Sunan Gunung Djati Bandung, Indonesia

Keywords:

data mining, support vector machine, news classification

Abstract

Web-based systems are now widely used to deliver information and news in real time. However, sorting news into categories is in some cases still done manually, which takes a long time. Of the existing techniques, the one most often used to classify news content is the Support Vector Machine (SVM), a method that performs well on complex problems and on problems with many parameters. SVM is a supervised learning method: from labeled inputs and outputs it builds a mathematical model that can classify existing data and predict the class of new data. The dataset contains 2,224 news items in 5 categories, with 70% of the data used for training and 30% for testing. This study classifies digital news content into five categories: technology, business, sports, entertainment, and politics. The classifier achieved an accuracy of 98.35%, with an average precision of 90%, a recall of 98%, an F1-score of 98%, and a support (number of test items) of 668.
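
The following is a minimal sketch of the kind of pipeline the abstract describes: one linear SVM per category (one-vs-rest) over bag-of-words features, with each text assigned to the category whose SVM gives the highest decision value. It is not the authors' implementation. The abstract does not state the features, kernel, multi-class strategy, or solver, so the word-count features, linear kernel, one-vs-rest scheme, and plain full-batch subgradient descent on the hinge loss are assumptions made here, as are all function names (tokenize, featurize, train_binary_svm, train_one_vs_rest, predict, split_sizes) and the toy headlines in TOY. The hypothetical helper split_sizes shows that a support of 668 is consistent with a 30% test split of 2,224 items rounded up, which is how scikit-learn's train_test_split sizes a fractional test set.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def featurize(text, vocab):
    # Sparse bag-of-words vector: word index -> count; unknown words are ignored.
    feats = defaultdict(int)
    for tok in tokenize(text):
        if tok in vocab:
            feats[vocab[tok]] += 1
    return dict(feats)


def train_binary_svm(xs, ys, dim, lam=0.01, eta=0.1, epochs=300):
    # Full-batch subgradient descent on the soft-margin objective
    #   lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b)),
    # with labels y_i in {+1, -1}; the bias b is not regularized.
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        push_w = [0.0] * dim  # negative subgradient of the hinge terms w.r.t. w
        push_b = 0
        for x, y in zip(xs, ys):
            score = sum(w[j] * v for j, v in x.items()) + b
            if y * score < 1:  # inside the margin or misclassified: hinge is active
                for j, v in x.items():
                    push_w[j] += y * v
                push_b += y
        # Shrink w for the regularizer, then step along the hinge subgradient.
        w = [(1 - eta * lam) * wj + eta * pj / n for wj, pj in zip(w, push_w)]
        b += eta * push_b / n
    return w, b


def train_one_vs_rest(docs, labels, **svm_params):
    # Build the vocabulary, then train one binary SVM per category (category vs. rest).
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    xs = [featurize(d, vocab) for d in docs]
    models = {}
    for c in sorted(set(labels)):
        ys = [1 if lab == c else -1 for lab in labels]
        models[c] = train_binary_svm(xs, ys, len(vocab), **svm_params)
    return vocab, models


def predict(text, vocab, models):
    # Pick the category whose binary SVM gives the largest decision value.
    x = featurize(text, vocab)

    def decision(c):
        w, b = models[c]
        return sum(w[j] * v for j, v in x.items()) + b

    return max(models, key=decision)


def split_sizes(n_total, test_frac=0.3):
    # Test size rounded up, as scikit-learn's train_test_split does for a float test_size.
    n_test = math.ceil(test_frac * n_total)
    return n_total - n_test, n_test


# Hypothetical toy headlines, two per category (not the paper's dataset).
TOY = {
    "tech": ["software chip launch", "chip software update"],
    "business": ["market profit shares", "profit market growth"],
    "sport": ["match goal striker", "goal match league"],
    "entertainment": ["film actor premiere", "actor film award"],
    "politics": ["election minister vote", "minister election parliament"],
}

if __name__ == "__main__":
    docs = [d for c in TOY for d in TOY[c]]
    labels = [c for c in TOY for _ in TOY[c]]
    vocab, models = train_one_vs_rest(docs, labels)
    print(predict("New chip and software update", vocab, models))  # tech
    print(predict("The election and the vote", vocab, models))  # politics
    print(split_sizes(2224))  # (1556, 668)
```

On a real corpus of this size one would more likely use TF-IDF weighting and a library solver such as scikit-learn's LinearSVC; the hand-written loop is kept here only so that the hinge-loss update behind a linear SVM stays visible.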