Comparison of Classification Algorithms for Sentiment Analysis on Movie Comments
Abstract
The film industry is growing rapidly, with a wide range of genres and storylines crafted to convey messages and entertain audiences. Sentiment analysis can support the industry's development and help determine which films to recommend next. This study compares several algorithms for sentiment analysis of movie reviews or comments: K-Nearest Neighbor (k-NN), Naïve Bayes Classifier (NBC), and Logistic Regression. Experiments on a dataset of 25,000 movie comments show that Logistic Regression achieves the highest accuracy at 89%, followed by Naïve Bayes at 86% and k-NN at 65.22%.
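The kind of comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the six training comments and two test comments are hypothetical toy data, the features are plain word counts, and the choices of k = 3 with cosine similarity, add-one smoothing, and 200 gradient-descent epochs at a learning rate of 0.5 are assumptions made only for illustration. Each classifier is written in a few lines of standard-library code and scored on the same held-out comments.

```python
import math
from collections import Counter

# Hypothetical toy data (1 = positive, 0 = negative); not the study's dataset.
TRAIN = [
    ("great movie loved it", 1),
    ("wonderful acting great story", 1),
    ("loved the story", 1),
    ("terrible movie hated it", 0),
    ("boring plot terrible acting", 0),
    ("hated the boring story", 0),
]
TEST = [("great story loved it", 1), ("terrible boring movie", 0)]


def tokens(text):
    return text.lower().split()


def naive_bayes(train):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in train:
        word_counts[label].update(tokens(text))
        doc_counts[label] += 1
    vocab_size = len(set(word_counts[0]) | set(word_counts[1]))

    def predict(text):
        scores = {}
        for label in (0, 1):
            total = sum(word_counts[label].values())
            score = math.log(doc_counts[label] / len(train))  # class prior
            for word in tokens(text):
                score += math.log((word_counts[label][word] + 1) / (total + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)

    return predict


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn(train, k=3):
    """k-NN: majority vote among the k most cosine-similar training comments."""
    vectors = [(Counter(tokens(text)), label) for text, label in train]

    def predict(text):
        query = Counter(tokens(text))
        nearest = sorted(vectors, key=lambda pair: cosine(query, pair[0]), reverse=True)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    return predict


def logistic_regression(train, epochs=200, lr=0.5):
    """Logistic regression on binary word features, fitted by batch gradient descent."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for text, label in train:
            words = set(tokens(text))
            z = bias + sum(weights[w] for w in words)
            error = 1 / (1 + math.exp(-z)) - label  # predicted probability minus target
            for w in words:
                grad_w[w] += error
            grad_b += error
        for w, g in grad_w.items():
            weights[w] -= lr * g / len(train)
        bias -= lr * grad_b / len(train)

    def predict(text):
        z = bias + sum(weights[w] for w in set(tokens(text)))
        return 1 if z >= 0 else 0

    return predict


def accuracy(predict, data):
    """Fraction of comments whose predicted label matches the true label."""
    return sum(predict(text) == label for text, label in data) / len(data)


MODELS = [("knn", knn), ("naive_bayes", naive_bayes), ("logistic_regression", logistic_regression)]
results = {name: accuracy(fit(TRAIN), TEST) for name, fit in MODELS}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

On a toy set this small the three accuracies say nothing about the ranking reported in the abstract; the sketch only shows the shape of the procedure, in which every model is trained on the same comments and evaluated with the same accuracy measure.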
This work is licensed under a Creative Commons Attribution 4.0 International License.