Comparison of Classification Algorithms for Sentiment Analysis on Movie Comments

Dian Sa'adillah Maylawati; Melani Nur Mudyawati; Muhammad Humam Wahisyam; Riki Ahmad Maulana

Authors

Dian Sa'adillah Maylawati Teknik Informatika, UIN Sunan Gunung Djati Bandung, Indonesia
Melani Nur Mudyawati Teknik Informatika, UIN Sunan Gunung Djati Bandung, Indonesia
Muhammad Humam Wahisyam Teknik Informatika, UIN Sunan Gunung Djati Bandung, Indonesia
Riki Ahmad Maulana Teknik Informatika, UIN Sunan Gunung Djati Bandung, Indonesia

Keywords:

data mining, k-nearest neighbor, naive bayes classifier, logistic regression, classification, sentiment analysis

Abstract

The film industry is growing rapidly, with films of many genres and storylines packaged to convey messages and entertain audiences. Sentiment analysis can support the advancement of the film industry and help decide which films should be recommended next. This study compares several algorithms for sentiment analysis of movie reviews or comments: K-Nearest Neighbor (k-NN), Naïve Bayes Classifier (NBC), and Logistic Regression. Experiments on a dataset of 25,000 film comments show that Logistic Regression achieves the highest accuracy, 89%, compared with 86% for Naïve Bayes and 65.22% for k-NN.
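The abstract does not state the preprocessing, feature representation, value of k, or training settings used, so the following is only a minimal, dependency-free sketch of how the three compared classifiers can be applied to labelled review text and scored by accuracy (correct predictions divided by total predictions). It is not the authors' implementation. Everything in it is an assumption for illustration: bag-of-words features from a lowercase whitespace tokenizer, k-NN with cosine similarity and k=3, multinomial Naive Bayes with add-one smoothing, logistic regression trained by batch gradient descent, and the hypothetical names `TRAIN`, `TEST`, `tokenize`, `bag_of_words`, and `accuracy` with a tiny made-up corpus.

```python
import math
from collections import Counter

# Hypothetical toy corpus, NOT the study's 25,000-comment dataset (1 = positive, 0 = negative).
TRAIN = [
    ("a wonderful film with a great story", 1),
    ("great acting and a beautiful soundtrack", 1),
    ("i loved this movie it was wonderful", 1),
    ("a boring film with a terrible plot", 0),
    ("terrible acting and an awful script", 0),
    ("i hated this movie it was boring", 0),
]
TEST = [("a wonderful story and great acting", 1), ("an awful and boring plot", 0)]

def tokenize(text):
    # Assumed preprocessing: lowercase + whitespace split, no stemming or stop-word removal.
    return text.lower().split()

def bag_of_words(text):
    return Counter(tokenize(text))

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, scored in log space."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
        return self
    def predict(self, text):
        v = len(self.vocab)
        def log_posterior(c):
            score = self.log_prior[c]
            for w in tokenize(text):
                if w in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            return score
        return max(self.classes, key=log_posterior)

def cosine(a, b):
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

class KNearestNeighbor:
    """Majority vote among the k training reviews most cosine-similar to the query."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, docs, labels):
        self.examples = [(bag_of_words(d), y) for d, y in zip(docs, labels)]
        return self
    def predict(self, text):
        query = bag_of_words(text)
        ranked = sorted(self.examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
        return Counter(y for _, y in ranked[: self.k]).most_common(1)[0][0]

class LogisticRegression:
    """Binary logistic regression on word counts, batch gradient descent on log loss."""
    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
    def _prob(self, x):
        z = self.bias + sum(self.weights[w] * c for w, c in x.items())
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    def fit(self, docs, labels):
        self.weights, self.bias = Counter(), 0.0
        feats, n = [bag_of_words(d) for d in docs], len(docs)
        for _ in range(self.epochs):
            grad_w, grad_b = Counter(), 0.0
            for x, y in zip(feats, labels):
                err = self._prob(x) - y  # d(log loss)/dz for one example
                for w, c in x.items():
                    grad_w[w] += err * c
                grad_b += err
            for w, g in grad_w.items():
                self.weights[w] -= self.lr * g / n
            self.bias -= self.lr * grad_b / n
        return self
    def predict(self, text):
        return int(self._prob(bag_of_words(text)) >= 0.5)

def accuracy(model, data):
    return sum(model.predict(d) == y for d, y in data) / len(data)

if __name__ == "__main__":
    docs, labels = zip(*TRAIN)
    for model in (KNearestNeighbor(k=3), NaiveBayes(), LogisticRegression()):
        model.fit(list(docs), list(labels))
        print(type(model).__name__, accuracy(model, TEST))
```

The from-scratch form keeps the sketch self-contained, but it is not tuned for a corpus of 25,000 comments (the k-NN step compares every query with every training review), and whatever it prints on the toy data says nothing about the 89%, 86%, and 65.22% accuracies reported in the study.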

