Detection of Hate Speech in TikTok Comment Sections Using the Naïve Bayes Algorithm with Smoothing Implementation

Roy Rafles Matorang Pasaribu
Didik Kurniawan
Muhaqiqin Muhaqiqin
Akmal Junaidi

Abstract

Hate speech is biased, antagonistic, and discriminatory expression that commonly appears on social media platforms, including TikTok. The high volume of comments and their varied language styles make manual detection impractical. This research proposes a hate speech detection model based on the Multinomial Naïve Bayes algorithm, with smoothing applied to avoid zero-probability estimates and improve prediction performance. The dataset is split into 80% training and 20% testing portions. The model achieves an accuracy of 88.41%, with precision, recall, and F1-score showing balanced performance. A user evaluation involving 35 participants and 7,415 TikTok comments records a detection accuracy of 68.6%. The model is further implemented as a Google Chrome extension that detects hate speech in real time, displays prediction probabilities, and lets users validate the predictions. This study aims to support healthier digital interactions by improving automated hate speech detection on social media.
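To illustrate how smoothing removes the zero-probability problem mentioned in the abstract, the following is a minimal sketch of a Multinomial Naïve Bayes classifier in plain Python. It is not the authors' implementation: the abstract does not state which smoothing variant or parameter was used, so additive (Laplace) smoothing with `alpha = 1.0` is an assumption here, and the toy comments, the labels `hate` and `neutral`, and the function names `train` and `predict_proba` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes with additive smoothing (alpha is an assumption)."""
    vocab = {word for doc in docs for word in doc.split()}
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for cls, n_docs in docs_per_class.items():
        model["log_prior"][cls] = math.log(n_docs / len(docs))
        total_words = sum(word_counts[cls].values())
        # Adding alpha to every count keeps words unseen in this class
        # from receiving probability zero.
        denom = total_words + alpha * len(vocab)
        model["log_lik"][cls] = {
            word: math.log((word_counts[cls][word] + alpha) / denom)
            for word in vocab
        }
    return model


def predict_proba(model, text):
    """Return normalized class probabilities; out-of-vocabulary words are skipped."""
    scores = {}
    for cls, log_prior in model["log_prior"].items():
        score = log_prior
        for word in text.split():
            if word in model["vocab"]:
                score += model["log_lik"][cls][word]
        scores[cls] = score
    # Subtract the maximum log score before exponentiating for numerical stability.
    top = max(scores.values())
    unnormalized = {cls: math.exp(s - top) for cls, s in scores.items()}
    norm = sum(unnormalized.values())
    return {cls: value / norm for cls, value in unnormalized.items()}


# Hypothetical toy data, used only to exercise the functions above.
toy_docs = [
    "you are stupid and ugly",
    "nobody likes you idiot",
    "great video love it",
    "thanks for sharing this video",
]
toy_labels = ["hate", "hate", "neutral", "neutral"]

model = train(toy_docs, toy_labels, alpha=1.0)
probs = predict_proba(model, "you are an idiot")
print(probs)
```

The word "video" never occurs in the toy `hate` comments, so without smoothing its class-conditional probability would be zero and any comment containing it could never be assigned to that class. With `alpha = 1.0` the estimate stays small but positive. The returned dictionary is the kind of per-class probability that an interface such as the browser extension could display next to a comment.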

How to Cite
Pasaribu, R. R. M., Kurniawan, D., Muhaqiqin, M., & Junaidi, A. (2025). Detection of Hate Speech in TikTok Comment Sections Using the Naïve Bayes Algorithm with Smoothing Implementation. Jurnal Pepadun, 6(3), 207–219. https://doi.org/10.23960/pepadun.v6i3.268
