AI VTuber Development Base on GPT-4 for Casual Chat Interaction on YouTube Live Streaming: Case Study KAIRA Channel

Muhammad Fadhilah Ramadhani
Akmal Junaidi
Ossy Dwi Endah Wulansari
Rico Andrian
Favorisen Rosyking Lumbanraja

Abstract

This study presents the development and evaluation of KAIRA, an Indonesian AI-based VTuber designed to hold casual conversations during YouTube live streams. KAIRA integrates GPT-4 for language generation, ElevenLabs for voice synthesis, and VTube Studio for real-time avatar animation. The system was evaluated in both controlled and public testing sessions, focusing on interaction quality, contextual relevance, and character liveliness. Semantic similarity was analyzed using IndoBERT and IndoRoBERTa to assess how well the system’s responses aligned with user expectations. In addition, a topic filtering mechanism based on cosine similarity was implemented to keep KAIRA focused on a designated topic; gaming was used as the test topic. Evaluating this classifier on a manually labeled dataset yielded an accuracy of 50.33%, revealing significant classification errors. However, because of the system’s flexible routing logic, misclassified messages often still received contextually appropriate responses. These findings highlight the dual challenges of topic enforcement and conversational coherence in real-time AI systems and contribute to the growing field of virtual character development and conversational AI for Indonesian-language contexts.
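
To make the idea of a cosine-similarity topic filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy bag-of-words representation in place of the sentence embeddings a real system would likely use. The reference vocabulary, the threshold value of 0.2, and the names `bag_of_words`, `cosine_similarity`, and `is_on_topic` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a message into a sparse term-count vector (toy stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical reference vector for the "gaming" topic (an assumption, not from the paper).
TOPIC_REFERENCE = bag_of_words("game gaming play player level boss console")

# Hypothetical cut-off; a real system would tune this on labeled chat messages.
THRESHOLD = 0.2


def is_on_topic(message, threshold=THRESHOLD):
    """Return True when the message is similar enough to the topic reference."""
    return cosine_similarity(bag_of_words(message), TOPIC_REFERENCE) >= threshold
```

In a sketch like this, a chat message whose similarity falls below the threshold would be routed differently from an on-topic one. A single fixed threshold is also one plausible source of the misclassifications the abstract reports, since short or slang-heavy messages share few terms with any reference text.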

How to Cite
Ramadhani, M. F., Junaidi, A., Wulansari, O. D. E., Andrian, R., & Lumbanraja, F. R. (2025). AI VTuber Development Base on GPT-4 for Casual Chat Interaction on YouTube Live Streaming: Case Study KAIRA Channel. Jurnal Pepadun, 6(2), 131–145. https://doi.org/10.23960/pepadun.v6i2.285
