Computer-aided topic modelling based on festival-goers’ opinions – results of an experiment

Authors

  • Mátyás Hinek, Budapest Metropolitan University

DOI:

https://doi.org/10.14267/TURBULL.2021v21n1.1

Keywords:

social media analysis, natural language processing, topic modelling, latent Dirichlet allocation

Abstract

In this study, we attempt to identify the typical topics of opinions posted on Facebook by Sziget Festival visitors, using the structural topic model (stm) algorithm and latent Dirichlet allocation, and we compare the results with our previous research. Based on visitors’ written opinions from the last seven years, we modelled nine topics. Their content and scope partly matched the topics identified in our previous qualitative research. The main finding of the study is that visitor opinions can be examined successfully with computational tools, but the quality of the results depends on the size of the corpus, i.e. the number and length of the analysed posts.
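To illustrate the general idea behind latent Dirichlet allocation, the following is a minimal sketch in Python using collapsed Gibbs sampling. It is not the stm workflow used in the study: the example posts, the function names fit_lda and top_words, the choice of two topics and the hyperparameter values alpha and beta are all assumptions made for illustration only.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA fitted by collapsed Gibbs sampling on tokenised documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Made-up, already tokenised posts (NOT data from the study).
docs = [
    ["music", "stage", "concert", "music", "band"],
    ["queue", "price", "beer", "price", "queue"],
    ["band", "concert", "stage", "music"],
    ["beer", "price", "queue", "toilet"],
]

doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
for t, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {t}: {', '.join(words)}")
```

With only four short posts the topic assignments are unstable, which reflects the point made in the abstract: the quality of the modelled topics depends heavily on the size of the corpus.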

References

AIROLDI, E. M. – BLEI, D. M. – EROSHEVA, E. A. – FIENBERG, S. E. (2014): Introduction to Mixed Membership Models and Methods. Handbook of mixed membership models and their applications. 100. pp. 3–14.

BALOGH K. (2015): A látens Dirichlet allokáció társadalomtudományi alkalmazása. A kuruc.info romaellenes megnyilvánulásainak tematikus elemzése. Szakdolgozat, ELTE Társadalomtudományi Kar, mesterképzés. https://tas.precognox.com/labs/kuruc-info-visualization/A_latens_Dirichlet_allokacio_tarsadalomtudomanyi_alkalmazasa_Balogh_Kitti.pdf

BÍRÓ I. (2009): Dokumentum osztályozás rejtett Dirichlet-allokációval. PhD dolgozat. Eötvös Loránd Tudományegyetem, Informatikai Kar, Információtudományi Tanszék, Informatikai Doktori Iskola. http://www.tnkcs.inf.elte.hu/vedes/Biro_Istvan_Tezisek_hu.pdf

BÍRÓ, I. – SIKLÓSI, D. – SZABÓ, J. – BENCZÚR, A. (2009a): Linked latent Dirichlet allocation in web spam filtering. AIRWeb ’09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web. pp. 37–40. https://doi.org/10.1145/1531914.1531922

BÍRÓ, I. – SZABÓ, J. (2009b): Latent Dirichlet allocation for automatic document categorization. In: Buntine, W. – Grobelnik, M. – Mladenić, D. – Shawe-Taylor, J. (eds): Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science. 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_28

BLEI, D. M. – LAFFERTY, J. D. (2006): Correlated topic models. Advances in neural information processing systems. NIPS 18. pp. 147–154.

BLEI, D. M. – NG, A. Y. – JORDAN, M. I. (2003): Latent Dirichlet allocation. Journal of Machine Learning Research. 3(Jan). pp. 993–1022.

CROSSLEY, S. – DASCALU, M. – McNAMARA, D. (2017): How important is size? An investigation of corpus size and meaning in both latent semantic analysis and latent Dirichlet allocation. Proceedings of the 30th International Florida Artificial Intelligence Research Society (FLAIRS) Conference.

CURRY, T. A. – FIX, M. P. (2019): May it please the twitterverse: The use of Twitter by state high court judges. Journal of Information Technology & Politics. 16(4). pp. 379–393. https://doi.org/10.1080/19331681.2019.1657048

EISENSTEIN, J. – AHMED, A. – XING, E. P. (2011): Sparse additive generative models of text. Proceedings of the 28th International Conference on Machine Learning. June 2011. Bellevue, WA, USA. pp. 1041–1048.

FISCHER-PREßLER, D. – SCHWEMMER, C. – FISCHBACH, K. (2019): Collective sense-making in times of crisis: Connecting terror management theory with Twitter user reactions to the Berlin terrorist attack. Computers in Human Behavior. 100. pp. 138–151. https://doi.org/10.1016/j.chb.2019.05.012

GERRISH, S. – BLEI, D. M. (2012): How they vote: Issue-adjusted models of legislative behavior. Advances in neural information processing systems 25. (NIPS 2012). pp. 2753–2761.

HINEK M. – KULCSÁR N. (2019): Fesztiválélmény a közösségi médiában: a Sziget Fesztivál példája. Turizmus Bulletin. 19(3). pp. 4–12.

KRESTEL, R. – FANKHAUSER, P. – NEJDL, W. (2009): Latent Dirichlet allocation for tag recommendation. In: Proceedings of the third ACM conference on Recommender systems. pp. 61–68. https://doi.org/10.1145/1639714.1639726

PAUL, M. J. – DREDZE, M. (2015): SPRITE: Generalizing topic models with structured priors. Transactions of the Association for Computational Linguistics. 3. pp. 43–57. https://doi.org/10.1162/tacl_a_00121

R CORE TEAM (2020): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/

ROBERTS, M. E. – STEWART, B. M. – TINGLEY, D. (2019): stm: An R package for structural topic models. Journal of Statistical Software. 91(2). pp. 1–40. https://doi.org/10.18637/jss.v091.i02

RODRIGUEZ, M. Y. – STORER, H. (2020): A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data. Journal of Technology in Human Services. 38(1). pp. 54–86. https://doi.org/10.1080/15228835.2019.1616350

SYED, S. – SPRUIT, M. (2017): Full-text or abstract? Examining topic coherence scores using latent Dirichlet allocation. 2017 IEEE International conference on data science and advanced analytics (DSAA). Tokyo. pp. 165–174. https://doi.org/10.1109/DSAA.2017.61

WANG, Y. C. – BURKE, M. – KRAUT, R. E. (2013): Gender, topic, and audience response: an analysis of user-generated content on Facebook. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA. pp. 31–34. https://doi.org/10.1145/2470654.2470659

WEINSHALL, D. – LEVI, G. – HANUKAEV, D. (2013): LDA topic model with soft assignment of descriptors to words. Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA. JMLR: W&CP 28. pp. 711–719.

WILSON, A. – CHEW, P. A. (2010): Term weighting schemes for latent Dirichlet allocation. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles, California. pp. 465–473.

Published

2021-04-21

How to Cite

Hinek, M. (2021). Computer-aided topic modelling based on festival-goers’ opinions – results of an experiment. Turizmus Bulletin, 21(1), 4–12. https://doi.org/10.14267/TURBULL.2021v21n1.1

Section

Studies