Computer-aided topic modelling based on festival-goers’ opinions – results of an experiment
DOI:
https://doi.org/10.14267/TURBULL.2021v21n1.1
Keywords:
social media analysis, natural language processing, topic modelling, latent Dirichlet allocation
Abstract
In this study, we attempt to identify the typical topics of opinions posted on Facebook by Sziget Festival visitors, using the structural topic model (stm) algorithm and latent Dirichlet allocation, and we compare the results with our previous research. Based on the written opinions of Sziget Festival visitors from the last seven years, we modelled nine topics. Their content and scope partly matched the topics identified in our previous qualitative research. The most important result of our study is that visitor opinions can be examined successfully with computational tools, but the quality of the results depends on the size of the corpus, i.e. the number and scope of the analysed posts.
References
AIROLDI, E. M. – BLEI, D. M. – EROSHEVA, E. A. – FIENBERG, S. E. (2014): Introduction to Mixed Membership Models and Methods. Handbook of mixed membership models and their applications. 100. pp. 3–14.
BALOGH K. (2015): A látens Dirichlet allokáció társadalomtudományi alkalmazása. A kuruc.info romaellenes megnyilvánulásainak tematikus elemzése. Szakdolgozat, ELTE Társadalomtudományi Kar, mesterképzés. https://tas.precognox.com/labs/kuruc-info-visualization/A_latens_Dirichlet_allokacio_tarsadalomtudomanyi_alkalmazasa_Balogh_Kitti.pdf
BÍRÓ I. (2009): Dokumentum osztályozás rejtett Dirichlet-allokációval. PhD dolgozat. Eötvös Loránd Tudományegyetem, Informatikai Kar, Információtudományi Tanszék, Informatikai Doktori Iskola. http://www.tnkcs.inf.elte.hu/vedes/Biro_Istvan_Tezisek_hu.pdf
BÍRÓ, I. – SIKLÓSI, D. – SZABÓ, J. – BENCZÚR, A. (2009a): Linked latent dirichlet allocation in web spam filtering. AIRWeb ’09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web. pp. 37–40. https://doi.org/10.1145/1531914.1531922
BÍRÓ, I. – SZABÓ, J. (2009b): Latent dirichlet allocation for automatic document categorization. In: Buntine, W. – Grobelnik, M. – Mladenić, D. – Shawe-Taylor, J. (eds): Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science. 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_28
BLEI, D. M. – LAFFERTY, J. D. (2006): Correlated topic models. Advances in neural information processing systems. NIPS 18. pp. 147–154.
BLEI, D. M. – NG, A. Y. – JORDAN, M. I. (2003): Latent dirichlet allocation. Journal of Machine Learning Research. 3(Jan). pp. 993–1022.
CROSSLEY, S. – DASCALU, M. – McNAMARA, D. (2017): How important is size? An investigation of corpus size and meaning in both latent semantic analysis and latent Dirichlet allocation. Proceedings of the 30th International Florida Artificial Intelligence Research Society (FLAIRS) Conference.
CURRY, T. A. – FIX, M. P. (2019): May it please the twitterverse: The use of Twitter by state high court judges. Journal of Information Technology & Politics. 16(4). pp. 379–393. https://doi.org/10.1080/19331681.2019.1657048
EISENSTEIN, J. – AHMED, A. – XING, E. P. (2011): Sparse additive generative models of text. Proceedings of the 28th International Conference on Machine Learning. June 2011. Bellevue, WA, USA. pp. 1041–1048.
FISCHER-PREßLER, D. – SCHWEMMER, C. – FISCHBACH, K. (2019): Collective sense-making in times of crisis: Connecting terror management theory with Twitter user reactions to the Berlin terrorist attack. Computers in Human Behavior. 100. pp. 138–151. https://doi.org/10.1016/j.chb.2019.05.012
GERRISH, S. – BLEI, D. M. (2012): How they vote: Issue-adjusted models of legislative behavior. Advances in neural information processing systems 25. (NIPS 2012). pp. 2753–2761.
HINEK M. – KULCSÁR N. (2019): Fesztiválélmény a közösségi médiában: a Sziget Fesztivál példája. Turizmus Bulletin. 19(3). pp. 4–12.
KRESTEL, R. – FANKHAUSER, P. – NEJDL, W. (2009): Latent dirichlet allocation for tag recommendation. In: Proceedings of the third ACM conference on Recommender systems. pp. 61–68. https://doi.org/10.1145/1639714.1639726
PAUL, M. J. – DREDZE, M. (2015): SPRITE: Generalizing topic models with structured priors. Transactions of the Association for Computational Linguistics. 3. pp. 43–57. https://doi.org/10.1162/tacl_a_00121
R CORE TEAM (2020): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
ROBERTS, M. E. – STEWART, B. M. – TINGLEY, D. (2019): stm: An R package for structural topic models. Journal of Statistical Software. 91(2). pp. 1–40. https://doi.org/10.18637/jss.v091.i02
RODRIGUEZ, M. Y. – STORER, H. (2020): A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data. Journal of Technology in Human Services. 38(1). pp. 54–86. https://doi.org/10.1080/15228835.2019.1616350
SYED, S. – SPRUIT, M. (2017): Full-text or abstract? Examining topic coherence scores using latent dirichlet allocation. 2017 IEEE International conference on data science and advanced analytics (DSAA). Tokyo. pp. 165–174. https://doi.org/10.1109/DSAA.2017.61
WANG, Y. C. – BURKE, M. – KRAUT, R. E. (2013): Gender, topic, and audience response: an analysis of user-generated content on Facebook. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA. pp. 31–34. https://doi.org/10.1145/2470654.2470659
WEINSHALL, D. – LEVI, G. – HANUKAEV, D. (2013): LDA topic model with soft assignment of descriptors to words. Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA. JMLR: W&CP 28. pp. 711–719.
WILSON, A. – CHEW, P. A. (2010): Term weighting schemes for latent dirichlet allocation. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Los Angeles, California. pp. 465–473.