Mining urban events from the tweet stream through a probabilistic mixture model

We are pleased to announce that we can publicly share a full-text, view-only version of our latest paper, “Mining urban events from the tweet stream through a probabilistic mixture model”, published in the journal Data Mining and Knowledge Discovery, as part of the Springer Nature SharedIt initiative. This paper is part of the work of our brilliant PhD student Joan Capdevila, whose PhD is co-advised by Jesús Cerquides from the IIIA-CSIC research centre.
The geographical identification of content in social networks has made it possible to bridge the gap between online social platforms and the physical world. Although vast amounts of data in such networks are due to breaking news or global occurrences, local events witnessed by users in situ are also present in these streams and are of great importance to many city entities. Nowadays, unsupervised machine learning techniques, such as Tweet-SCAN, are able to retrospectively detect these local events from tweets. However, these approaches have limited ability to reason about unseen observations in a principled way because they lack a proper probabilistic foundation. Probabilistic models have also been proposed for the task, but their event identification capabilities are far from those of Tweet-SCAN. In this paper, we identify two key factors which, when combined, boost the accuracy of such models. As a first key factor, we notice that the large amount of meaningless social data requires explicitly modeling non-event observations. Therefore, we propose to incorporate a background model that captures spatio-temporal fluctuations of non-event tweets. As a second key factor, we observe that the shortness of tweets hampers the application of traditional topic models. Thus, we integrate event detection and topic modeling, assigning topic proportions to events instead of assigning them to individual tweets. As a result, we propose Warble, a new probabilistic model and learning scheme for retrospective event detection that incorporates these two key factors. We evaluate Warble on a data set of tweets located in Barcelona during its festivities. The empirical results show that the model outperforms other state-of-the-art techniques in detecting various types of events while relying on a principled probabilistic framework that enables reasoning under uncertainty.
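To give a rough feel for the first key factor, here is a minimal toy sketch in Python. It is not the Warble model or its code: it assumes diagonal Gaussian event components over (x, y, time) and, as a deliberate simplification, a constant-density background instead of a background that follows spatio-temporal fluctuations. The topic side of the model is left out entirely. The function name `responsibilities` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def responsibilities(points, means, stds, weights, background_density):
    """Posterior probability of each mixture component for each observation.

    points: (N, D) observations, e.g. D = 3 for (x, y, time).
    means, stds: (K, D) parameters of K diagonal Gaussian event components.
    weights: (K + 1,) mixing weights; the last entry is the background.
    background_density: constant density of the (assumed uniform) background.
    Returns an (N, K + 1) array whose rows sum to 1.
    """
    points = np.asarray(points, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Standardised distance of every point to every event centre: (N, K, D).
    z = (points[:, None, :] - means[None, :, :]) / stds[None, :, :]
    # Log-density of a diagonal Gaussian, summed over dimensions: (N, K).
    log_density = (-0.5 * np.sum(z ** 2, axis=2)
                   - np.sum(np.log(stds * np.sqrt(2.0 * np.pi)), axis=1))
    event_density = np.exp(log_density)

    # The background component has the same density everywhere: (N, 1).
    background = np.full((points.shape[0], 1), background_density)

    # Weight each component, then normalise per observation.
    joint = weights[None, :] * np.hstack([event_density, background])
    return joint / joint.sum(axis=1, keepdims=True)

# Toy data: one observation at the event centre, one far away from it.
points = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
means = [[0.0, 0.0, 0.0]]          # a single hypothetical event
stds = [[0.5, 0.5, 0.5]]
weights = [0.5, 0.5]               # event vs. background
background_density = 1.0 / 1000.0  # uniform over a 10 x 10 x 10 box

resp = responsibilities(points, means, stds, weights, background_density)
labels = resp.argmax(axis=1)       # index 1 means "background" here
```

In this toy setting the observation at the event centre is attributed to the event, while the distant one is absorbed by the background instead of being forced into the event or distorting its parameters. That is the intuition behind modeling non-event observations explicitly.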
This GitHub repository contains the WARBLE code, which implements the probabilistic model and learning scheme presented in this paper.