Taxonomy-based Supervised Topic Labeling

  • Ajwani D.
  • Nicholson P.
  • Sala A.
  • Taneva-Popova B.

The rise of AI-assisted applications, bot-based customer support, and personal assistants is imposing ever-higher accuracy requirements on interpreting and contextualizing text data. A fundamental building block in automated text understanding is assigning a topic label to a text document at an appropriate level of granularity: the label should generalize the entities in the document without being too generic. State-of-the-art solutions to this problem use unsupervised methods that either do not leverage the taxonomy structure or model the taxonomy as an undirected graph. Undirected paths mix hypernym and hyponym edges arbitrarily and are often a poor indicator of semantic relatedness. As a result, the labels generated by these approaches struggle to achieve high accuracy. We propose novel directional traversal measures based on modeling the taxonomy as a directed acyclic graph. In addition, we leverage information-theoretic measures based on mutual information. We combine our novel graph-theoretic and information-theoretic measures with existing measures (e.g., content-based ones) by using them as features in a supervised learning approach. Our evaluation on Amazon Mechanical Turk shows that the topic labels generated by our supervised method are significantly more accurate than those of state-of-the-art baselines from the literature, across a range of document corpora.
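To make the contrast between undirected paths and directional traversal concrete, here is a minimal Python sketch under stated assumptions. The toy taxonomy, the helper names (`bfs_distance`, `directed_adjacency`, `undirected_adjacency`, `label_features`) and the three features are hypothetical illustrations, not the measures defined in the paper; the mutual-information features and the supervised learner itself are not shown.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (not from the paper). Each key maps a concept to
# its hypernyms, so every edge points "upward" (child -> parent) and the
# whole structure is a directed acyclic graph.
HYPERNYMS: Dict[str, List[str]] = {
    "Lionel Messi": ["Footballer"],
    "Cristiano Ronaldo": ["Footballer"],
    "Roger Federer": ["Tennis player"],
    "Footballer": ["Athlete"],
    "Tennis player": ["Athlete"],
    "Athlete": ["Person"],
}


def bfs_distance(adjacency: Dict[str, Set[str]], start: str, target: str) -> Optional[int]:
    """Fewest edges from start to target following adjacency, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adjacency.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def directed_adjacency(taxonomy: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Hypernym edges only: a node may move to its parents, never to its children."""
    return {child: set(parents) for child, parents in taxonomy.items()}


def undirected_adjacency(taxonomy: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Hypernym and hyponym edges mixed together, as in an undirected-graph model."""
    adj: Dict[str, Set[str]] = {}
    for child, parents in taxonomy.items():
        for parent in parents:
            adj.setdefault(child, set()).add(parent)
            adj.setdefault(parent, set()).add(child)
    return adj


def label_features(taxonomy: Dict[str, List[str]], entities: List[str], label: str) -> Dict[str, float]:
    """Hypothetical per-label features that could feed a supervised ranker."""
    up = directed_adjacency(taxonomy)
    undirected = undirected_adjacency(taxonomy)
    up_reached = [d for d in (bfs_distance(up, e, label) for e in entities) if d is not None]
    any_reached = [d for d in (bfs_distance(undirected, e, label) for e in entities) if d is not None]
    return {
        # Fraction of entities that the label generalizes (label is an ancestor).
        "coverage": len(up_reached) / len(entities),
        # Mean hypernym-only distance; larger values mean a more generic label.
        "mean_up_distance": sum(up_reached) / len(up_reached) if up_reached else float("inf"),
        # Mean undirected distance, which ignores edge direction entirely.
        "mean_undirected_distance": sum(any_reached) / len(any_reached) if any_reached else float("inf"),
    }


if __name__ == "__main__":
    doc_entities = ["Lionel Messi", "Cristiano Ronaldo"]
    for candidate in ["Footballer", "Tennis player", "Person"]:
        print(candidate, label_features(HYPERNYMS, doc_entities, candidate))
```

For a document mentioning two footballers, the undirected distance is 3 for both "Tennis player" (a sibling branch reached by going up and then back down) and "Person" (a valid but generic ancestor), so it cannot tell them apart. The directional features can: "Tennis player" has zero coverage, while "Person" covers every entity at an upward distance of 3 and "Footballer" covers every entity at distance 1.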

Recent Publications

January 01, 2019

Friendly, appealing or both? Characterising user experience in sponsored search landing pages

  • Bron M.
  • Chute M.
  • Evans H.
  • Lalmas M.
  • Redi M.
  • Silvestri F.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Many of today's websites have recognised the importance of mobile friendly pages to keep users engaged and to provide a satisfying user experience. However, next to the experience provided by the sites themselves, ...

January 01, 2019

Analyzing Uber's ride-sharing economy

  • Aiello L.
  • Djuric N.
  • Grbovic M.
  • Kooti F.
  • Lerman K.
  • Radosavljevic V.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Uber is a popular ride-sharing application that matches people who need a ride (or riders) with drivers who are willing to provide it using their personal vehicles. Despite its growing popularity, there exist ...

January 01, 2019

The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

  • Cresci S.
  • Petrocchi M.
  • Di Pietro R.
  • Spognardi A.
  • Tesconi M.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel ...