December 10, 2018

Annotate: Organizing Unstructured Contents Via Topic Labels

  • Ajwani D.
  • Dutta S.
  • Nicholson P.
  • Nobari G.
  • Sala A.
  • Taneva B.

This is a presentation for the publication that has already been cleared for external release. With the advent of the Big Data paradigm, filtering, retrieval, and linking of unstructured multi-modal data have become a necessity. Assigning topic labels that accurately capture the meaning and contextual information of content is a fundamental problem in organizing unstructured data. Using manually assigned tags for this purpose introduces inconsistencies because of different "surface forms". Existing automated approaches, on the other hand, either use hierarchical multi-label classification or are unsupervised and rely on (undirected) graph measures leveraging taxonomies. While the former requires a large training data set to learn the characteristics of each topic class, the latter lacks the flexibility to learn a broad range of related topics and is less accurate. We propose a novel framework, ANNOTATE, based on a small set of features and directed traversal of taxonomies, to learn a broad spectrum of related topics using limited training data. We also show that our approach provides accurate labels for several domains without the need for re-training. For instance, the framework, trained on a small set of BBC news articles, exhibits close matches to user-generated tags for Quora documents. Experimental results on the same model for news classification and for identifying aspects of Amazon product reviews, based on Amazon Mechanical Turk evaluation, show our approach to be significantly better than the state of the art. We further present real-life case studies of our proposed framework for automatically tagging Quora posts and for topically segmenting, indexing, and linking related YouTube videos.


Recent Publications

January 01, 2019

Friendly, appealing or both? Characterising user experience in sponsored search landing pages

  • Bron M.
  • Chute M.
  • Evans H.
  • Lalmas M.
  • Redi M.
  • Silvestri F.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Many of today's websites have recognised the importance of mobile friendly pages to keep users engaged and to provide a satisfying user experience. However, next to the experience provided by the sites themselves, ...

January 01, 2019

Analyzing Uber's ride-sharing economy

  • Aiello L.
  • Djuric N.
  • Grbovic M.
  • Kooti F.
  • Lerman K.
  • Radosavljevic V.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Uber is a popular ride-sharing application that matches people who need a ride (or riders) with drivers who are willing to provide it using their personal vehicles. Despite its growing popularity, there exist ...

January 01, 2019

The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

  • Cresci S.
  • Petrocchi M.
  • Pietro R.
  • Spognardi A.
  • Tesconi M.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel ...