Taxonomy-based Supervised Topic Labeling

  • Ajwani D.
  • Nicholson P.
  • Sala A.
  • Taneva-Popova B.

The rise of AI-assisted applications, bot-based customer support and personal assistants is imposing increasingly high accuracy requirements on interpreting and contextualizing text data. A fundamental building block in the automated understanding of text is assigning a topic label to a text document at an appropriate level of granularity. The topic label should generalize the entities in the document without being too generic. The state-of-the-art solutions to this problem use unsupervised methods that either do not leverage the taxonomy structure or model the taxonomy as an undirected graph. Undirected paths mix hypernym and hyponym edges arbitrarily and are often a poor indicator of semantic relatedness. As a result, the labels generated by these approaches struggle to achieve high accuracy. We propose novel directional traversal measures based on modeling the taxonomy as a directed acyclic graph. In addition, we leverage information-theoretic measures based on Mutual Information. We combine our novel graph-theoretic and information-theoretic measures with existing measures (e.g., content-based) by using them as features in a supervised learning approach. Our evaluation on Amazon Mechanical Turk shows that, on a range of document corpora, the topic labels generated by our supervised method are significantly more accurate than those of the baseline state-of-the-art approaches from the literature.
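
The contrast between undirected paths and directed traversal can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy taxonomy, the function names, and the ranking by total upward hops are hypothetical and are not the measures defined in the paper. The sketch only shows why following hypernym edges alone yields valid generalizations, while an undirected path can rate a sibling term as close as a true ancestor.

```python
from collections import deque

# Hypothetical toy taxonomy: each term maps to its hypernyms (parents).
# Edges point upward, so the structure is a directed acyclic graph.
HYPERNYMS = {
    "python_language": ["programming_language"],
    "java_language": ["programming_language"],
    "programming_language": ["computing"],
    "python_snake": ["reptile"],
    "reptile": ["animal"],
    "computing": ["entity"],
    "animal": ["entity"],
    "entity": [],
}


def upward_distances(node):
    """Breadth-first search along hypernym edges only; returns {ancestor: hops}."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for parent in HYPERNYMS[cur]:
            if parent not in dist:
                dist[parent] = dist[cur] + 1
                queue.append(parent)
    return dist


def undirected_distance(a, b):
    """Shortest path length when edge direction is ignored (None if unreachable)."""
    neighbors = {n: set(parents) for n, parents in HYPERNYMS.items()}
    for n, parents in HYPERNYMS.items():
        for p in parents:
            neighbors[p].add(n)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        cur = queue.popleft()
        if cur == b:
            return dist[cur]
        for nxt in neighbors[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


def candidate_labels(entities):
    """Common ancestors of all entities, most specific (fewest total hops) first."""
    per_entity = [upward_distances(e) for e in entities]
    common = set(per_entity[0]).intersection(*per_entity[1:])
    return sorted(common, key=lambda c: (sum(d[c] for d in per_entity), c))
```

In this toy graph, "java_language" is two undirected hops from its sibling "python_language" and also two upward hops from "computing", yet only "computing" generalizes it. In a supervised setting such as the one the abstract describes, distances of this kind would be one feature among several rather than a ranking on their own.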

Recent Publications

August 09, 2017

A Cloud Native Approach to 5G Network Slicing

  • Francini A.
  • Miller R.
  • Sharma S.

5G networks will have to support a set of very diverse and often extreme requirements. Network slicing offers an effective way to unlock the full potential of 5G networks and meet those requirements on a shared network infrastructure. This paper presents a cloud native approach to network slicing. The cloud ...

August 01, 2017

Modeling and simulation of RSOA with a dual-electrode configuration

  • De Valicourt G.
  • Liu Z.
  • Violas M.
  • Wang H.
  • Wu Q.

Based on the physical model of a bulk reflective semiconductor optical amplifier (RSOA) used as a modulator in radio over fiber (RoF) links, the distributions of carrier density, signal photon density, and amplified spontaneous emission photon density are demonstrated. One of the limits in the use of RSOA is the lower ...

July 12, 2017

PrivApprox: Privacy-Preserving Stream Analytics

  • Chen R.
  • Christof Fetzer
  • Le D.
  • Martin Beck
  • Pramod Bhatotia
  • Thorsten Strufe

How can users' privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy (ezk) guarantees for users, a privacy bound tighter ...