Taxonomy-based Supervised Topic Labeling

  • Ajwani D.
  • Nicholson P.
  • Sala A.
  • Taneva-Popova B.

The rise of AI-assisted applications, bot-based customer support and personal assistants is imposing ever-higher accuracy requirements on interpreting and contextualizing text data. A fundamental building block in the automated understanding of text is assigning a topic label to a text document at an appropriate level of granularity. The topic label should generalize the entities in the document, but it should not be too generic. The state-of-the-art solutions to this problem use unsupervised methods that either do not leverage the taxonomy structure or model the taxonomy as an undirected graph. Undirected paths mix hypernym and hyponym edges arbitrarily and are often a poor indicator of semantic relatedness. As a result, the labels generated by these approaches struggle to achieve high accuracy. We propose novel directional traversal measures based on modeling the taxonomy as a directed acyclic graph. In addition, we leverage information-theoretic measures based on Mutual Information. We combine our novel graph-theoretic and information-theoretic measures with existing measures (e.g., content-based ones) by using them as features in a supervised learning approach. Our evaluation on Amazon Mechanical Turk shows that, on a range of document corpora, the topic labels generated by our supervised method are significantly more accurate than those of the baseline state-of-the-art approaches from the literature.
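To make the idea of directional traversal concrete, the following is a minimal sketch, not the measures proposed in the paper. It assumes a toy taxonomy stored as a child-to-parent (hypernym) map and ranks candidate labels by the total number of upward hops from each entity. The taxonomy contents and the names HYPERNYMS, upward_distances and candidate_labels are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy taxonomy (an assumption, not data from the paper):
# each node maps to its direct hypernyms (child -> parents), forming a DAG.
HYPERNYMS = {
    "python": ["programming language", "snake"],
    "java": ["programming language"],
    "programming language": ["software"],
    "snake": ["reptile"],
    "reptile": ["animal"],
    "software": ["technology"],
    "animal": ["entity"],
    "technology": ["entity"],
    "entity": [],
}


def upward_distances(node, hypernyms):
    """Breadth-first search that follows only child -> parent edges.

    Returns a dict mapping each reachable ancestor (and the node itself)
    to its minimum number of upward hops from `node`.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def candidate_labels(entities, hypernyms):
    """Rank nodes that are reachable upward from every entity.

    The score is the sum of upward hop counts, so lower scores mean more
    specific labels. This scoring is an illustrative assumption only.
    """
    if not entities:
        return []
    per_entity = [upward_distances(e, hypernyms) for e in entities]
    common = set(per_entity[0])
    for dist in per_entity[1:]:
        common &= set(dist)
    return sorted((sum(d[c] for d in per_entity), c) for c in common)


if __name__ == "__main__":
    for score, label in candidate_labels(["python", "java"], HYPERNYMS):
        print(score, label)
```

Because the search never follows a parent-to-child edge, "java" cannot reach "snake" through "python", which is the kind of spurious path an undirected model of the same taxonomy would allow. In a supervised setting, a score like this could serve as one feature among several.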
