Identifying user activity from truncated-URL web traces

  • Ajwani D.
  • Mai T.
  • Sala A.

Understanding user behavior is essential to personalize and enrich a user's online experience. While personalized services based on fine-grained behavioral analysis offer significant benefits, care must be taken to address user privacy concerns. The ideal data set for such an analysis should not reveal any sensitive user information, while still retaining the information needed to infer user activities (such as reading, social networking, gaming, searching, etc.). In this paper, we consider the use of web traces with truncated URLs, in which each URL is trimmed to contain only the web domain, for this purpose. While such truncation removes the fine-grained sensitive information (e.g., search query, purchased products, location, etc.), we show that this data set is still good enough to identify user activities with high accuracy. Specifically, we focus on the problem of finding representative URLs in the truncated-URL web traces that characterize and identify user activities. Toward this goal, we propose a statistical methodology that segregates the representative URLs from the remaining traffic with high accuracy. Our methodology achieves this accuracy by leveraging specialized features extracted from a data burst (where a burst is defined as a group of consecutive URLs that represent a micro user action such as a web click, chat reply, etc.). These bursts, in turn, are detected by a novel algorithm based on our observed characteristics of the inter-arrival time of HTTP records.
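
To make the two ideas above concrete, here is a minimal Python sketch of URL truncation and burst grouping. It is not the paper's algorithm, which the abstract does not specify: the fixed gap threshold `max_gap`, the function names `truncate_url` and `split_into_bursts`, and the `(timestamp, url)` record layout are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def truncate_url(url):
    """Keep only the host of a URL, dropping path, query, and fragment.

    Returns an empty string when no host can be parsed
    (for example, when the URL has no scheme).
    """
    return urlsplit(url).hostname or ""


def split_into_bursts(records, max_gap=1.0):
    """Group (timestamp, url) records into bursts.

    Assumption: a new burst starts whenever the inter-arrival time
    between consecutive records exceeds max_gap seconds. The paper's
    detector is derived from observed inter-arrival statistics and
    may differ from this fixed-threshold rule.
    """
    bursts = []
    prev_ts = None
    for ts, url in sorted(records):
        if prev_ts is None or ts - prev_ts > max_gap:
            bursts.append([])  # gap too large: open a new burst
        bursts[-1].append((ts, truncate_url(url)))
        prev_ts = ts
    return bursts
```

With this rule, three requests arriving at 0.0 s, 0.2 s, and 5.0 s fall into two bursts, and each stored URL keeps only its domain, so query strings never reach the later analysis.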

Recent Publications

August 09, 2017

A Cloud Native Approach to 5G Network Slicing

  • Francini A.
  • Miller R.
  • Sharma S.

5G networks will have to support a set of very diverse and often extreme requirements. Network slicing offers an effective way to unlock the full potential of 5G networks and meet those requirements on a shared network infrastructure. This paper presents a cloud native approach to network slicing. The cloud ...

August 01, 2017

Modeling and simulation of RSOA with a dual-electrode configuration

  • De Valicourt G.
  • Liu Z.
  • Violas M.
  • Wang H.
  • Wu Q.

Based on the physical model of a bulk reflective semiconductor optical amplifier (RSOA) used as a modulator in radio over fiber (RoF) links, the distributions of carrier density, signal photon density, and amplified spontaneous emission photon density are demonstrated. One of the limits in the use of RSOA is the lower ...

July 12, 2017

PrivApprox: Privacy-Preserving Stream Analytics

  • Chen R.
  • Fetzer C.
  • Le D.
  • Beck M.
  • Bhatotia P.
  • Strufe T.

How can users' privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation, and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy (ezk) guarantees for users, a privacy bound tighter ...