The Marriage of Approximate Computing and Privacy-Preserving Data Analytics

  • Chen R.
  • Christof Fetzer
  • Do Le Quoc
  • Martin Beck
  • Pramod Bhatotia
  • Thorsten Strufe
  • Volker Markl

Online advertising is a major economic force for modern online services, where users' private data is continuously collected for real-time data analytics. In the current advertising ecosystem, the goals of users and data analysts are at odds: users seek stronger privacy, while analysts strive for high-utility data analytics in near real time. In this paper, we aim to design a pragmatic privacy-preserving data analytics system that resolves this tension. More specifically, we answer the following research question: How to preserve users' privacy while supporting high-utility data analytics for low-latency stream processing? Our design builds on the marriage of two existing computing paradigms: privacy-preserving data analytics and approximate computing. Privacy-preserving data analytics techniques, such as differential privacy, produce noisy output to protect individual users' privacy. Approximate computing returns an approximate output instead of the exact output to achieve low-latency execution (and also efficient utilization of computing resources). We make the observation that these two computing paradigms are complementary and can be married together. Both paradigms strive for approximation, but they differ in their means of computing the approximate output: privacy-preserving analytics adds explicit noise to the final aggregate query output, whereas approximate computing relies on representative sampling of the entire dataset to compute over only a subset of data items. To realize this marriage, we designed an online sampling algorithm that achieves zero-knowledge privacy and produces an approximate output with bounded error in real time. We implemented our algorithm in a data analytics system called PrivApprox, based on Apache Flink Streaming. Our evaluation using micro-benchmarks and real-world case studies shows that PrivApprox achieves the benefits of both approximate computing and privacy-preserving data analytics.
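To make the two sources of approximation concrete, the following is a minimal Python sketch, not the PrivApprox algorithm itself. It estimates a count from a simple random sample (the approximate computing side) and then adds Laplace noise to the scaled-up estimate (the differential privacy side). The function names, the parameters `sample_fraction` and `epsilon`, and the choice of the Laplace mechanism are assumptions made for illustration; the abstract describes an online sampling algorithm with zero-knowledge privacy, which this batch-style sketch does not implement.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def approx_private_count(items, predicate, sample_fraction, epsilon, rng):
    """Hypothetical helper: estimate how many items satisfy `predicate`
    using a sample, then perturb the estimate with Laplace noise."""
    n = len(items)
    if n == 0:
        raise ValueError("items must be non-empty")

    # Approximate computing: look at only a subset of the data.
    k = max(1, round(sample_fraction * n))
    sample = rng.sample(items, k)
    hits = sum(1 for x in sample if predicate(x))

    # Scale the sampled count back up to the full dataset.
    scale_up = n / k
    estimate = hits * scale_up

    # Privacy: changing one item's value changes `hits` by at most 1,
    # so the scaled estimate changes by at most n / k. Using that as the
    # sensitivity is an assumption of this sketch; it ignores any privacy
    # amplification that sampling may provide.
    sensitivity = scale_up
    return estimate + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    data = list(range(10_000))
    noisy = approx_private_count(
        data, lambda x: x % 2 == 0, sample_fraction=0.1, epsilon=1.0, rng=rng
    )
    print(f"noisy approximate count of even items: {noisy:.1f}")
```

In this sketch, a smaller `sample_fraction` lowers the work per query but increases both the sampling error and the noise scale, while a smaller `epsilon` increases the noise alone. That interaction is the kind of trade-off the abstract refers to when it describes the two paradigms as complementary.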
