Non-Intrusive and Efficient Detection of Latent Reliability Bottlenecks within Cloud Storage Services

  • Chen R.
  • Zhai E.

Large-scale storage systems, common in cloud computing, employ replication to ensure reliability. As modern cloud platforms grow, however, replica servers may inadvertently depend on deep, common infrastructure components, e.g., switches and DNS servers. Such unexpected common dependencies are defined as Latent Reliability Bottlenecks (LRBs), which can cause correlated failures that undermine the replication efforts. While significant effort has gone into localizing faults after they occur, this paper proposes a novel system, SONDE, that offers non-intrusive and efficient LRB detection before failures occur, in three steps: 1) automatically collecting service components and their dependency information, 2) constructing a fault tree model from this information, and 3) efficiently analyzing the fault tree to identify and rank LRBs by severity. SONDE is novel in Steps 1 and 3. In Step 1, SONDE's automatic dependency collection mechanism is not only accurate and efficient, but also requires no human intervention or the adoption of additional agents. In Step 3, SONDE introduces a high-performance fault tree analysis engine that leverages the Z3 SMT solver, making LRB analysis scalable to cloud-scale systems. We evaluate SONDE by detecting LRBs in a realistic storage service and on large-scale datasets. For example, SONDE can detect 100% of the critical LRBs in a 70,656-node system within 5 minutes.
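To make the fault tree idea concrete, here is a minimal sketch of Steps 2 and 3 on a toy deployment. It is not SONDE's engine: the component names (`server1`, `switch_a`, `dns`, and so on) are hypothetical, plain brute-force enumeration stands in for the Z3-based analysis, and ranking smaller cut sets as more severe is an assumption made only for illustration.

```python
from itertools import combinations

# Hypothetical toy deployment (names are illustrative, not from SONDE):
# the service survives as long as at least one replica is up, and a
# replica goes down if any component it depends on fails.
REPLICA_DEPS = {
    "replica1": ["server1", "switch_a", "dns"],
    "replica2": ["server2", "switch_a", "dns"],
}


def service_fails(failed):
    """Top event of the fault tree: AND over replicas of (OR over dependencies)."""
    return all(
        any(component in failed for component in deps)
        for deps in REPLICA_DEPS.values()
    )


def minimal_cut_sets(max_size=2):
    """Enumerate minimal sets of component failures that bring the service down.

    Candidates are tried in order of increasing size, and any superset of an
    already-found cut set is skipped, so every set returned is minimal.
    """
    components = sorted({c for deps in REPLICA_DEPS.values() for c in deps})
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(components, size):
            candidate = set(combo)
            if any(previous <= candidate for previous in found):
                continue  # a smaller cut set already covers this candidate
            if service_fails(candidate):
                found.append(candidate)
    return found


# Assumed ranking: smaller cut sets first, since fewer simultaneous failures
# are needed to defeat replication.
cut_sets = minimal_cut_sets()
for rank, cut_set in enumerate(cut_sets, start=1):
    print(rank, sorted(cut_set))
```

In this toy example the shared switch and the shared DNS server each form a single-component cut set, which is the kind of common dependency the abstract calls an LRB, while the two servers only bring the service down when they fail together. Exhaustive enumeration grows combinatorially with the number of components, which is why a solver-backed engine matters at cloud scale.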

Recent Publications

August 09, 2017

A Cloud Native Approach to 5G Network Slicing

  • Francini A.
  • Miller R.
  • Sharma S.

5G networks will have to support a set of very diverse and often extreme requirements. Network slicing offers an effective way to unlock the full potential of 5G networks and meet those requirements on a shared network infrastructure. This paper presents a cloud native approach to network slicing. The cloud ...

August 01, 2017

Modeling and simulation of RSOA with a dual-electrode configuration

  • De Valicourt G.
  • Liu Z.
  • Violas M.
  • Wang H.
  • Wu Q.

Based on the physical model of a bulk reflective semiconductor optical amplifier (RSOA) used as a modulator in radio over fiber (RoF) links, the distributions of carrier density, signal photon density, and amplified spontaneous emission photon density are demonstrated. One of the limits in the use of RSOA is the lower ...

July 12, 2017

PrivApprox: Privacy-Preserving Stream Analytics

  • Chen R.
  • Christof Fetzer
  • Le D.
  • Martin Beck
  • Pramod Bhatotia
  • Thorsten Strufe

How can users' privacy be preserved while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation and evaluation of PRIVAPPROX, a data analytics system for privacy-preserving stream processing. PRIVAPPROX provides three properties: (i) Privacy: zero-knowledge privacy (ezk) guarantees for users, a privacy bound tighter ...