March 13, 2017

Analysis and Diagnosis of SLA Violations in a Production SaaS Cloud

  • Ganesan R.
  • Iyer R.
  • Kalbarczyk Z.
  • Martino C.
  • Sarkar S.

A software-as-a-service (SaaS) platform needs to provide its intended service in accordance with its stated service-level agreements (SLAs). While SLA violations in SaaS platforms have been reported, little work has been done to empirically characterize SaaS failures. In this paper, we study SLA violations of a production SaaS platform, diagnose their causes, unearth several critical failure modes, and then suggest solution approaches to increase the availability of the platform as perceived by the end user. Our approach combines field failure data analysis (FFDA) and fault injection. Our study is based on 283 days of operational logs of the platform. During this time, the platform received business workload from 42 customers spread over 22 countries. We first developed a set of home-grown FFDA tools to analyze the logs, and then implemented a fault injector that automatically injects several runtime errors into the application code, written in .NET/C#, and collates the injection results. We summarize our findings as follows: first, system failures caused 93% of all SLA violations; second, our fault injector was able to recreate a few cases of bursts of SLA violations that could not be diagnosed from the logs; and third, the fault injection mechanism could recreate several error propagation paths leading to data corruptions that the failure data analysis could not reveal. Finally, the paper presents some system-level implications of this study and discusses how the joint use of fault injection and log analysis may help improve the reliability of the measured platform.

