Scalable Stream Caching

Abstract

In the current Internet, web content is increasingly cached closer to the end user to reduce network and web server load and to improve performance. Existing web caching systems \cite{CommercialCaches} typically cache entire web documents and attempt to keep them consistent with the origin server. This approach works well for text and images; for bandwidth-intensive multimedia data such as audio and video, however, caching entire documents is not cost-effective and does not scale. An alternative approach is to cache parts of the multimedia stream on different caches in the network and to coordinate stream playback from these independent caches. From the perspective of the clients, the collection of cooperating distributed caches acts as a single fault-tolerant, scalable cache. In this paper, we focus on data placement and replacement techniques for such cooperating distributed caches. Specifically, we propose the following new schemes that work together: (1) A family of two distributed layouts, RCache and Silo. The RCache layout is a simple, randomized, easy-to-implement layout that distributes constant-length segments of a clip among caches and provides modest storage efficiency. The Silo layout improves upon RCache: it accounts for long-term clip popularity and intra-clip segment popularity, and it provides parameters to tune storage efficiency, server load, and playback switch-overs. (2) Two novel local data replacement schemes, alpha-beta and Rainbow. The alpha-beta scheme uses simple thresholds to capture macroscopic clip popularity and microscopic segment popularity. The Rainbow scheme is more sophisticated and uses the concept of segment access potential to capture these popularity metrics accurately. (3) Caching Token, a dynamic global data replacement or redistribution scheme that exploits existing data in distributed caches to minimize distribution overhead.
Our schemes optimize storage space, start-up latency, server load, network bandwidth usage, and overhead from playback switch-overs. Our analytical and simulation results show that the Silo scheme achieves a cache hit ratio 3 to 7 times as high as that of a traditional web caching system while using the same amount of storage space.
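
To make the idea of a randomized segment layout concrete, the following Python sketch shows one plausible reading of it. It is an illustration under stated assumptions, not the RCache algorithm itself: the abstract does not say how placement is randomized, so the sketch assumes that each cache independently keeps each constant-length segment with a fixed probability. The function names and the parameters `store_prob` and `seed` are hypothetical.

```python
import random


def rcache_layout(clip_length, segment_length, num_caches, store_prob, seed=0):
    """Randomly place constant-length segments of one clip on a set of caches.

    Assumption (not stated in the abstract): every cache decides
    independently, with probability `store_prob`, whether to keep a segment.
    Returns the segment count and a dict mapping cache id -> set of segments.
    """
    rng = random.Random(seed)  # seeded so that the layout is reproducible
    # Ceiling division: the last segment may be shorter than the others.
    num_segments = -(-clip_length // segment_length)
    layout = {cache: set() for cache in range(num_caches)}
    for cache in range(num_caches):
        for segment in range(num_segments):
            if rng.random() < store_prob:
                layout[cache].add(segment)
    return num_segments, layout


def missing_segments(num_segments, layout):
    """List the segments that no cache holds under the given layout."""
    cached = set().union(*layout.values())
    return [s for s in range(num_segments) if s not in cached]


if __name__ == "__main__":
    # Hypothetical numbers: a 600-second clip, 60-second segments, 4 caches.
    n, layout = rcache_layout(600, 60, num_caches=4, store_prob=0.5)
    for cache, segments in layout.items():
        print(f"cache {cache}: segments {sorted(segments)}")
    print("segments held by no cache:", missing_segments(n, layout))
```

In this sketch, a larger `store_prob` leaves fewer segments uncached at the cost of more replicated storage, which is the kind of storage-efficiency trade-off the abstract attributes to the layouts. Treating uncached segments as ones that must come from the origin server is likewise an assumption of the sketch.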