Peeking Through the BitTorrent Seedbox Hosting Ecosystem
In this paper, we propose a lightweight method for detecting and classifying BitTorrent content providers with a minimal amount of resources. While heavyweight methodologies are typically used (requiring long-term observation of, and data exchange with, the peers of a swarm and/or a semantic analysis of torrent websites), we argue that such complexity can be avoided by analyzing the correlations between peers and torrents. We apply our methodology to study over 50K torrents injected into ThePirateBay during one month, collecting more than 400K IP addresses. In short, we find that exploiting these correlations not only enhances the classification accuracy while keeping the technique lightweight (our methodology reliably identifies about 150 seedboxes), but also uncovers seeding behaviors that were not previously noticed (e.g., multi-port and multi-host seeding). Finally, we correlate the popularity of seedbox hosting in our dataset with criteria (e.g., cost, storage space, Web popularity) that can bias the selection process of BitTorrent content providers.
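To illustrate what analyzing correlations between peers and torrents could look like in practice, the following is a minimal sketch and not the paper's actual classifier. It assumes a hypothetical input of `(torrent_id, ip, port)` observations of early seeders, a hypothetical function name `candidate_providers`, and an arbitrary threshold `min_torrents`; none of these come from the paper. An IP address seen across several distinct torrents is flagged as a candidate content provider, and it is marked as multi-port when it appears with more than one port.

```python
from collections import defaultdict


def candidate_providers(observations, min_torrents=3):
    """Flag IPs that recur across torrents (illustrative heuristic only).

    observations: iterable of (torrent_id, ip, port) tuples (assumed format).
    min_torrents: assumed threshold, not a value taken from the paper.
    """
    torrents_by_ip = defaultdict(set)
    ports_by_ip = defaultdict(set)

    # Build the peer-to-torrent correlation: which torrents and ports
    # each IP address was observed with.
    for torrent_id, ip, port in observations:
        torrents_by_ip[ip].add(torrent_id)
        ports_by_ip[ip].add(port)

    # Keep only IPs that appear in enough distinct torrents.
    candidates = {}
    for ip, torrents in torrents_by_ip.items():
        if len(torrents) >= min_torrents:
            candidates[ip] = {
                "torrents": len(torrents),
                "multi_port": len(ports_by_ip[ip]) > 1,
            }
    return candidates


# Made-up sample data using documentation-reserved IP ranges.
sample = [
    ("t1", "192.0.2.10", 51413),
    ("t2", "192.0.2.10", 51413),
    ("t3", "192.0.2.10", 6881),
    ("t1", "198.51.100.7", 6881),
    ("t4", "203.0.113.5", 6881),
]
flagged = candidate_providers(sample)
```

In this made-up sample, only `192.0.2.10` recurs across enough torrents to be flagged, and it does so from two different ports. A real pipeline would need further evidence (for example, mapping flagged IPs to hosting providers) before labeling an address a seedbox.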