Bayesian Nonparametrics for Marathon Modeling

  • Perez-Cruz F.
  • Pradier M.
  • Ruiz F.

Bayesian nonparametric (BNP) models present several desirable properties. Their best-known benefit is that they avoid the need to specify a closed model, letting the data choose the model (or models) that best describe it in order to provide competitive predictions [6]. In this paper, we focus instead on their generative property, which explains the data in an amenable way and allows us to formulate hypotheses and draw conclusions from the results. This ability facilitates collaboration with experts in other fields and avoids the black-box flavor common to other methods. Examples of such descriptive analyses can be found in psychiatry [15], genetics [19], topic modeling [7], image segmentation [9], speaker diarization [8], and tracking [11].
However, BNP models are extremely flexible, so they can also provide accurate predictions with a structure that is not necessarily interpretable. If we want both insightful descriptive conclusions and accurate predictions, we need to specify the prior so that it points towards the sought explanation; that is, we need to add our prior knowledge to the model. In this way, the first insights the model yields should not be foreign to us. This makes the model trustworthy for experts in other fields, so further conclusions that were not common knowledge can be taken as plausible. At this stage, we can formulate hypotheses that can be tested with future data and that may provide previously unknown insights into the problem at hand. Most BNP models are described as general priors [17, 12, 18, 10] applicable to a large number of problems. We believe that BNP will be useful for experts outside machine learning if we can constrain these priors to provide accurate and interpretable solutions.
In this paper, we present a novel application of BNP models to marathon runners. We aim to analyze the data from different perspectives in order to find hidden properties of the athletes while providing accurate predictions.
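To make concrete what "letting the data choose the model" means, here is a minimal sketch of the Chinese restaurant process (CRP), the partition prior behind Dirichlet process mixtures. It is a generic textbook illustration, not the model used in this paper; the function names `crp_partition` and `expected_tables` and the concentration value `alpha` are hypothetical choices for the example.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Draw a random partition of n items from a Chinese restaurant process.

    Returns (assignments, counts): assignments[i] is the table of customer i,
    counts[k] is the number of customers seated at table k.
    """
    rng = random.Random(seed)
    counts = []
    assignments = []
    for i in range(n):
        # Existing table k has weight counts[k]; a new table has weight alpha.
        # The weights sum to i + alpha, giving the usual CRP probabilities.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new table (a new cluster)
        else:
            counts[k] += 1     # join an existing table
        assignments.append(k)
    return assignments, counts


def expected_tables(n, alpha):
    """Expected number of occupied tables after n customers."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, counts = crp_partition(n, alpha=1.0)
        print(n, len(counts), round(expected_tables(n, 1.0), 2))
```

The number of clusters is not fixed in advance: it grows with the number of observations, roughly logarithmically in expectation, so model complexity adapts to the amount of data.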
We resort to a nonparametric model instead of a parametric one to leave room for the unexpected. We build a model to fairly compare the finishing times of runners of different ages and sexes. This has several applications. First, some marathons award entry to participants based on their best marathon time in the previous 12 months.¹ The entry requirements vary considerably from one event to the next, as there is no widely accepted standard method for setting them. Second, World Masters Athletics (WMA) has an age-graded system [3] that adjusts finishing times according to age and sex, and it lobbies for this measure to be taken into account when selecting the winners of each race. However, this procedure is based on world records, in other words on top-performance outliers, which might not be very representative or even realistic.
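A minimal sketch of age grading, assuming the common convention that a finishing time is multiplied by an age- and sex-specific factor (at most 1) derived from record performances, and that the age-grade percentage compares the runner's time with a record-based standard for that age and sex. The factor table `HYPOTHETICAL_FACTORS`, the open standard in the usage lines, and both function names are illustrative assumptions, not WMA data or software.

```python
from datetime import timedelta

# HYPOTHETICAL age factors keyed by (sex, age). These are illustrative
# placeholders, not values from the official WMA age-grading tables.
HYPOTHETICAL_FACTORS = {
    ("M", 40): 0.95,
    ("M", 60): 0.83,
    ("F", 40): 0.94,
    ("F", 60): 0.80,
}


def age_graded_time(finish, sex, age, factors=HYPOTHETICAL_FACTORS):
    """Convert a finishing time into an open-age equivalent time.

    A factor below 1 shortens the time, crediting older runners.
    """
    return finish * factors[(sex, age)]


def age_grade_percent(finish, open_standard, sex, age, factors=HYPOTHETICAL_FACTORS):
    """Performance as a percentage of the age/sex standard.

    The age standard is the record-based open standard divided by the factor.
    """
    age_standard = open_standard / factors[(sex, age)]
    return 100 * (age_standard / finish)


if __name__ == "__main__":
    t = timedelta(hours=3, minutes=30)
    print(age_graded_time(t, "M", 60))  # 2:54:18
    # open_standard is a hypothetical round number, not an actual record.
    print(round(age_grade_percent(t, timedelta(hours=2), "M", 60), 1))  # 68.8
```

Since the factors are anchored to the best performances on record, any grading built this way inherits their sensitivity to extreme outliers, which is the limitation noted above.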

Recent Publications

August 09, 2017

A Cloud Native Approach to 5G Network Slicing

  • Francini A.
  • Miller R.
  • Sharma S.

5G networks will have to support a set of very diverse and often extreme requirements. Network slicing offers an effective way to unlock the full potential of 5G networks and meet those requirements on a shared network infrastructure. This paper presents a cloud native approach to network slicing. The cloud ...

August 01, 2017

Modeling and simulation of RSOA with a dual-electrode configuration

  • De Valicourt G.
  • Liu Z.
  • Violas M.
  • Wang H.
  • Wu Q.

Based on the physical model of a bulk reflective semiconductor optical amplifier (RSOA) used as a modulator in radio over fiber (RoF) links, the distributions of carrier density, signal photon density, and amplified spontaneous emission photon density are demonstrated. One of the limits in the use of RSOA is the lower ...

July 12, 2017

PrivApprox: Privacy-Preserving Stream Analytics

  • Beck M.
  • Bhatotia P.
  • Chen R.
  • Fetzer C.
  • Le D.
  • Strufe T.

How to preserve users' privacy while supporting high-utility analytics for low-latency stream processing? To answer this question, we describe the design, implementation, and evaluation of PrivApprox, a data analytics system for privacy-preserving stream processing. PrivApprox provides three properties: (i) Privacy: zero-knowledge privacy (εzk) guarantees for users, a privacy bound tighter ...