Bayesian Shrinkage and Padding Applied to Low-Sample Performances
How do we deal with low-sample performances? How much should we expect a player's performance to change as he gets more minutes?
An interesting question about our metrics at Gemini, recently asked by Rodrigo Picchioni, Lead Data Scout at Monaco, inspired me to write and share this post. I'm also curious how others in the industry handle this challenge.
"How much should we expect a player's performance to change as he gets more minutes? How does that affect the metrics?"
This question raised some interesting thoughts. I believe there are multiple ways to deal with this challenge, but I wanted to share our approach.
At Gemini, we believe that low-sample performances should be treated as uncertain. What does that mean? Why?
Our goal is to automatically adjust player statistics when a player hasn't played much, so we can avoid overvaluing small flashes of brilliance that might just be luck.
To that end, our approach is to apply statistical padding (shrinkage): by regressing toward the mean, we blend the player's observed signal with a baseline/prior.
But what is this baseline prior? And what exactly does regressing toward the mean involve?
In Bayesian terms, a prior is a probability distribution p(θ) over an unknown parameter θ. In our scenario, this prior is not necessarily a distribution, but rather a value that is estimated or assumed before seeing much of the player’s data.
Here, θ represents a measure of performance. For the sake of example, let's define performance as the well-known VAEP metric. In other words, θ can be interpreted as the player's true VAEP per90.
The prior mean represents our best guess for θ before observing much data. It could be the league's VAEP average, the player's position VAEP average, or something more elaborate, like the player's position VAEP average minus 1.5 times the standard deviation of the metric.
The prior strength, which controls how quickly this belief is overridden, corresponds to what we can think of as a minutes threshold: a denominator term that determines how much weight the baseline (prior mean) receives. Rather than applying a universal minutes threshold, however, we can calculate appropriate values per competition and position, since different positions and competitions exhibit different variance patterns.
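To make the group-specific priors concrete, here is a minimal Python sketch; the rows, column layout, and numbers are purely illustrative, not our actual data model:

```python
from collections import defaultdict

# Hypothetical rows: (competition, position, total VAEP, minutes played)
rows = [
    ("League A", "CB", 3.1, 2400),
    ("League A", "CB", 2.2, 1800),
    ("League A", "ST", 9.4, 2100),
    ("League B", "ST", 6.0, 1500),
]

# Prior mean per (competition, position): the group's aggregate VAEP per 90
sums = defaultdict(lambda: [0.0, 0.0])  # group -> [total value, total minutes]
for comp, pos, vaep, minutes in rows:
    sums[(comp, pos)][0] += vaep
    sums[(comp, pos)][1] += minutes

baselines = {group: v / m * 90 for group, (v, m) in sums.items()}
# The prior strength k could likewise be estimated per group (see below).
```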
Given these definitions, our implementation is best described as an empirical Bayes–style shrinkage. We do not store a full p(θ) distribution. Instead, we apply a deterministic weighted average that behaves like a Bayesian posterior mean. That is:
The padded (shrunk) estimate is computed as a weighted average between the player's observed performance and a baseline prior:
padded = w × playerValue + (1 - w) × baseline
where:
- padded = final shrunk estimate (it matches the form of a Bayesian posterior mean)
- w = n / (n + k) = weight assigned to the player's data
- n = player minutes
- k = prior strength (minutes threshold)
- playerValue = (Σ(value) / Σ(minutes)) × 90
- baseline = prior mean, i.e. our best guess for θ (e.g. league average, position average, etc.)
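As a minimal sketch of this computation in Python (the function name and signature are mine, for illustration, not Gemini's actual code):

```python
def padded_estimate(total_value, minutes, baseline, k):
    """Shrink an observed per-90 metric toward a baseline prior.

    total_value : sum of the raw metric (e.g. VAEP) across matches
    minutes     : total minutes played (n)
    baseline    : prior mean (e.g. league or position VAEP per 90)
    k           : prior strength, expressed in minutes
    """
    if minutes <= 0:
        return baseline  # no observed data: rely entirely on the prior
    player_value = total_value / minutes * 90  # observed per-90 rate
    w = minutes / (minutes + k)                # weight on the player's data
    return w * player_value + (1 - w) * baseline
```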
The prior strength k can be chosen heuristically or derived from the data.
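One common data-driven way to derive k (not necessarily the one we use) comes from split-half reliability: compute the metric for each player on two disjoint samples of n minutes each, correlate the two columns, and set the weight w = n / (n + k) equal to that correlation r, which gives k = n × (1 − r) / r. A sketch:

```python
import numpy as np

def prior_strength_from_split_half(half_a, half_b, minutes_per_half):
    """Estimate the prior strength k via split-half reliability.

    half_a, half_b   : per-90 values for the same players, computed on
                       two disjoint samples of equal size
    minutes_per_half : minutes in each sample (n)
    """
    r = np.corrcoef(half_a, half_b)[0, 1]  # reliability of an n-minute sample
    return minutes_per_half * (1 - r) / r  # solve r = n / (n + k) for k
```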
The key intuition is that as n increases, w approaches 1, and the padded estimate converges to the player's observed performance. Conversely, with limited minutes, the baseline carries more weight.
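To make the convergence concrete, a quick worked example (numbers invented) using the padded_estimate sketch above:

```python
# Baseline 0.30 VAEP per 90, prior strength k = 900 minutes.
# 300 minutes at an observed 0.60 per 90 -> w = 300 / 1200 = 0.25
print(padded_estimate(2.0, 300, baseline=0.30, k=900))    # 0.375
# Same observed rate over 2700 minutes -> w = 2700 / 3600 = 0.75
print(padded_estimate(18.0, 2700, baseline=0.30, k=900))  # 0.525
```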
Of course, it is important to consider that, regardless of padding, meaningful changes in a player's numbers will still happen because:
- True performance may be changing (adaptation, tactical role, etc.), and
- Even at, say, 2,000 minutes, some actions, especially rare events, still carry substantial variance.
In short:
- With few minutes/actions: the player signal is more uncertain → higher baseline weight → more regression to the mean.
- As minutes/actions accumulate: the player signal becomes more reliable → the player weight increases (baseline weight decreases) → less regression to the mean.
This approach improves player evaluation by going beyond simplistic minute thresholds and providing nuanced, mathematically grounded adjustments for limited playing time. As a result, the statistical padding methodology powers key analytical use cases:
- Performance Evaluation: Creates more stable metrics less susceptible to small sample variance
- Talent Identification: Helps distinguish between genuine talent and statistical noise
- Player Comparison: Enables fair comparison between players with different playing time
- League Translation: Forms the foundation for reliable cross-league performance comparison
- Time-Decay Analysis: Provides more stable inputs for longitudinal player evaluation
- Recruitment Analysis: Reduces the risk of overvaluing small-sample performances
What is Time Decay? How do we translate metrics across leagues? Subjects of upcoming posts...
P.S.: The work mentioned in this text was developed by Amod Sahasrabudhe, Gabriel Reis, João Lucas, Hugo Rios, Marc Garnica and me.