How Do We Compare Performance Across Leagues?
How do we translate metrics across leagues? How can we fairly compare a player in the Eredivisie to one in the Premier League?
It's widely known how difficult it is to predict how a player would fare if he moved to another competition. Among many challenges in the football industry, this is one of the most impactful and interesting, as it affects how teams sign players during transfer windows.
Many factors could feed into a model that tackles this challenge, and some of them, such as cultural fit or language barriers, are not easy to capture. So we will focus on pure football skills.
At Gemini Sports, we developed a league translation methodology that underpins every metric available by default in the Gemini app, enabling users to compare players across competitions within their existing workflows. It uses player transfer data to objectively estimate how performances translate between leagues. But before we get there, we need to talk about the metric that powers it: Plus-Minus.
Why Plus-Minus?
Plus-Minus (PM) metrics are a staple in basketball analytics. At its simplest, PM is the point differential recorded while a player is on the court. It aims to quantify a player's contribution to overall team performance without relying solely on individual stats, which fail to capture all the ways a player contributes to winning games.
However, in soccer, PM has been received with skepticism, and deservedly so. Soccer has limited substitutions, which increases the collinearity between players on the same team. It's also a low-scoring sport, which means good performances don't always translate to changes on the scoreboard. And because soccer is played at a high level across multiple countries, the importance of metrics normalized for different leagues becomes even greater.
So how do we adapt Plus-Minus to soccer? And how do we use it to compare players across leagues?
From Stints to Estimated Plus-Minus
The first step is splitting every game into stints: segments of play separated by substitutions, red cards, goals, or half-time. Each stint records the players on the pitch (encoded as 1 for home, -1 for away), the duration, and the xG differential. We use xG instead of actual goals because soccer is low-scoring, and xG captures the quality and frequency of scoring chances much more reliably.
Example of a stints table for a Barcelona vs Real Madrid match
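The stint split is easy to sketch in Python. The version below is a minimal illustration, not Gemini's implementation: it assumes a hypothetical event format of `(minute, kind, team, payload)` tuples, ignores stoppage time, and drops zero-length segments. The names `Stint` and `split_into_stints` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Stint:
    home: frozenset   # home players on the pitch (encoded +1 later)
    away: frozenset   # away players on the pitch (encoded -1 later)
    start: float      # minute the stint starts
    end: float        # minute the stint ends
    home_xg: float
    away_xg: float

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def xg_diff(self) -> float:
        return self.home_xg - self.away_xg


def split_into_stints(home_xi, away_xi, events, match_end=90.0):
    """Split a match into stints.

    events: (minute, kind, team, payload) tuples, where kind is
    "shot" (payload = xG), "goal" (payload = xG of the scoring shot),
    "sub" (payload = (player_out, player_in)), "red" (payload = player)
    or "half" (team and payload unused).
    """
    lineup = {"home": set(home_xi), "away": set(away_xi)}
    xg = {"home": 0.0, "away": 0.0}
    stints = []
    start = 0.0

    def close(minute):
        # Record the current segment if it has positive length, then reset.
        nonlocal start
        if minute > start:
            stints.append(Stint(frozenset(lineup["home"]), frozenset(lineup["away"]),
                                start, minute, xg["home"], xg["away"]))
        start = minute
        xg["home"] = xg["away"] = 0.0

    for minute, kind, team, payload in sorted(events, key=lambda e: e[0]):
        if kind in ("shot", "goal"):
            xg[team] += payload
            if kind == "goal":
                close(minute)  # the goal's xG belongs to the stint it ends
        elif kind == "sub":
            close(minute)
            player_out, player_in = payload
            lineup[team].discard(player_out)
            lineup[team].add(player_in)
        elif kind == "red":
            close(minute)
            lineup[team].discard(payload)
        elif kind == "half":
            close(minute)
    close(match_end)
    return stints
```

Each resulting `Stint` becomes one row of the stints table: +1 for every home player, -1 for every away player, plus the duration and the xG differential.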
With the stints table ready, the pipeline has three main steps:
The EPM pipeline: from RAPM through SPM and BPM to league-translated GPM
1. RAPM (Regularized Adjusted Plus-Minus): A Ridge Regression model is trained with xG differential as the target, the encoded players and context features (gamestate, red cards) as predictors, and stint duration as sample weights. Ridge regression deals with the collinearity problem by penalizing large coefficients, which keeps the estimates for players who almost always share the pitch from swinging to extreme, offsetting values.
RAPM: Ridge Regression on stints data, producing per-player coefficients
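A minimal sketch of the RAPM fit with scikit-learn's `Ridge`, assuming each stint arrives as a hypothetical `(home_players, away_players, duration_minutes, xg_diff)` tuple. Two further assumptions: the target is rescaled to xG differential per 90 so that coefficients share the per-90 unit used later, and the context features (gamestate, red cards) are omitted for brevity. `alpha` is a placeholder, not a tuned value.

```python
import numpy as np
from sklearn.linear_model import Ridge


def build_design(stints):
    """Encode stints as a players matrix (+1 home, -1 away), per-90 target and weights."""
    players = sorted({p for home, away, _, _ in stints for p in (*home, *away)})
    col = {p: j for j, p in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for i, (home, away, duration, xg_diff) in enumerate(stints):
        for p in home:
            X[i, col[p]] = 1.0
        for p in away:
            X[i, col[p]] = -1.0
        y[i] = xg_diff * 90.0 / duration  # xG differential per 90 (assumed target scale)
        w[i] = duration                   # longer stints carry more weight
    return X, y, w, players


def fit_rapm(stints, alpha=1.0):
    """Return a {player: RAPM} dict; the intercept absorbs any average home edge."""
    X, y, w, players = build_design(stints)
    model = Ridge(alpha=alpha)
    model.fit(X, y, sample_weight=w)
    return dict(zip(players, model.coef_))
```

In practice `alpha` would be tuned, for example by cross-validation.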
2. SPM (Statistical Plus-Minus): Player-level aggregated stats (per 90, normalized by possession) are used as features to predict RAPM. This reduces the noise from collinearity further, but limits the predictions to effects captured by individual stats.
SPM: using player-level stats to predict RAPM values
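The text doesn't specify which model maps stats to RAPM, so the sketch below assumes a standardized linear Ridge with minutes played as sample weights. Inputs are per-player season totals, assumed to be possession-adjusted upstream; `fit_spm` (a hypothetical name) converts them to per-90 rates and returns the fitted model together with each player's SPM value.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def per90(totals, minutes):
    """Convert per-player season totals (players x stats) to per-90 rates."""
    totals = np.asarray(totals, dtype=float)
    minutes = np.asarray(minutes, dtype=float)
    return totals / minutes[:, None] * 90.0


def fit_spm(stat_totals, minutes, rapm, alpha=1.0):
    """Regress RAPM on per-90 stats; return the fitted model and SPM values."""
    X = per90(stat_totals, minutes)
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # Assumption: players with more minutes have more reliable RAPM, so weight them more.
    model.fit(X, rapm, ridge__sample_weight=np.asarray(minutes, dtype=float))
    return model, model.predict(X)
```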
3. BPM (Beyond Plus-Minus): To capture what SPM misses, we subtract the SPM-predicted xG differential from the actual xG differential, creating a "beyond xG diff" target. A new Ridge Regression on this residual captures contributions beyond what individual stats explain: the "invisible" impact.
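In matrix form the BPM step is short. Assuming the stints are encoded as a matrix `X` (one row per stint, one column per player, +1 home, -1 away, 0 off the pitch), a per-90 xG differential target `y`, and durations `w`, the SPM-predicted xG differential for each stint is `X @ spm`, and a second Ridge is fit on the residual. `fit_bpm` and `alpha` are placeholders, and context features are again omitted.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_bpm(X, y, w, spm, alpha=1.0):
    """Fit BPM on the "beyond xG diff" residual.

    X:   stints x players matrix (+1 home, -1 away, 0 off the pitch)
    y:   xG differential per 90 for each stint
    w:   stint durations, used as sample weights
    spm: SPM value per player, in the same column order as X
    """
    spm_pred = X @ np.asarray(spm, dtype=float)  # xG diff per 90 implied by SPM values
    beyond = y - spm_pred                        # the "beyond xG diff" target
    model = Ridge(alpha=alpha).fit(X, beyond, sample_weight=w)
    return model.coef_                           # BPM per player
```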
Finally, EPM = SPM + BPM. The resulting metric captures both stats-based effects and contributions that extend beyond traditional statistics. Its unit of measurement is xG differential per 90 minutes. Multiply by 38 games (the standard season length in 20-team leagues) and you get Seasonal EPM: the expected goal differential a single player is responsible for over a full league season.
EPM = SPM + BPM, scaled to a full 38-game season
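The final combination is simple arithmetic; a small helper makes the units explicit. The function name is hypothetical, and the scaling assumes 38 full matches, so Seasonal EPM is a per-90 rate re-expressed over a season rather than a sum over the minutes the player actually played.

```python
def seasonal_epm(spm_per90: float, bpm_per90: float, games: int = 38) -> float:
    """EPM = SPM + BPM (xG differential per 90), scaled to a full league season."""
    epm_per90 = spm_per90 + bpm_per90
    return epm_per90 * games
```

For example, a hypothetical player with an SPM of 0.08 and a BPM of 0.02 xG differential per 90 has an EPM of 0.10 per 90, or a Seasonal EPM of about 3.8.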
What is the league translation problem?
Here's where it gets interesting. EPM is computed within each league independently. So how do we compare a player scoring a Seasonal EPM of 5.0 in the Eredivisie to one with a Seasonal EPM of 3.0 in the Premier League? Without adjustment, we can't.
The key insight is that players who transfer between leagues provide natural experiments. If a player had a Seasonal EPM of 4.0 in the Eredivisie and then posts a 2.5 in his first Premier League season, that difference tells us something about the relative strength of the two competitions. Aggregate enough of these transfers and patterns start to emerge.
How does league translation work?
For each player who transferred between leagues, we keep only the seasons immediately before and after the transfer. This gives us a Transfers Table with:
How player transfers create the training data for league translation
- The origin league (encoded as -1) and destination league (encoded as 1)
- The Seasonal EPM in both leagues
- The EPM differential (destination EPM minus origin EPM)
- The minimum minutes played across both seasons (used as sample weight)
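Building the Transfers Table amounts to pairing consecutive seasons in which a player changed league. A sketch in plain Python, assuming hypothetical season records keyed by `player`, `season` (start year), `league`, `position`, `seasonal_epm` and `minutes`; taking the position from the origin season is also an assumption.

```python
from collections import defaultdict


def build_transfers_table(records):
    """Build one row per league switch between consecutive seasons."""
    by_player = defaultdict(list)
    for rec in records:
        by_player[rec["player"]].append(rec)

    rows = []
    for seasons in by_player.values():
        seasons.sort(key=lambda rec: rec["season"])
        for before, after in zip(seasons, seasons[1:]):
            # Keep only back-to-back seasons played in different leagues.
            if after["season"] != before["season"] + 1 or after["league"] == before["league"]:
                continue
            rows.append({
                "position": before["position"],
                "origin": before["league"],         # encoded -1 in the regression
                "destination": after["league"],     # encoded +1 in the regression
                "epm_origin": before["seasonal_epm"],
                "epm_destination": after["seasonal_epm"],
                "epm_diff": after["seasonal_epm"] - before["seasonal_epm"],
                "weight": min(before["minutes"], after["minutes"]),
            })
    return rows
```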
Then, for each general position (Goalkeeper, Centre Back, Full Back, Defensive Midfielder, Centre Midfielder, Winger, Forward), a Ridge Regression is trained where the features are the origin and destination leagues (encoded as -1 and 1, as above), the target is the Seasonal EPM differential, and the samples are weighted by minimum minutes played.
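A sketch of the per-position regression, assuming transfer rows shaped as dictionaries with `position`, `origin`, `destination`, `epm_diff` and `weight` keys (hypothetical names) and a placeholder `alpha`. Because every row holds exactly one -1 and one +1, adding the same constant to every league coefficient leaves the predictions unchanged: only differences between coefficients carry information, which is why translating a player always involves a pair of them.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_league_coefficients(rows, alpha=1.0):
    """Return {position: {league: coefficient}} from a transfers table."""
    leagues = sorted({r["origin"] for r in rows} | {r["destination"] for r in rows})
    col = {league: j for j, league in enumerate(leagues)}
    coefs = {}
    for pos in sorted({r["position"] for r in rows}):
        sub = [r for r in rows if r["position"] == pos]
        X = np.zeros((len(sub), len(leagues)))
        for i, r in enumerate(sub):
            X[i, col[r["origin"]]] = -1.0       # origin league
            X[i, col[r["destination"]]] = 1.0   # destination league
        y = np.array([r["epm_diff"] for r in sub], dtype=float)
        w = np.array([r["weight"] for r in sub], dtype=float)  # minimum minutes
        model = Ridge(alpha=alpha).fit(X, y, sample_weight=w)
        coefs[pos] = dict(zip(leagues, model.coef_))
    return coefs
```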
The resulting coefficients are League Coefficients. They don't directly represent league strength; rather, they have an interpretable unit of measurement: expected goal differential per season. This means we can directly translate a player's EPM from one league to another using simple arithmetic: take the actual EPM, add the origin league's coefficient, and subtract the destination league's coefficient. For example, to translate all players to the Premier League, we add the coefficient of their current league and subtract the Premier League coefficient.
Ridge Regression on transfer data produces league coefficients
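The translation rule above is plain arithmetic. A sketch that follows it literally, with made-up League Coefficients for a single position (the coefficient values, player names and `translate_epm` are hypothetical, chosen only to illustrate the mechanics):

```python
def translate_epm(seasonal_epm, current_league, target_league, league_coefs):
    """Add the current league's coefficient, then subtract the target league's."""
    return seasonal_epm + league_coefs[current_league] - league_coefs[target_league]


# Hypothetical coefficients for one position (illustration only).
example_coefs = {"Premier League": 1.25, "Eredivisie": -0.25}

players = [("Player X", "Eredivisie", 5.0), ("Player Y", "Premier League", 3.0)]
translated = {name: translate_epm(epm, league, "Premier League", example_coefs)
              for name, league, epm in players}
```

With these made-up numbers, the 5.0 Eredivisie player translates to 3.5 on the Premier League scale, while the Premier League player's 3.0 is unchanged, since his current and target coefficients cancel.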
How can we validate it?
One way to validate the model is by checking the correlation between the metric and market value at each stage of the pipeline. Market value isn't ground truth, but it captures the market's collective perception of player quality, which should be strongly informed by competitive level.
Spearman correlation with market value:
- RAPM → 0.165
- SPM → 0.323
- EPM → 0.323
- League Translated EPM → 0.581
- GPM (final metric) → 0.582
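Computing these numbers needs only a rank correlation. A sketch with `scipy.stats.spearmanr`, assuming each stage's metric is aligned player by player with market values (the function name and input layout are hypothetical). Spearman only asks whether players ranked higher by the metric also tend to have higher market values, so it is unaffected by the different scales of RAPM, SPM and EPM.

```python
from scipy.stats import spearmanr


def stage_correlations(stage_values, market_values):
    """Spearman correlation between each pipeline stage's metric and market value.

    stage_values: dict mapping stage name -> per-player metric values,
    aligned with market_values.
    """
    return {stage: spearmanr(values, market_values)[0]
            for stage, values in stage_values.items()}
```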
The jump from EPM (0.323) to League Translated EPM (0.581) is striking. League Translation nearly doubles the correlation with market value, a larger gain than the SPM step (0.165 → 0.323), making it the most impactful step in the pipeline.
Visually, the translated distributions confirm what football insiders would expect: the Premier League emerges as distinctly superior; the remaining "Top Five" European leagues (La Liga, Serie A, Bundesliga, Ligue 1) cluster closely together; the Brazilian Serie A stands out as the strongest non-European league; and the Saudi Pro League's recent investments are reflected in its elevated position.
EPM distributions after league translation and scaling for top 12 competitions (2024/25 season)
What about VAEP?
The same idea applies beyond Plus-Minus. We also translate VAEP across leagues using a similar Ridge Regression approach on player transfers. The key differences are in the details: instead of EPM differential, the target is the difference in z-scored, padded, per-90 VAEP between the origin and destination seasons. The sample weights combine minimum minutes played with position-specific weights (wingers and centre midfielders carry the most signal, while goalkeepers are excluded entirely) and a time decay factor that gives more importance to recent transfers.
The result is a set of VAEP league coefficients that allow us to adjust on-ball value for league strength, just like we do with EPM. This means both the "visible" (VAEP) and the "invisible" (GPM) dimensions of player evaluation are league-adjusted, enabling fair comparisons across competitions on both fronts.
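The VAEP sample weights can be sketched as a product of three factors. The text gives only the qualitative ordering (wingers and centre midfielders highest, goalkeepers excluded), so the numeric position weights, the exponential form of the time decay, its half-life, and z-scoring within each league-season are all assumptions; the padding step is not modelled. The regression target would then be the destination z-score minus the origin z-score.

```python
import numpy as np

# Hypothetical position weights: only the ordering (wingers and centre
# midfielders highest, goalkeepers excluded) comes from the text.
POSITION_WEIGHTS = {
    "Winger": 1.0,
    "Centre Midfielder": 1.0,
    "Forward": 0.8,
    "Full Back": 0.7,
    "Defensive Midfielder": 0.7,
    "Centre Back": 0.5,
    "Goalkeeper": 0.0,
}


def zscore(values):
    """Standardize per-90 VAEP within one league-season (assumed grouping)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def vaep_sample_weight(min_minutes, position, seasons_ago, half_life=3.0):
    """Minimum minutes x position weight x exponential time decay (assumed form)."""
    decay = 0.5 ** (seasons_ago / half_life)
    return min_minutes * POSITION_WEIGHTS[position] * decay
```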
How do we handle outliers?
Leagues with unbalanced matchups tend to produce extreme values. Think of Crvena zvezda winning the Serbian regular season with an 84-goal differential while the runners-up Partizan managed 29. These extreme distributions distort comparisons.
To handle this, a quantile-based scaling transformation compresses extreme outliers (above the 95th percentile of standardized deviations) while preserving the rank ordering of everyone else. The effect is most pronounced in leagues dominated by one or two clubs: the Serbian Super Liga, the Portuguese Liga, the Uruguayan Primera Division, and so on.
Max z-scores before and after scaling: dominant-club leagues see the largest compression (2024/25 season)
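The exact transformation isn't spelled out here, so this sketch assumes a simple monotone version: standardized values up to the 95th percentile of their absolute deviations are left untouched, and anything beyond that threshold is pulled in logarithmically. Because the mapping is continuous and strictly increasing, rank order is preserved for everyone, including the compressed players. Whether the threshold is computed per league or pooled across leagues is also an assumption left to the caller, and `compress_outliers` is a hypothetical name.

```python
import numpy as np


def compress_outliers(z, q=0.95):
    """Log-compress standardized values beyond the q-quantile of |z|.

    Values with |z| <= threshold are unchanged; the mapping is continuous and
    strictly increasing, so the rank order of all values is preserved.
    """
    z = np.asarray(z, dtype=float)
    threshold = np.quantile(np.abs(z), q)
    excess = np.maximum(np.abs(z) - threshold, 0.0)
    return np.where(excess > 0, np.sign(z) * (threshold + np.log1p(excess)), z)
```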
The final metric, Seasonal EPM with league translation and scaling, is what we call Gemini Plus-Minus (GPM).
Why does it matter?
GPM captures contributions that are both impossible to detect using event data alone and prohibitively expensive to measure through tracking data. Unlike VAEP, which tends to reward players with a high volume of actions in the final third, GPM distributes value more evenly across all positions, capturing the defensive midfielder who organizes the press, the full-back who enables the team's build-up, and the centre-back whose positioning keeps the xG differential in check.
Both metrics can work together. When we plot VAEP against GPM for wingers in the Top Five leagues, players like Salah and Dembele show both high on-ball value and a high contribution to xG differential. Others, like Savinho and Luis Diaz, have modest on-ball impact but contribute significantly to their team's overall xG differential through "invisible" contributions. And then there's Xavi Simons: high on-ball action value, but without a matching impact on the team's overall performance.
Seasonal GPM vs VAEP for wingers in the Top Five leagues (2024/25 season)
GPM distributes value more evenly across positions than VAEP (2024/25 season)
In short: GPM provides a rigorous, interpretable, and league-adjusted framework for comparing players across the world. It won't tell you why a player is valuable, but it will tell you that he is, even when traditional stats don't.
How do we evaluate a player across multiple seasons? What is time decay and how does it affect evaluation? Find out in the next post.
P.S.: The work mentioned in this text was developed by Amod Sahasrabudhe, Gabriel Reis, João Lucas, Hugo Rios, Marc Garnica and myself. The research paper, "Translating Talent: A Cross-League Plus-Minus Approach in Soccer", was authored by Amod and Gabriel. You can find it on ResearchGate.
If you'd like to learn more about Gemini, feel free to schedule a call: Calendar