Over Expected Metrics Explained -- What are CPOE, RYOE, and YACOE

July 29th, 2022

"Over Expected" stats like CPOE and RYOE are a new class of metric gaining adoption across the NFL analytics community. At their core, these metrics hope to paint a more accurate picture of performance by adjusting familiar statistics like Completion Percentage or Yards per Rush for confounding factors like degree of difficulty or game context. As Over Expected metrics increase in popularity, it's important to develop a baseline understanding of how they work and, more critically, where they might be limited or less effective.

In this post, we'll first explore how Over Expected metrics are calculated using CPOE as an example. With a baseline understanding in place, we'll then look at the relative effectiveness of other Over Expected metrics like RYOE and YACOE to explore how and when these types of metrics break down.

An Over Expected example with CPOE

CPOE stands for Completion Percentage Over Expected, and as the name suggests, it measures how much higher (or lower) a QB’s completion percentage is relative to what we’d expect it to be based on the types of passes they attempted.

Every Over Expected metric starts with an expectation. Without a notion of what should have happened, we can’t put what did happen into context. As you might expect from an advanced analytics metric, this expectation is set by a machine learning model.

For CPOE specifically, our expectation is set by a model that predicts how likely a pass is to be completed (xCP) based on a variety of factors like the depth of the throw and its location relative to the sideline. Though many other factors are incorporated into this model, a general heuristic for thinking about xCP is that the shorter and closer the throw is to the QB, the higher its expected completion percentage will be:

nfl-cpoe-example

With a model in place, Over Expected metrics then look at what actually happened on the play to determine how much better or worse a particular player performed. For CPOE, this is a measure of whether or not the pass was completed. If a pass was completed, the actual completion percentage on that attempt would be 100%, and if it wasn’t, the actual completion percentage would be 0%. CPOE is then simply the difference between the observed result (CP) and the model’s expectation (xCP):

nfl-cpoe-calculation-example
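As a minimal sketch of the per-play calculation, assuming xCP is already available as a probability between 0 and 1 (the function name and the sample values are illustrative and not taken from any real model):

```python
def play_cpoe(completed: bool, xcp: float) -> float:
    """Return CPOE for a single pass attempt, in percentage points.

    completed: whether the pass was caught (observed CP is 100 or 0).
    xcp: the model's completion probability for this attempt (0 to 1).
    """
    if not 0.0 <= xcp <= 1.0:
        raise ValueError("xcp must be a probability between 0 and 1")
    actual_cp = 100.0 if completed else 0.0
    return actual_cp - 100.0 * xcp


# Hypothetical attempts: an easy throw that is caught, a hard one that is not.
easy_completion = play_cpoe(True, 0.80)      # 100 - 80
hard_incompletion = play_cpoe(False, 0.25)   # 0 - 25
```

A completion always produces a positive value and an incompletion always produces a negative one; the expectation only changes how large the credit or penalty is.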

To aggregate an Over Expected metric across a time period like a game, or a season, or even a career, all the individual plays are simply averaged together:

average-cpoe-calculation-example
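The aggregation step might be sketched like this, assuming each attempt is stored as a (completed, xCP) pair. The data layout and the four sample attempts are hypothetical:

```python
from statistics import mean


def average_cpoe(attempts):
    """Average per-play CPOE, in percentage points.

    attempts: list of (completed, xcp) pairs, where completed is a bool and
    xcp is the model's completion probability (0 to 1) for that attempt.
    """
    per_play = [
        (100.0 if completed else 0.0) - 100.0 * xcp
        for completed, xcp in attempts
    ]
    return mean(per_play)


# Four hypothetical attempts from one game.
game = [(True, 0.8), (False, 0.6), (True, 0.4), (True, 0.7)]
game_cpoe = average_cpoe(game)
```

Because every attempt gets the same weight, the same function works for a game, a season, or a career; only the list of attempts changes.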

If we take a step back and think about what’s actually being measured here, we see that CPOE, and all other Over Expected metrics, are really just errors in a model. Our model starts with a prediction (xCP), we observe something different (CP), and we call the difference a metric (CPOE).

Here’s where things can start to get a little dicey for Over Expected metrics. If the underlying expectation model (in this case xCP) were accurate, wouldn’t we expect all Over Expected metrics to come out to about 0? After all, what good is a model if it can’t predict the thing it’s supposed to predict?

Of course, when we look at career CPOE performance, we do in fact see that good QBs tend to have high CPOE (aka positive model error) while bad QBs tend to have low CPOE (aka negative model error):

career-cpoe

So what’s going on here?

While the xCP model does its best to accurately predict the probability of a completed pass, it’s doing so with one hand tied behind its back. One of the most critical variables -- who’s throwing the ball -- is intentionally omitted, making the xCP model less accurate than it could be.

By omitting the QB as a variable in the model, we’re effectively saying that the model’s error is created by the QB. Put another way, if our model can’t distinguish between QBs and instead makes predictions based on a league-wide average, then good QBs should create positive errors versus our model while bad QBs should produce negative errors.
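A toy simulation can make this concrete. Everything in it is an assumption made for illustration: the xCP values are drawn uniformly between 0.3 and 0.9, and a QB's skill is modeled as a flat bump added to the league-average completion probability. Real xCP models and real QB effects are certainly more complicated.

```python
import random


def simulate_mean_cpoe(skill_edge, n_attempts=20000, seed=0):
    """Simulate average CPOE (percentage points) for one hypothetical QB.

    skill_edge: how much this QB's true completion probability differs from
    the league-average expectation on every throw (e.g. +0.05).
    """
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(n_attempts):
        # League-average expectation; the model knows nothing about the QB.
        xcp = rng.uniform(0.3, 0.9)
        # The QB's true chance of completing this throw, clamped to [0, 1].
        true_prob = min(1.0, max(0.0, xcp + skill_edge))
        completed = rng.random() < true_prob
        total_error += (1.0 if completed else 0.0) - xcp
    return 100.0 * total_error / n_attempts


good_qb = simulate_mean_cpoe(+0.05)
average_qb = simulate_mean_cpoe(0.0)
bad_qb = simulate_mean_cpoe(-0.05)
```

Under these assumptions the good QB's average model error lands near +5 points, the bad QB's near -5, and the average QB's near 0, even though the same QB-blind model graded all three.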

Over Expected metrics are a clever hack of model construction that can work well in practice, but there’s a catch -- the omitted player, in this case a QB, is just one of many potential drivers of model error. How do we know that an Over Expected metric is measuring who is good and not just who got lucky?

Measuring Stability

For an Over Expected metric to be a true measurement of a player’s skill independent of context, the underlying model must be reliably inaccurate, which is to say, good players must consistently perform better than expected, while bad players must consistently perform worse than expected. 

One approach to measuring the consistency of a metric is to see how well it correlates to itself from one period to another. Going back to our CPOE example, if a player's past CPOE predicts their future CPOE, then we can be a bit more confident that CPOE is largely measuring player skill and not some other variable omitted from the xCP model.

Using R-squared (RSQ) as our measure of correlation, we see that a QB’s CPOE does, in fact, correlate strongly across different periods:

nfl-qb-cpoe-rsq

In the world of NFL analytics, a year-over-year RSQ of 0.226 is about as good as it gets. It means that 22.6% of the variance in players’ CPOE in one season is explained by their CPOE in the previous season -- past performance explains future performance. Further, as we extend our observation window and additional randomness is smoothed out, past CPOE becomes even more predictive of future CPOE. From this, we can be fairly confident that CPOE does effectively measure some elements of QB performance.
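For readers who want to run this kind of stability check themselves, here is a minimal sketch that computes RSQ as the squared Pearson correlation between two seasons. The function name and the five-QB sample are hypothetical, and this is not necessarily how the figures above were produced:

```python
def r_squared(prior, following):
    """Squared Pearson correlation between two equal-length lists.

    prior[i] and following[i] are the same player's metric (e.g. CPOE)
    in one season and in the next.
    """
    n = len(prior)
    if n != len(following) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 players")
    mean_x = sum(prior) / n
    mean_y = sum(following) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(prior, following))
    var_x = sum((x - mean_x) ** 2 for x in prior)
    var_y = sum((y - mean_y) ** 2 for y in following)
    if var_x == 0 or var_y == 0:
        raise ValueError("RSQ is undefined when one season has no variation")
    return (cov * cov) / (var_x * var_y)


# Hypothetical CPOE values for five QBs in back-to-back seasons.
season_1 = [3.1, -1.4, 0.6, -2.8, 1.9]
season_2 = [1.2, -0.3, 2.0, -1.1, -0.5]
year_over_year_rsq = r_squared(season_1, season_2)
```

A value of 1 would mean last season's numbers line up perfectly with this season's, and a value of 0 would mean they tell us nothing about them.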

That’s great for CPOE, but what about the other Over Expected metrics being used to measure player performance? Let’s take a look.

YACOE

YACOE stands for “Yards After Catch Over Expected,” and just like CPOE, it measures a player’s ability to generate more yards after the catch than we’d expect from a league-average player.

Comparing the RSQ of YACOE to the RSQ of CPOE, we see that CPOE is far more predictive for QBs:

nfl-qb-cpoe-and-yacoe

Perhaps this is to be expected. While a QB can set their receiver up for YAC with a good read and an accurate throw, the actual generation of YAC might be more a function of who’s catching the ball. Sure enough, when we aggregate YACOE at the receiver level instead of the passer level, it becomes more stable:

nfl-wr-cpoe-and-yacoe

There are two very important points to take away from this table:

First, Over Expected metrics can’t be applied to any position we want or to just one position alone. In the case of YACOE, it seems to be the receiver, not the QB, who’s driving most of the performance, so measuring a QB’s YACOE probably tells us more about the receivers they were throwing to than it does anything about the QB’s own performance. That said, we’re trying to split model error between two different variables (QB and WR) that were both omitted from the model. YACOE may belong predominantly to the receiver, but we can’t say it’s the WR’s alone.

Second, different Over Expected metrics require different amounts of data to be effective. For a receiver’s YACOE, we need two seasons of data to reach the same level of predictiveness as just one season of QB CPOE. As mentioned above, Over Expected metrics are just model errors. Random model error will smooth out over time with more data, leaving just the non-random error created by the player’s performance. Since not all Over Expected metrics isolate signal with the same effectiveness, less effective metrics like YACOE need bigger samples to tell us what we actually want to know.
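One way to see why sample size matters is a simple additive model, which is our assumption here and not something any of these metrics specify: each play's Over Expected value is the player's true skill plus independent random noise. Under that assumption, the correlation between a player's averages over two separate samples of n plays is skill_var / (skill_var + noise_var / n), and RSQ is the square of that correlation. The variances below are made up to show the shape of the relationship:

```python
import math


def expected_correlation(skill_var, noise_var, n_plays):
    """Expected correlation between two n-play averages of the same players,
    assuming each play = true skill + independent noise."""
    return skill_var / (skill_var + noise_var / n_plays)


def plays_needed(skill_var, noise_var, target_correlation):
    """Smallest n at which expected_correlation reaches the target (0 < target < 1)."""
    if not 0.0 < target_correlation < 1.0:
        raise ValueError("target_correlation must be strictly between 0 and 1")
    n = noise_var * target_correlation / (skill_var * (1.0 - target_correlation))
    return math.ceil(n)


# Two hypothetical metrics with the same spread in skill but different noise.
cleaner_metric_plays = plays_needed(1.0, 100.0, 0.5)
noisier_metric_plays = plays_needed(1.0, 400.0, 0.5)
```

In this sketch, a metric with four times the per-play noise needs four times as many plays to reach the same stability, which is the general pattern behind one metric needing more seasons of data than another.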

RYOE

In this particular regard, Rushing Yards Over Expected, or RYOE, isn’t the best performer. RYOE uses field position, defensive alignment, and player tracking data (depending on provider) to determine how many more yards a rusher gained than we’d expect an average rusher to gain in the same situation.

We know that defenses like to load the box and commit to the run against good rushers, so it would seem that no position needs a context-correcting Over Expected metric more than running backs. Unfortunately, RYOE may not fully get us there:

nfl-rb-ryoe-rsq

RYOE has almost no predictiveness from one season to the next, meaning a rusher with a positive RYOE in one season is essentially a coin flip for a negative RYOE in the next. Only after four seasons of data (and serious selection bias) do we reach the same level of signal as CPOE and YACOE.
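The coin flip framing can be checked directly by counting how often a rusher's RYOE keeps the same sign from one season to the next. This is a minimal sketch with a hypothetical function name and made-up values, not real player data:

```python
def sign_persistence_rate(season_pairs):
    """Share of rushers whose RYOE had the same sign in back-to-back seasons.

    season_pairs: list of (ryoe_season_1, ryoe_season_2), one pair per rusher.
    Pairs containing an exact zero are skipped because they have no sign.
    """
    signed = [(a, b) for a, b in season_pairs if a != 0 and b != 0]
    if not signed:
        raise ValueError("no rushers with a non-zero RYOE in both seasons")
    same_sign = sum(1 for a, b in signed if (a > 0) == (b > 0))
    return same_sign / len(signed)


# Four hypothetical rushers (RYOE per attempt in consecutive seasons).
rushers = [(0.4, -0.1), (0.2, 0.3), (-0.3, 0.1), (-0.2, -0.4)]
persistence = sign_persistence_rate(rushers)
```

A rate near 0.5 is what pure chance would produce, while a rate well above 0.5 would suggest the metric is capturing something that carries over between seasons.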

While RYOE may be the best tool we have for measuring career RB performance, what does it really mean to lead the league in RYOE for one season? To be clear, this isn’t saying the underlying expected rushing yard model is inaccurate. Rather, it’s saying that when there are deviations from the model, we don’t know how to attribute them. Unless we believe the skill of individual RBs radically changes year-to-year, a single season of RYOE doesn’t tell us any more about true rushing ability than something simpler like yards per attempt.

Though this detail may seem minor, it’s actually quite important to how we view Over Expected metrics as an evaluative framework.

Closing

Over Expected metrics are supposed to help us separate player skill from player circumstance. Sometimes they do this well (e.g., CPOE) and sometimes they can be a bit more limited (e.g., YACOE and RYOE). In cases where Over Expected metrics only tell us who had the most production relative to expectation, and not who was the most skillful, we have to ask ourselves whether or not the metric is really better at its intended purpose than something simpler. Even for players with significant sample size across seasons, RYOE and Yards Per Attempt have a correlation north of 70%. What do we gain from this added complexity and what do we lose?

In addition to the increased complexity, we must also grapple with the, dare I say, subjective nature of these models. While there’s only one way to divide rushing yards by rushing attempts, there are many ways to calculate something like RYOE. PFF has a version created with pre-snap player alignment data, the NFL has a version created with player tracking data, 4for4 has a version created using play-by-play data, and there’s an ever-expanding long tail of models developed by the analytics community more broadly. If we’re to replace a metric that any fan can calculate and understand on their own with a metric that requires proprietary data and a machine learning model, shouldn’t we be getting more in return?

New metrics, especially those that come from fresh perspectives, should be celebrated, but let’s do so while recognizing advanced metrics are just one way to look at the game and, in many ways, are still a work in progress.