What are the best metrics for NFL Quarterbacks?

August 28th, 2022 / QBs

Measuring NFL Quarterbacks

While football is the quintessential team sport, no individual player is more important in the NFL than the quarterback. Quarterbacks are involved in nearly every offensive play and are directly responsible for a significant portion of an offense's production. Unsurprisingly, there's a strong desire to better quantify the position through the use of different stats and metrics.

These approaches range from simple volume stats like passing yards, to more complex composites like Adjusted Yards Per Attempt (AY/A), to advanced stats like expected points added (EPA). The expansive set of stats and metrics available for measuring a quarterback's play leaves us with a simple question -- what are the best metrics for NFL QBs?

In this post, we'll take a look at the following stats:

  • Yards per attempt (YPA) - How many yards a QB gains per pass attempt.

  • Completion Percentage (Comp %) - The share of pass attempts that are completed.

  • Completion Percentage Over Expectation (CPOE) - An advanced stat that adjusts completion percentage for context such as depth of target.

  • Touchdown % - How many touchdown passes a QB throws per pass attempt.

  • Interception % - How many interceptions a QB throws per pass attempt.

  • TD% - INT% - The net difference between TD% and INT%.

  • Passer Rating - A composite metric that combines QB efficiency across completions, yards, touchdowns, and interceptions.

  • Adjusted Net Yards per attempt (ANY/A) - a composite metric similar to passer rating, but with additional consideration for sacks.

  • Pro Football Focus' Passer Grade (PFF Grade) - a metric derived from film analysis and play charting.

  • ESPN's Quarterback Rating (QBR) - A custom formula that seeks to better attribute team performance and win probability to the quarterback.

  • 538 QB Elo - a composite metric meant to mimic QBR, but with a much simpler formula.

  • Expected Points Added (EPA) - An advanced stat that uses game context and play-by-play data to determine how much a given play helped a team generate points.
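Several of the box-score metrics above have fixed, published formulas. The sketch below computes them from a single passing line, using the standard NFL passer rating formula (four components, each capped between 0 and 2.375) and the commonly published ANY/A formula (a 20-yard bonus per touchdown, a 45-yard penalty per interception, and sacks and sack yards counted against the QB). The `PassingLine` container and all function names are hypothetical, and `cpoe` assumes per-attempt completion probabilities supplied by some completion-probability model that is not shown here.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """Hypothetical container for one QB's passing totals."""
    attempts: int
    completions: int
    yards: int
    touchdowns: int
    interceptions: int
    sacks: int = 0
    sack_yards: int = 0


def yards_per_attempt(p: PassingLine) -> float:
    return p.yards / p.attempts


def completion_pct(p: PassingLine) -> float:
    return 100 * p.completions / p.attempts


def td_pct(p: PassingLine) -> float:
    return 100 * p.touchdowns / p.attempts


def int_pct(p: PassingLine) -> float:
    return 100 * p.interceptions / p.attempts


def td_minus_int_pct(p: PassingLine) -> float:
    return td_pct(p) - int_pct(p)


def passer_rating(p: PassingLine) -> float:
    """NFL passer rating: four per-attempt components, each clamped to [0, 2.375]."""
    def clamp(x: float) -> float:
        return max(0.0, min(x, 2.375))

    att = p.attempts
    a = clamp((p.completions / att - 0.3) * 5)   # completion component
    b = clamp((p.yards / att - 3) * 0.25)        # yards-per-attempt component
    c = clamp(p.touchdowns / att * 20)           # touchdown component
    d = clamp(2.375 - p.interceptions / att * 25)  # interception component
    return (a + b + c + d) / 6 * 100


def any_a(p: PassingLine) -> float:
    """Adjusted net yards per attempt: TD bonus, INT and sack penalties."""
    net = p.yards + 20 * p.touchdowns - 45 * p.interceptions - p.sack_yards
    return net / (p.attempts + p.sacks)


def cpoe(completed: list[bool], completion_probs: list[float]) -> float:
    """Completion % over expectation, in percentage points.

    completion_probs is assumed to come from an external completion model
    that estimates each attempt's chance of completion given its context.
    """
    actual = sum(completed) / len(completed)
    expected = sum(completion_probs) / len(completion_probs)
    return 100 * (actual - expected)


# Hypothetical single-game line, for illustration only.
line = PassingLine(attempts=30, completions=20, yards=300, touchdowns=2,
                   interceptions=1, sacks=2, sack_yards=15)
print(round(passer_rating(line), 1))  # → 107.6
print(any_a(line))                    # → 8.75
```

Passer rating ignores sacks entirely, which is the main gap ANY/A closes by adding sack yards to the penalty and sacks to the denominator.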

Specifically, to determine which passing metrics are the most useful, we'll look at how well each one describes game outcomes and how stable each one is from one season to the next.

How Passing Metrics Align to Winning Games

In statistics, correlation is often used to measure how closely related two sets of numbers are. If two sets of numbers are closely related, they'll move together and produce a strong correlation. R Squared (RSQ) is closely related to correlation (for a simple linear fit, it is the correlation squared), but it instead measures how much of the variation in one variable is explained by another, that is, how well one variable predicts the other. For instance, if pass attempts have a high RSQ to touchdown passes, knowing how many passes a quarterback attempted lets us predict how many touchdown passes they likely threw.
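To make the link between correlation and RSQ concrete, here is a minimal Python sketch. For a one-variable linear fit, RSQ is simply the squared Pearson correlation, so a strong negative relationship yields the same high RSQ as a strong positive one. The attempt and touchdown numbers are invented for illustration only.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rsq(xs: list[float], ys: list[float]) -> float:
    """R squared of a simple linear fit of ys on xs (Pearson r squared)."""
    return pearson_r(xs, ys) ** 2


# Hypothetical season totals, invented purely for illustration.
pass_attempts = [450, 520, 600, 380, 640]
td_passes = [22, 27, 35, 15, 37]
print(round(rsq(pass_attempts, td_passes), 3))
```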

For our analysis, we'll measure each passing metric's RSQ relative to Margin of Victory (MoV), the difference between the points a team scores and the points it allows. You might be asking yourself: if we're trying to measure how well a metric aligns with winning games, why use MoV instead of a team's win-loss record? In the NFL, wins and losses can be swung by individual plays. In a tight game, a dropped pass in the fourth quarter can be the difference between winning and losing. With MoV, the more a team wins by, the less likely it is that any individual play or random bounce would have changed the outcome had it gone the other way. Perhaps counterintuitively, MoV is a better measure of a team's ability to win than its actual winning percentage.
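The sketch below illustrates that sensitivity gap with a hypothetical per-game record format and invented scores: flipping a single one-point loss into a one-point win moves winning percentage by a full game's worth, while average MoV barely moves.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """One team's result in a game (hypothetical record format)."""
    points_for: int
    points_against: int


def margin_of_victory(game: Game) -> int:
    # Positive for a win, negative for a loss, zero for a tie.
    return game.points_for - game.points_against


def win_value(game: Game) -> float:
    # Win-loss view: 1 for a win, 0.5 for a tie, 0 for a loss.
    mov = margin_of_victory(game)
    return 1.0 if mov > 0 else 0.5 if mov == 0 else 0.0


def season_summary(games: list[Game]) -> tuple[float, float]:
    """Return (winning percentage, average margin of victory)."""
    n = len(games)
    win_pct = sum(win_value(g) for g in games) / n
    avg_mov = sum(margin_of_victory(g) for g in games) / n
    return win_pct, avg_mov


# Hypothetical four-game stretch: one blowout and three one-point games.
games = [Game(31, 10), Game(20, 21), Game(17, 16), Game(23, 24)]
print(season_summary(games))    # → (0.5, 5.0)

# Flip the 20-21 loss into a 21-20 win (say, one dropped pass is caught).
flipped = [Game(31, 10), Game(21, 20), Game(17, 16), Game(23, 24)]
print(season_summary(flipped))  # → (0.75, 5.5)
```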

Looking at the RSQ of our QB metrics, we see that EPA aligns most closely with margin of victory, followed by the seemingly less advanced metrics ANY/A and Passer Rating:

[Figure: RSQ of each QB metric relative to margin of victory]

EPA aligns best with winning

Expected Points Added measures how a quarterback's play contributed to scoring points, so it's not all that surprising that it does the best job predicting MoV, and by extension, wins. The higher the QB's EPA, the more points the team was likely to score (even if those points really shouldn't have been fully attributed to the QB).
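As a rough illustration of how EPA is tallied, the sketch below treats each play's EPA as the change in expected points from before the snap to after it, then averages that across a quarterback's plays to get EPA / Play. The expected-point values are assumed to come from an external model that the post doesn't specify, and the `Play` record is hypothetical; how scoring plays and turnovers are valued is left to that model.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """A single dropback with expected points (EP) before and after the snap.

    EP values are assumed to come from an external expected-points model
    (e.g. one built on down, distance, field position, and time remaining),
    which is not implemented here.
    """
    qb: str
    ep_before: float
    ep_after: float


def epa(play: Play) -> float:
    # EPA is the change in expected points produced by the play.
    return play.ep_after - play.ep_before


def epa_per_play(plays: list[Play], qb: str) -> float:
    """Average EPA across one quarterback's plays."""
    values = [epa(p) for p in plays if p.qb == qb]
    return sum(values) / len(values)


# Hypothetical EP values, invented for illustration.
plays = [Play("QB A", 1.5, 2.3), Play("QB A", 2.3, 1.1), Play("QB B", 0.6, -0.4)]
print(round(epa_per_play(plays, "QB A"), 2))  # → -0.2
```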

Less advanced stats still do well

On one hand, it makes sense that both ANY/A and Passer Rating have high RSQs. They both measure overall production and efficiency, which should help them correlate with points scored. But how are they outperforming more advanced metrics like QBR or PFF Grades?

When a team is winning, it tends to play more conservatively. It becomes less likely to attempt deep passes or create explosive plays, which are also the types of plays most likely to result in interceptions and sacks. Since ANY/A and Passer Rating penalize these mistakes heavily, QBs who happen to be winning get an additional statistical boost from play style rather than performance. This isn't to say that ANY/A and Passer Rating are misleadingly correlated with winning, but rather that we should discount them slightly.

ESPN's QBR

Outside of EPA, ESPN's QBR performs best among the advanced stats. QBR is based on ESPN's own internal EPA model, so again, we should expect it to have a high RSQ. Its RSQ is likely lower than straight EPA's because it also accounts for the performance of the QB's supporting cast. If a running back breaks a tackle and turns a simple screen into an explosive play, EPA gives the QB full credit, while QBR does not. So while EPA aligns most closely with the final score, QBR may do a better job isolating the quarterback's role in producing it. Again, RSQ is just one way to evaluate a metric, and we can't discount the additional nuance at play.

538's QB Elo

QB Elo is a composite regression that uses simple stats like passing yards, passing touchdowns, interceptions, and even a QB's rushing touchdowns to approximate QBR. It can't account for the supporting cast the way QBR can, but it appears to do a fairly good job at predicting wins and losses, beating out other advanced stats like PFF Grades and CPOE. QB Elo has sometimes been criticized for overweighting the rushing element of QB play, but, as it turns out, QB rushing ability is still a very important part of the game.

PFF Grades

Some may look at the RSQ of PFF Passer Grades and question how valuable a qualitative assessment of film is for measuring performance. This would be a mistake. Winning is not a QB stat on its own. The better a metric isolates quarterback performance, the less it will align with the final score. PFF grades can give a QB positive credit for an accurate pass that the receiver drops, even though that play made the team less likely to win. As with QBR, this better isolation of QB performance hurts the metric's RSQ relative to MoV even though it likely improves the fundamental quality of the metric at the same time.

Simple Stats

It goes without saying that simple stats like completion percentage, YPA, TD%, and INT% do a fairly poor job at predicting margin of victory. While these metrics are popular, each one looks at only a single element of a quarterback's performance, leaving fans and analysts to determine how best to combine them into a more holistic view of QB play. And if simple stats are only valuable when combined, why not use a composite like Passer Rating or ANY/A that already does that work for you?

The Seasonal Reliability of QB Metrics

In the analysis above, we looked at how well various metrics described winning and losing by comparing their RSQs relative to margin of victory. While an ability to predict margin of victory is useful, it's only half of the equation.

Football is a highly random sport. Injuries, fluky plays, mismatched game plans, and blown calls can all create misalignment between true ability and the final score. It's why we can see a team earn a playoff berth one year only to finish last in its division the following season.

For a metric to be considered a true measure of a QB's ability, it should remain fairly stable over time. While game-to-game fluctuations will exist, over a large enough sample, players a metric considers high quality in one season should, on average, be considered high quality in another. Using the same RSQ framework, we can assess metric stability by comparing a player's performance in one year to their performance in the following season.

For instance, in 2020, Aaron Rodgers posted a league-best 0.362 EPA / Play on his way to an NFL MVP. To measure the season-level stability of EPA, we would compare that data point to his EPA / Play in 2021, which happened to be 0.251. While these two numbers don't align perfectly, that 0.251 EPA / Play in 2021 was still league-best and earned Aaron Rodgers another NFL MVP award. In this specific example, EPA looks rather stable -- the best quarterback in 2020 was the best quarterback in 2021.

To measure a metric's stability, we make a similar comparison, but for all quarterbacks in the league, looking at how well their performance in one season predicted their performance in the next season. The measure of this predictiveness is, again, RSQ.
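A minimal sketch of that season-to-season comparison, assuming metric values keyed by hypothetical (player, season) pairs: every player with back-to-back seasons contributes one (season N, season N + 1) pair, and RSQ is computed over all such pairs. The Rodgers values are the ones quoted above; the other players and numbers are made up for illustration. A minimum-attempts threshold for qualifying seasons would be a natural addition but is omitted here.

```python
def rsq(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)


def year_over_year_rsq(metric: dict[tuple[str, int], float]) -> float:
    """RSQ between each QB's metric in season N and the same QB's metric in N + 1.

    Only players with back-to-back seasons in `metric` contribute a pair.
    """
    this_year, next_year = [], []
    for (player, season), value in metric.items():
        following = metric.get((player, season + 1))
        if following is not None:
            this_year.append(value)
            next_year.append(following)
    return rsq(this_year, next_year)


epa_per_play = {
    ("Aaron Rodgers", 2020): 0.362,  # values quoted in the text
    ("Aaron Rodgers", 2021): 0.251,
    ("QB B", 2020): 0.10,            # hypothetical
    ("QB B", 2021): 0.05,
    ("QB C", 2020): -0.05,           # hypothetical
    ("QB C", 2021): 0.02,
}
print(round(year_over_year_rsq(epa_per_play), 3))
```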

RSQ Scatter Plot

To visualize metric performance, each metric has been plotted in the scatter plot below. The x-axis represents the metric's ability to predict an individual game outcome (further to the right is better), and the y-axis represents the metric's ability to predict a player's performance in that metric from one year to the next (further up is better):

[Figure: scatter plot of each metric's RSQ to margin of victory (x-axis) against its season-to-season RSQ (y-axis)]

Inverse Relationship Between Predictiveness and Stability

Setting aside the simple statistics that don't offer much predictive power, we see an inverse relationship between how well a metric predicts an individual game outcome and how well it predicts future performance -- why is this?

As mentioned previously, the more a metric is able to parse out the individual performance of a quarterback, the less it's able to describe the game outcome, because the game outcome is determined by much more than quarterback performance alone. Metrics like PFF Grade, QB Elo, and CPOE are substantially better at predicting season-over-season performance at the expense of being able to predict individual games.

The Descriptive / Predictive Tradeoff

The clear tradeoff between isolating individual performance and identifying metrics that result in wins makes it difficult to determine the best metric for measuring quarterback performance. On one hand, we want to measure the performance of the quarterback and the quarterback alone, but on the other, we also need to make sure that what we're calling performance actually results in wins.

Said another way, the best quarterback metric is the one that balances the descriptive/predictive tradeoff most efficiently. In my mind, this is EPA. EPA is the best predictor of margin of victory, while also being more stable than other game-predicting metrics like Passer Rating, ANY/A, and QBR.

One potential workaround to the tradeoff problem would be to measure a metric's seasonal predictiveness not against itself, but against the next season's average margin of victory. This approach combines stability and the ability to predict wins into a single number. Interestingly, all the metrics we care about -- EPA, ANY/A, Passer Rating, QBR, QB Elo, and PFF Grades -- have roughly the same RSQ:

[Figure: RSQ of each metric in one season relative to the following season's average margin of victory]
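The same pairing idea extends to this combined measure. In the sketch below, which reuses the hypothetical (player, season) keying, a metric's season N value is paired with the season N + 1 average MoV of the team that QB played for. The post doesn't say how QBs who changed teams were handled, so the `team_mov` mapping is an assumption.

```python
def rsq(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)


def next_season_mov_rsq(metric: dict[tuple[str, int], float],
                        team_mov: dict[tuple[str, int], float]) -> float:
    """RSQ between a QB's metric in season N and his team's average MoV in N + 1.

    `team_mov` is assumed to map (player, season) to the average margin of
    victory of the team that player quarterbacked in that season.
    """
    xs, ys = [], []
    for (player, season), value in metric.items():
        mov_next = team_mov.get((player, season + 1))
        if mov_next is not None:
            xs.append(value)
            ys.append(mov_next)
    return rsq(xs, ys)
```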

So What's the Best Metric?

Regardless of how the data is cut, no one metric stands out as a clear winner. Furthermore, the line between advanced stats like EPA, QBR, and PFF grades and simpler composites like Passer Rating, ANY/A, and QB Elo is surprisingly narrow, especially for Passer Rating, a 50-year-old metric that predicts future margin of victory better than the machine-learning-driven EPA.

Ultimately, statistical measurement of QB play is a work in progress that requires understanding of context and an appreciation for uncertainty. If we want to know which QB is worthy of the MVP, we might want to consider EPA most heavily as it aligns closest to actual wins. If instead we want to know which QB is most likely to be successful in the future, we might consider PFF Grades, which do a better job at isolating individual play and predicting future success.

But perhaps above all else, we should avoid the temptation to dogmatically crown one metric over another. As for myself, I plan to use Passer Rating a bit more just to prove a point.