Women’s Hockey Analytics Primer
Let’s talk about stats, baby
Recently, there have been a number of breakthroughs in the world of women’s hockey analytics. So, we figured it was the perfect time to help familiarize our readers with some of the work that is out there now. We also thought this was a great opportunity to put together an analytics primer to help introduce some key concepts and terminology.
Mike Murphy’s RITSAC Presentation on Women’s Hockey Analytics
If this is your first time dipping your toe into this world, remember that it all looks scarier and more complicated than it actually is. Our goal with this primer is to introduce you to concepts and tools that are currently being used in the world of women’s hockey. We also highly recommend listening to Dimitri Filipovic’s Hockey PDOcast.
A Glossary of Terms and Concepts
A player is credited with a primary point when they score a goal (all goals are primary points) or if they’re the last player on the team to touch the puck before the goal scorer — also known as a primary assist. Primary assists are the first assist listed in any box score that summarizes a scoring event.
Why does this matter? Typically, primary assists — often abbreviated to “A1” — are more important to the creation of a goal than secondary assists and have more of a direct influence on scoring. It is because of this that we view primary assists as an indicator of players who excel at generating offense and creating goals. For example, if a defender has a high number of primary assists, we can safely assume that she plays a featured role in her team’s offense.
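If it helps to see the counting spelled out, here is a minimal sketch of how primary points could be tallied from a box score. The event format and the skater names are made up for illustration; real play-by-play files will look different.

```python
from collections import Counter


def tally_primary_points(scoring_events):
    """Count primary points (goals plus primary assists) per player.

    Each event is a (scorer, primary_assist, secondary_assist) tuple.
    Either assist may be None when a goal has fewer than two assists.
    """
    tally = Counter()
    for scorer, primary_assist, _secondary_assist in scoring_events:
        tally[scorer] += 1  # every goal is a primary point
        if primary_assist is not None:
            tally[primary_assist] += 1  # A1 counts, A2 does not
    return tally


# Hypothetical box score with three goals
events = [
    ("Skater A", "Skater B", "Skater C"),
    ("Skater B", "Skater A", None),
    ("Skater A", None, None),
]
primary_points = tally_primary_points(events)
print(primary_points)
```

Skater C only picked up a secondary assist here, so she ends up with zero primary points.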
Further Reading: “Expected Primary Points are a better predictor of future scoring than Shots, Points.” | “1st and 2nd Assist Bias.”
Corsi and Fenwick
Corsi is the sum of a player or team’s shots on goal, missed shots, and blocked shot attempts directed at the opposition’s net, minus the same shot attempts directed at your own team’s net. So, it’s shot differential, but it counts all shot attempts instead of just the shots that go into the net and the ones that are stopped by a goaltender. And if you’ve heard the term “Fenwick” before, it’s simply Corsi without the blocked shots; the logic here is that blocking shots is a skill, so blocked attempts are left out of the count.
We often talk about Corsi as Corsi For Percentage (CF%). CF% is the share of all shot attempts, for and against, that belongs to a player, line combination, defense pair, or team while they are on the ice. So, a team that finished a game with a 53 CF% would have had a significant edge in what we call possession since they took 53 percent of the shot share.
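Here is a rough sketch of the arithmetic behind Corsi, Fenwick, and CF%. The class, the function names, and the shot totals are all invented for this example; the only thing taken from the definitions above is which shot attempts get counted.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempts:
    on_goal: int  # shots that were saved or went in
    missed: int   # attempts that missed the net
    blocked: int  # attempts blocked before reaching the net

    @property
    def corsi(self):
        # Corsi counts every shot attempt
        return self.on_goal + self.missed + self.blocked

    @property
    def fenwick(self):
        # Fenwick leaves out the blocked attempts
        return self.on_goal + self.missed


def share_pct(attempts_for, attempts_against):
    """Percentage of all attempts (for plus against) that were 'for'."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


# Hypothetical single-game totals
team = ShotAttempts(on_goal=30, missed=12, blocked=11)
opponent = ShotAttempts(on_goal=28, missed=10, blocked=9)

corsi_diff = team.corsi - opponent.corsi          # raw Corsi differential
cf_pct = share_pct(team.corsi, opponent.corsi)    # Corsi For Percentage
ff_pct = share_pct(team.fenwick, opponent.fenwick)
print(corsi_diff, cf_pct, ff_pct)
```

With these made-up numbers the team took 53 of the game’s 100 shot attempts, which is the 53 CF% from the example above.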
Why is Corsi so important? Without player and puck tracking, Corsi is a useful way to measure possession since players have to have possession in order to shoot the puck. It’s also been proven to be one of the best tools we have to predict future performance; it’s far more reliable than counting stats like primary points and goal differential. A team that consistently takes a high percentage of the shot share is the one that generates more offense, which is bound to translate into scoring. On the other hand, teams that are consistently buried in shots are bound to struggle over time.
- Further Reading: “Stats Made Simple Part 1: Corsi & Fenwick.” | “FQG: Update on Predictive Relationships.”
Rate Statistics i.e. Per 60 Stats
Per 60 stats are a way to scale counting statistics to account for factors such as ice time, time missed with injury, and the number of games that a player or team appeared in that went into overtime. They level the playing field when we compare players and teams against each other.
Per 60 simply means how many events an individual player or team accumulated per 60 minutes of play. The same scaling can be applied to the events that players were on the ice for, including things like possession stats and goal differential.
Rate statistics are particularly useful when we’re evaluating players and teams over longer periods of time. One consideration to make when using per 60 statistics is sample size, since a player’s numbers may be skewed in a smaller sample. If a skater’s statistics are encouraging, at the very least, it may be worth seeing if they can maintain that over a longer span of games. On the other hand, a small sample may also magnify a lackluster stretch from a player.
- Further Reading: “An advanced stat primer: Understanding basic hockey metrics.”
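The scaling itself is just a bit of division. A minimal sketch, with a hypothetical function name and made-up totals:

```python
def per_60(events, toi_minutes):
    """Scale a raw event count to a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return events * 60.0 / toi_minutes


# Hypothetical skater: 12 primary points in 400 minutes of ice time
rate = per_60(events=12, toi_minutes=400)
print(round(rate, 2))
```

Twelve primary points in 400 minutes works out to a rate of 1.8 primary points per 60.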
PDO doesn’t stand for anything (yes, we know that’s confusing), but it helps us measure luck for individual players and for teams. When a player is scoring far more or less than we expected them to, it’s usually a good idea to check their PDO to see what might be going on.
In order to calculate a team’s PDO we combine a team’s shooting percentage with their save percentage, with the understanding that most teams should ultimately regress towards the sum of 100. We can do the same for individual players when we combine their shooting percentage with the on-ice save percentage of their goaltender. PDO is most commonly applied to 5v5 play.
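As a quick sketch of that calculation, here it is with shooting percentage and save percentage both expressed out of 100. The function name and the totals are hypothetical.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Shooting percentage plus save percentage, on a 0-200 scale."""
    if shots_for <= 0 or shots_against <= 0:
        raise ValueError("shot totals must be positive")
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical 5v5 totals: 10 goals on 100 shots for, 8 goals on 100 shots against
team_pdo = pdo(goals_for=10, shots_for=100, goals_against=8, shots_against=100)
print(team_pdo)
```

A 10 percent shooting percentage and a 92 percent save percentage add up to a PDO of 102, a touch above the 100 that most teams are expected to drift back toward.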
- Further Reading: “PDO: If you were going to understand just one NHL statistic.” | “Outperforming PDO: Mirages and Oases in the NHL.”
As defined by Hockey-Reference.com, Point Shares are an estimate of the number of points contributed by a player. In this case, “points” refers to team points in the standings, not a player’s individual points. It provides an estimate of how much an individual contributed toward a team’s total points for that season.
Within Point Shares there are also Offensive and Defensive Point Shares, which look at the number of points a player contributed at a particular end of the ice. The idea, of course, is for a player to contribute at both ends; however, the distribution of a player’s Point Shares is almost always going to differ based on what type of player they are (offensive defender, shutdown forward, etc.).
- Further Reading: “Calculating Point Shares.”
High, Medium, and Low Danger
We’ve all heard and read the phrase, “a high-danger scoring chance”, but what the heck does that actually mean?
Emmanuel Perry of Corsica.hockey established specific zones for scoring chances that correlated to the NHL’s average shooting percentage. Naturally, shots taken from the slot and/or close to the net have a much better chance of going in than shots taken from bad angles or from the point. Makes sense, right? Now, let’s take a look at Perry’s heat map to get a visualization of where those specific zones are.
Low Danger: Less than 3.0%
Medium Danger: Equal to or greater than 3.0% and less than 9.0%
High Danger: Equal to or greater than 9.0%
Establishing these three danger zones has proven to be a fantastic tool for the evaluation of teams, skaters, and goaltenders. It provides context for a player or team’s ability to create high-quality scoring chances and for a player or team’s ability to limit those chances for the opposition.
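The three thresholds above can be sketched as a simple lookup. This assumes you already know the average shooting percentage for the spot on the ice a shot came from; the function name and example values are ours, not Perry’s.

```python
def danger_level(zone_shooting_pct):
    """Label a shot location by its average shooting percentage (0-100)."""
    if zone_shooting_pct < 3.0:
        return "low"
    if zone_shooting_pct < 9.0:
        return "medium"
    return "high"


# Hypothetical locations: the point, the faceoff circle, the slot
labels = [danger_level(pct) for pct in (1.8, 5.5, 14.0)]
print(labels)
```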
- Further Reading: “Video Breakdown: How Steve Valiquette will change how we think about goaltending.” | “Predicting Save Percentage: Danger Zones and Shot Volume.”
Expected Goals, or xG as they are more commonly known, are a response to the criticism that stats like Corsi perpetuate the idea that the team with the most shots should win a hockey game. The common objection, from stats experts and rookies alike, is “but what if one team’s shots are of a better quality?” That grey area is where xG comes in.
xG models use data that includes not just the distance from the net that a shot was taken from, but also the angle it was taken at. They use algorithms similar to the one that made the scoring chance visualization above, and they also look at things like shot type to estimate the probability of any one shot actually managing to hit twine.
So ‘good things happen when you shoot,’ but also ‘good things are more likely to happen when you shoot from optimal areas.’
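To make that concrete, here is a minimal sketch of the last step of any xG model: once every shot has a goal probability, a team’s xG is just the sum of those probabilities. The probabilities below are invented for illustration and do not come from a real model.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot goal probabilities into a team's expected goals."""
    total = 0.0
    for probability in shot_probabilities:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
        total += probability
    return total


# Hypothetical game: Team A fires 30 low-quality shots,
# Team B takes only 12 shots but from better areas.
team_a_xg = expected_goals([0.03] * 30)
team_b_xg = expected_goals([0.10] * 12)
print(round(team_a_xg, 2), round(team_b_xg, 2))
```

Team A wins the shot count by a mile, but Team B comes out ahead in xG (roughly 1.2 to 0.9) because its chances were better.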
- Further Reading: “Shot Quality and Expected Goals.” | “Expected Goals are a better predictor of future scoring than Corsi, Goals.” | “New Expected Goals Model for Predicting Goals in the NHL.”
Zone starts are often presented as a percentage, and, like most stats, they are most useful when we look at them in the right context. A skater who has a high percentage of defensive zone starts, meaning that they are often on the ice for faceoffs in their defensive zone, is clearly trusted by her coach. When we see a skater who has a high percentage of offensive zone starts and a low percentage of defensive zone starts, we can speculate that they might be “sheltered” by their coach.
However, the analysis of the NHL zone start data that we have suggests that zone starts rarely, if ever, impact the average player’s productivity or performance, in part because they only account for some of a player’s deployment: shifts that start on the fly are not counted. Now, that doesn’t mean that zone starts are worthless. They still help us identify the roles that skaters play on their teams and make us aware of coaching tendencies.
- Further Reading: “Beware of what zone starts are telling you Part 1.” | “Overemphasizing Context – A mistake just as poor as explaining context in the first place.” | “How much do zone starts matter part:1 (Maybe) not as much as we thought.”
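For reference, here is one common way the percentage is worked out. It compares offensive zone faceoff starts to defensive zone faceoff starts and ignores neutral zone draws. That convention is an assumption on our part (some sites include neutral zone starts in the denominator), and the function name and totals are hypothetical.

```python
def offensive_zone_start_pct(oz_starts, dz_starts):
    """Share of a skater's non-neutral faceoff starts taken in the offensive zone."""
    total = oz_starts + dz_starts
    if total == 0:
        raise ValueError("no offensive or defensive zone starts recorded")
    return 100.0 * oz_starts / total


# Hypothetical skater: 60 offensive zone starts, 40 defensive zone starts
ozs_pct = offensive_zone_start_pct(oz_starts=60, dz_starts=40)
print(ozs_pct)
```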
Quality of Competition
Quality of Competition (aka QoC) generally measures the Corsi ratings of a player’s opponents. QoC is another major part of the context we need to consider when evaluating a player’s performance, along with zone starts and Quality of Teammates. How often does a forward get matched up against the opposition’s best defensive defenders? How might that impact the number of shots, scoring chances, and points they produce? These are some of the questions that QoC helps us investigate.
- Further Reading: “Just How Important is Quality of Competition? Very. Also, not much. It’s All Relative.” | WoodMoney: “A new way to figure out quality of competition in order to analyze NHL data.” | “How much does matching competition matter on a team level?”
When we talk about a player’s game on any given night, one of the biggest topics is almost always ice time. “How long were they on for?” “How many shifts did they take?” “Did you see they played almost 40 minutes last night?” And in a game where the players on the ice are constantly changing, it’s not a bad conversation to have, with some players (read: your superstars) naturally getting more time than your third defensive pairing.
So what about leagues where time on ice (TOI) isn’t tracked and/or isn’t available publicly? That’s where eTOI (also known as Estimated Time On Ice) comes in. eTOI estimates how long a player was on the ice based on their contribution to events. Formulas for this vary widely between leagues based on what metrics are readily available for their particular situation. So, unfortunately, eTOI is not always as accurate as we want it to be, but it is far better than nothing.
- Further Reading: “Estimating Ice Time.”
Note: Evaluating goaltenders and predicting their performance is among the trickiest things to do in sports analytics. Why? Because, well, goalies are weird.
Goals saved above average (GSAA) is the number of goals a goaltender prevents (or gives up) relative to what a league-average goaltender would have allowed on the same number of shots. GSAA helps put a goaltender’s performance in context with their peers, both at 5v5 play and in all situations. GSAA is centered on 0, so if a goalie has a 0 GSAA, her performance is flush with the league’s average.
GSAA/30 is the rate version of GSAA. It tells us how many goals saved above the league average a goalie has for every 30 shots she has faced. This helps level the playing field a bit more when we compare goaltenders who are in different situations, particularly when it comes to the number of shots their teams allow.
- Further Reading: “GSAA: An Essential Statistic for Evaluating Goaltenders.” | “Goalies are Voodoo...But Improving Comparative Analysis Tools Can Help.”
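Here is a minimal sketch of both numbers, using the usual approach of comparing a goalie’s goals against to what a league-average save percentage would have allowed on the same shots. The function names, the .910 league average, and the goalie’s totals are all made up for the example.

```python
def gsaa(shots_against, goals_against, league_save_pct):
    """Goals an average goalie would allow on these shots, minus goals actually allowed."""
    expected_goals_against = shots_against * (1.0 - league_save_pct)
    return expected_goals_against - goals_against


def gsaa_per_30(shots_against, goals_against, league_save_pct):
    """GSAA scaled to a rate per 30 shots faced."""
    if shots_against <= 0:
        raise ValueError("shots against must be positive")
    return gsaa(shots_against, goals_against, league_save_pct) * 30.0 / shots_against


# Hypothetical goalie: 48 goals allowed on 600 shots in a league with a .910 average
season_gsaa = gsaa(shots_against=600, goals_against=48, league_save_pct=0.910)
season_rate = gsaa_per_30(shots_against=600, goals_against=48, league_save_pct=0.910)
print(round(season_gsaa, 2), round(season_rate, 2))
```

An average goalie would have allowed about 54 goals on those 600 shots, so this goalie comes out at roughly +6 GSAA, or about +0.3 per 30 shots.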
We all know that it’s not fair to judge a goaltender by their win-loss record. That’s why Rob Vollman created the Quality Start stat, which might sound familiar to you if you’re a fan of baseball or softball. The idea behind this stat is determining whether or not a goaltender “gave their team a chance to win.”
A quality start is awarded when a goalie has a start that is above the league save percentage, or when they allow two or fewer goals and record a save percentage above the league’s replacement level save percentage. In other words, it tells us when a goalie matched or exceeded the average performance of a goaltender in a game.
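The rule above can be sketched as a simple check. The league-average and replacement-level save percentages are inputs here, and the .910 and .885 values in the example are placeholders rather than real league figures.

```python
def is_quality_start(saves, shots_against, league_save_pct, replacement_save_pct):
    """Apply the two-part quality start rule to a single start."""
    if shots_against <= 0:
        raise ValueError("shots against must be positive")
    goals_allowed = shots_against - saves
    save_pct = saves / shots_against
    # Path 1: better than the league-average save percentage
    if save_pct > league_save_pct:
        return True
    # Path 2: two or fewer goals allowed with a better-than-replacement save percentage
    return goals_allowed <= 2 and save_pct > replacement_save_pct


# Hypothetical thresholds for illustration only
LEAGUE_AVG = 0.910
REPLACEMENT = 0.885

low_shot_night = is_quality_start(18, 20, LEAGUE_AVG, REPLACEMENT)   # .900, 2 goals allowed
busy_night = is_quality_start(27, 30, LEAGUE_AVG, REPLACEMENT)       # .900, 3 goals allowed
strong_night = is_quality_start(28, 30, LEAGUE_AVG, REPLACEMENT)     # .933, 2 goals allowed
print(low_shot_night, busy_night, strong_night)
```

The first two starts have the same .900 save percentage, but only the one with two goals allowed qualifies, which is exactly the case the second half of the rule is there to catch.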
- Further Reading: “Are Quality Starts a Repeatable Skill?”
Tools and Resources
- @quarkyhockey | NWHL Event Locations | Mapped with Jake Flancer’s pbp file.
- @TheShawnFerris | An Introduction to NWHL Game Score | A tool to measure the general performance of individual players. Most useful for the evaluation of forwards.
- @alyssastweeting | NWHL Projected Goals 2019 | Alyssa Longmuir’s goal projections for the 2018-19 NWHL season.
- @alyssastweeting | NWHL Two-Player Comparison Tool | Alyssa’s tool for comparing NWHL skaters across seasons.
- @CreaseGiants | Goalie Data | Goaltending stats for the NWHL, CWHL,
- @jeff_craig_ | CWHL Tracker | Jeff has CWHL stats from the 2017-18 season.
- Shayna Goldman’s Tableau Profile | Featuring viz for NWHL and Olympic data.
- Murphy | Pyeongchang Olympics Stat Sheet
- Murphy | 2017-18 NWHL Stat Sheet
- Murphy | 2017-18 CWHL Stat Sheet
- Elite Prospects | A site with basic stats and player bios for various women’s leagues, including the Olympics, World Championship, professional European leagues, the NWHL, the CWHL, and NCAA DI and DIII.
- Hockey East Online | A great database with counting stats for NCAA D1 hockey, going back to the 2012-13 season.
- NWHL.zone/statistics | The NWHL’s stats hub.
- theCWHL.com/stats/player-stats | The CWHL’s stats hub.
- MetaHockey | A repository of hockey analytics research, publications, and resources.