Pitt Basketball: Regression, Reversion, and Sample Size

Applying some popular terms associated with advanced metrics to this past season.

Amber Searls-USA TODAY Sports

There is no denying that the use of advanced metrics continues to grow at a rapid pace. In the very near future, we'll be talking about offensive efficiency as much as we talk about points per game; we may already be there. But like any tool, advanced metrics, and the terminology that comes with them, are only useful when applied correctly. When it comes to NCAA basketball, using the same base assumptions as the NBA is a mistake.

For starters, the NBA regular season is 82 games, and if you factor in the playoffs, an NBA player can appear in a maximum of 110 games in one year. Collegiate players are lucky to play more than 30 games in a season and might never log 110 games in four years. That doesn't even account for the fact that an NBA game is eight minutes longer than an NCAA contest (48 minutes versus 40).

The "game" is mostly the same. However, when it comes to making statistically driven inferences, these variables must be accounted for in order to make accurate assessments. Let's take a look at a few examples from last season.

Telling someone that junior point guard Josh Newkirk's three-point percentage regressed this past season certainly sounds fancier than saying he struggled. By the plain definition of the word, Newkirk did regress: he fell back to a less developed state. However, we can't use regression and reversion interchangeably. It's true that Newkirk didn't build on the success he found from beyond the arc as a freshman. But given that he attempted 37.7% more triples as a sophomore, you can't really say he reverted to his collegiate mean.
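The 37.7% figure can be reproduced from the game splits reported later in this piece: 11 attempts in single-attempt games plus 42 in all other games as a freshman, and 11 plus 62 as a sophomore. The short Python sketch below assumes those two splits add up to his full-season totals; the variable and function names are illustrative only.

```python
# Season totals, assuming the single-attempt-game split plus the split from all
# other games equals the full-season line (figures taken from the article).
freshman_attempts = 11 + 42   # 53 attempts
sophomore_attempts = 11 + 62  # 73 attempts


def pct_increase(old: int, new: int) -> float:
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


increase = pct_increase(freshman_attempts, sophomore_attempts)
print(f"{increase:.1f}% more three-point attempts")
print(f"{freshman_attempts + sophomore_attempts} attempts over two seasons")
```

The two-season total of 126 matches the attempt count cited in this piece, which is a useful sanity check on the splits.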

That touches on another popular term that gets thrown around quite a bit these days: sample size. I'd argue that his 126 total three-point attempts over two seasons and 70 games still isn't a large enough sample (this could be an entirely separate piece) to generate a definitive answer. But therein lies the problem: statistically speaking, how does one assess a reserve point guard who has already used half of his collegiate eligibility?
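One way to put a number on "not a large enough sample" is a confidence interval around each season's percentage. The sketch below uses a 95% Wilson score interval, which is our choice of method rather than anything the author used, with season lines of 23-53 and 22-73 derived by adding the single-attempt-game and other-game splits reported later in this piece. Overlapping intervals are only a rough, informal sign that the two seasons are hard to tell apart; the function name is hypothetical.

```python
import math


def wilson_interval(made: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a shooting percentage (z = 1.96 gives ~95%)."""
    p = made / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    return center - half_width, center + half_width


# Season lines assumed from the article's splits: 2-11 + 21-42 and 2-11 + 20-62.
seasons = {"freshman": (23, 53), "sophomore": (22, 73)}
intervals = {name: wilson_interval(m, a) for name, (m, a) in seasons.items()}

for name, (lo, hi) in intervals.items():
    made, att = seasons[name]
    print(f"{name}: {made}-{att} ({made / att:.1%}), 95% interval {lo:.1%} to {hi:.1%}")

fr_lo, fr_hi = intervals["freshman"]
so_lo, so_hi = intervals["sophomore"]
print("Intervals overlap:", fr_lo <= so_hi and so_lo <= fr_hi)
```

With only 53 and 73 attempts, each interval spans more than 20 percentage points, which is the statistical version of the sample-size concern raised above.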

The opposite is true for junior forward Jamel Artis when it comes to three-point shooting. Is the 39.4% shooter we saw last season the real him, or is it the player who went 8-27 as a freshman? In this case, it's much more reasonable to think Artis is simply improving in this facet of his game. Still, we're talking about an even smaller total sample than Newkirk's.

We should all be able to agree that while Newkirk regressed last season, there wasn't any reversion to the mean, as I think the sample behind his three-point "mean" was too small to establish one. A better assessment of his ability from beyond the arc, based solely on the numbers, would come from a more customized view. For instance, I'd wash out all games in which he attempted just one triple. In my opinion, it wouldn't be uncommon for even a really good shooter to miss his only attempt, especially a player who isn't billed as a specialist in that area.

As a freshman, Newkirk had 11 games in which he took only one three-point attempt; he was 2-11 in those games. That means he was a whopping 21-42 otherwise. This past season, Newkirk again had 11 games in which he attempted just one triple, and he was 2-11 again. Throw those games out and he'd have been 20-62 beyond the arc last season. In his freshman season, he had games of 5-5, 3-3, and two games of 2-2 to help bolster his percentages. As a sophomore, he had two games of 1-5 and two games of 0-4 from deep, which had the opposite effect.
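The customized view described above amounts to splitting each season's line into single-attempt games and everything else. Because the article reports only those aggregated splits, not full game logs, the Python sketch below works from the splits directly; the dictionary layout and the helper names `combine` and `fmt` are illustrative assumptions.

```python
# (made, attempted) three-point splits as reported in the article.
SPLITS = {
    "freshman": {"single_attempt_games": (2, 11), "other_games": (21, 42)},
    "sophomore": {"single_attempt_games": (2, 11), "other_games": (20, 62)},
}


def combine(*lines: tuple[int, int]) -> tuple[int, int]:
    """Add several (made, attempted) lines into one."""
    return sum(m for m, _ in lines), sum(a for _, a in lines)


def fmt(line: tuple[int, int]) -> str:
    """Format a (made, attempted) line as 'M-A (pct)'."""
    made, att = line
    return f"{made}-{att} ({made / att:.1%})"


for season, split in SPLITS.items():
    overall = combine(*split.values())
    print(f"{season}: overall {fmt(overall)}, "
          f"without single-attempt games {fmt(split['other_games'])}")

# The two seasons' single-attempt games pooled together.
single = combine(*(s["single_attempt_games"] for s in SPLITS.values()))
print(f"single-attempt games, both seasons: {fmt(single)}")
```

Washing out the single-attempt games lifts his freshman mark to 50.0% but his sophomore mark only to about 32%, and the pooled single-attempt line comes to 4-22.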

In truth, Newkirk resembles Speedy Claxton more than he does Aaron Brooks. Based on my customized view, saying he can be relied upon as a knockdown shooter from deep in limited opportunities is a stretch. His combined 4-22 mark in single-attempt games across both seasons, albeit a small sample, supports that. It's much more accurate to say that he's streaky and capable of some really strong performances from three-point land when he's in a rhythm.

That analysis simply confirms the eye test, and really, that's what advanced metrics should be used for. Again, players can both improve and regress, but it's important to understand the difference between regression and reversion. Along with that, sample size must be accounted for, but within proper context. When we're talking about collegiate athletes, it must be understood that a player's finite college career puts a hard limit on how large that sample can ever get.

Be sure to join Cardiac Hill's Facebook page and follow us on Twitter @PittPantherBlog for our regular updates on Pitt athletics. Follow the author @Stephen_Gertz