March Madness First Weekend Data Round Up

Which Dataset Did the Best?

After looking at more than 107 data points, it is once again clear that no single piece of data works for every game or matchup. That was true last year, the year before, and in the years before that. Still, below is a chart of the best-performing data points among those I evaluated.

Personally, I almost always start my evaluations with both KenPom and Sagarin. These two data sets are a reliable starting point. In fact, I pay for the KenPom subscription; it’s reasonably priced and fun to work with all season long. [Yeah, a free plug for both.] After that, I use other metrics and some tools we’ve developed, like total adjusted efficiency and its differential.

As stated, a single data point is a terrible way to look at any game, but here are the data points that produced the best numbers. KenPom and 3 Pt Pct Shooting Differential were the two best “flat” predictors. However, as much as we bad-mouth the Committee and its seeding, Committee Rank and Seed would each have given you a 25-7 record in the 1st round. If you combined that with 3 Pt Pct Differential in the second round, you’d end up with a respectable 490 points (a 99th-percentile rank on ESPN). In a lot of office pools, that would make you a clear favorite. Overall, you would be going into the Sweet 16 with a 37-11 record and a good chance of being on top at the end.
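For anyone who wants to check that arithmetic, here is a minimal Python sketch. It assumes ESPN's usual bracket scoring of 10 points per correct first-round pick and 20 per correct second-round pick, which matches 25 wins being worth 250 points. The variable and function names are mine, not part of any ESPN tool.

```python
# Assumed ESPN bracket scoring: points per correct pick, by round.
POINTS_PER_PICK = {1: 10, 2: 20}
# Games played in each of the first two rounds.
GAMES_IN_ROUND = {1: 32, 2: 16}


def correct_picks(points: int, rnd: int) -> int:
    """Convert a round's point total back into a number of correct picks."""
    return points // POINTS_PER_PICK[rnd]


round1_points = 250  # Seed / Committee Rank in round 1
round2_points = 240  # 3 Pt Pct Differential in round 2

wins = correct_picks(round1_points, 1) + correct_picks(round2_points, 2)
losses = sum(GAMES_IN_ROUND.values()) - wins
total_points = round1_points + round2_points

print(f"{total_points} points, {wins}-{losses} record")
```

The same helper works in reverse for any row in the chart: divide the round's points by the per-pick value to see how many games that metric actually got right.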

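| Data | ESPN Round 1 | ESPN Round 2 | ESPN Total |
| --- | --- | --- | --- |
| 3Pt A Diff | 180 | 160 | 620 |
| Opp 3Pt M | 190 | 200 | 590 |
| Assists Diff | 230 | 160 | 590 |
| Opp A/TO | 170 | 120 | 570 |
| FT Pct | 190 | 80 | 550 |
| 3Pt Pct Diff | 190 | 240 | 550 |
| Rebs Per Game | 200 | 200 | 520 |
| KenPom | 230 | 200 | 510 |
| 3PtM Diff | 150 | 120 | 510 |
| Pos Items/Min Diff | 180 | 160 | 500 |
| Assists Per Game | 210 | 120 | 490 |
| Opp Assts | 170 | 200 | 490 |
| NET | 240 | 160 | 480 |
| Sagarin | 240 | 160 | 480 |
| Assts | 200 | 120 | 480 |
| Opp 3Pt Pct | 190 | 240 | 470 |
| Opp Total Adj Eff | 170 | 160 | 450 |
| Reb Diff | 170 | 120 | 450 |
| TR Rating | 240 | 120 | 440 |
| Sagarin Seed Prediction | 240 | 120 | 440 |
| Pos Items/Min | 200 | 120 | 440 |
| Total Adj Eff Diff | 190 | 160 | 430 |
| Pts/Min Diff | 180 | 120 | 420 |
| RPI | 220 | 120 | 420 |
| Pts Diff | 180 | 120 | 420 |
| Total Pts Diff | 180 | 120 | 420 |
| Seed | 250 | 120 | 410 |
| Committee Ranking | 250 | 120 | 410 |
| 3Pt A | 170 | 120 | 410 |
| Opp 2Pt Pct | 170 | 160 | 410 |
| Off Reb Diff | 170 | 120 | 410 |
| Total Eff Diff | 160 | 120 | 400 |
| Total Eff | 190 | 120 | 390 |
| Scoring Eff Diff | 190 | 120 | 390 |
| Base Eff Diff | 150 | 80 | 390 |
| Overall Pct | 140 | 120 | 380 |
| FT+Reb Diff | 180 | 120 | 380 |
| Opp Total Eff | 180 | 80 | 380 |
| 2Pt Pct Diff | 180 | 120 | 380 |
| A/TO Diff | 180 | 80 | 380 |
| Off Reb | 170 | 80 | 370 |
| Opp Tot Adj Pts | 170 | 200 | 370 |
| Opp Scoring Eff | 170 | 200 | 370 |
| FGM Diff | 210 | 120 | 370 |
| Total Adj Eff | 160 | 80 | 360 |
| Opp FGM | 190 | 160 | 350 |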
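
If you want to play with the chart yourself, the comparison between riding one metric for both rounds and mixing metrics by round can be sketched in a few lines of Python. This is only an illustration: it copies the Round 1 and Round 2 points for a handful of rows from the chart (the ESPN Total column is not used), and the names `CHART`, `flat`, and `mixed_score` are my own.

```python
# A few rows copied from the chart: metric -> (Round 1 points, Round 2 points).
CHART = {
    "KenPom": (230, 200),
    "3Pt Pct Diff": (190, 240),
    "NET": (240, 160),
    "Sagarin": (240, 160),
    "Rebs Per Game": (200, 200),
    "Seed": (250, 120),
}

# "Flat" approach: stick with one metric for both rounds.
flat = {name: r1 + r2 for name, (r1, r2) in CHART.items()}
best_flat_score = max(flat.values())
best_flat = sorted(name for name, score in flat.items() if score == best_flat_score)

# Mixed approach: best Round 1 metric plus best Round 2 metric.
best_r1 = max(CHART, key=lambda name: CHART[name][0])
best_r2 = max(CHART, key=lambda name: CHART[name][1])
mixed_score = CHART[best_r1][0] + CHART[best_r2][1]

print("Best flat:", best_flat, best_flat_score)
print("Best mix:", best_r1, "then", best_r2, "=", mixed_score)
```

Among these rows, the mix of Seed in Round 1 and 3Pt Pct Diff in Round 2 comes out ahead of any single metric. Adding more rows from the chart to `CHART` lets you test other combinations, with the caveat that picking the best metric after the fact is hindsight, not a prediction.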

Yet when you look at each game individually, the right set of stats could have left you perfect after these first two rounds. Yeah, even a few little data nuggets pointed to the Princeton win. The problem is that so many data sets point in the other direction. So what’s good and what’s junk? Several years ago we did a study of five years of in-depth tourney data. Then last year we had our data scientist look at some additional data. That combination allowed us to win several pools last year and once again finish in the 95th percentile on ESPN. It was still nowhere near as good as our best year, but it did give us some validation that we were on the right path. Unfortunately, this year we tried a new model. It has not performed as well as last year’s, although we have some brackets in the 94% group as of now. However, our max points possible still remains

But if you are wondering what data might have led you to a perfect bracket, the truth is that a well-honed set of data could have done it (at least until the next round). At the end of this silly season, we will go back into our hole in the ground and sort it out once more. We will then do a deep-dive verification of our data and publish our numbers. I expect this to be done by June 1st.

Of course, there are some other bracket principles we adhere to here. The main thing to remember when making any bracket is that the first round cannot be evaluated as a group. You must take each matchup and apply the right data set to it. That has been true each and every year. As a notable example, the 7/10 games seem to come down to which team has come into the tournament doing a better job of taking care of the ball. Hence, turnovers matter the most in those games. But again, at the end of this season, we will parse out this data and compare it with our last 10 years. Then we are going to endeavor to recreate the last 15 years for an even better look at the data and, from that, better models. The future goal is to go back and recreate the last 20 years for analysis.
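
To make the "right data set for the right matchup" idea concrete, here is a rough Python sketch, not our actual model. It assumes a lookup keyed by seed pairing, with the 7/10 games decided by turnovers per game and everything else falling back to seed. The `Team` fields, the `RULES` table, and the sample numbers in the usage lines are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    seed: int
    turnovers_per_game: float  # lower means better ball security


def pick_by_turnovers(a: Team, b: Team) -> Team:
    """Pick the team that takes better care of the ball."""
    return a if a.turnovers_per_game <= b.turnovers_per_game else b


def pick_by_seed(a: Team, b: Team) -> Team:
    """Fallback: pick the better (lower) seed."""
    return a if a.seed <= b.seed else b


# Hypothetical rule table: seed pairing -> the picker to use for that matchup.
RULES = {(7, 10): pick_by_turnovers}


def pick(a: Team, b: Team) -> Team:
    pairing = tuple(sorted((a.seed, b.seed)))
    return RULES.get(pairing, pick_by_seed)(a, b)


# Made-up teams, purely to show the dispatch.
seven = Team("Seven Seed U", 7, 13.5)
ten = Team("Ten Seed State", 10, 10.2)
two = Team("Two Seed Tech", 2, 14.0)
fifteen = Team("Fifteen Seed College", 15, 9.0)

print(pick(seven, ten).name)    # 7/10 game: decided by turnovers
print(pick(two, fifteen).name)  # no special rule: decided by seed
```

The point of the lookup table is that each pairing can get its own rule as the data supports one, without touching the games that have no special rule yet.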

For now, our starting point for riding out the rest of the tournament is a tried-and-true method: we will start our personal analysis with Seeding and Sagarin. Of course, a little Sangria might help too.

We will follow that up with a chaser of our own spreadsheet analysis, which compares teams more thoroughly. That spreadsheet should be available for download later today.

For now and as always, choose joy and continue to enjoy the madness.
