
Sports by the Numbers

Author: Peter Lemieux

Technical Appendix: The Model for Seedings with Champions Included

Posted on March 10, 2017 by Peter Lemieux

This table presents the results of three “Tobit” estimations of the effects of RPI and conference membership on seedings in the NCAA Tournament.

The first column reports the results for a model that includes each team's RPI, its conference membership, and "interaction" terms for each combination of RPI and conference so that the slopes can differ across conferences.  RPI has a strong negative relationship with seeding; that relationship is steeper for mid-major teams and steeper still for majors.
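As a concrete illustration of that specification, here is a minimal sketch in Python. It substitutes ordinary least squares for the Tobit estimator reported in the table, and the tiny teams DataFrame is entirely made up; only the form of the interaction terms mirrors the model described above.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data purely for illustration: seed is the Tournament seed,
# rpi the team's rating, conference one of three groups.
teams = pd.DataFrame({
    "seed":       [2, 7, 11, 1, 6, 12, 4, 9],
    "rpi":        [0.66, 0.60, 0.55, 0.68, 0.61, 0.54, 0.64, 0.58],
    "conference": ["major", "major", "single", "major",
                   "midmajor", "single", "midmajor", "midmajor"],
})

# "rpi * C(conference)" expands into a main effect for RPI, dummies for the
# conference groups, and RPI-by-conference interactions, so the RPI slope is
# free to differ across groups, as in model (1).
interaction_model = smf.ols("seed ~ rpi * C(conference)", data=teams).fit()
print(interaction_model.params)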

The second column tests the hypothesis that the slopes for majors and mid-majors are the same.  Here I include a variable that measures RPI for all major and mid-major teams together, then add a separate measure for the major-conference teams alone.  If the majors and mid-majors followed the same path, the majors-only term in model (2) should be zero.  Since it is not, I maintain the distinction between majors and mid-majors in model (3).

Model (3) is the one where I add whether a team won a conference championship.  Champions from single-bid conferences actually receive worse seeds, because they are included in the Tournament automatically and many of them are among the weaker teams in the field.  A tournament winner from a conference like the Colonial or the Ivy League thus receives a worse seeding than an average at-large team with the same RPI.

When we turn our attention to the mid-majors and majors, though, conference champions receive a substantial seeding bonus.  Mid-major champions earn better (lower) seedings by about 1.4 lines.  For major champions the effect is a whopping 2.6 lines.  Given the usual denigration of conference championships by basketball pundits, these are surprisingly large effects indeed.

Posted in NCAA Men's Basketball, Technical Notes

Shot Clock Effects Redux

Posted on January 24, 2017 by Peter Lemieux

Last year I posted two items concerning the effects of the change to a thirty-second shot clock in NCAA men's college basketball.  I found that total scoring had increased by nearly twelve points per game between the 2014-2015 season and the 2015-2016 season, after the shot-clock rule was changed, but that the margin of victory was unaffected.  An equally dramatic effect appeared in three-point shooting: teams were hoisting nearly two more three-point shots per game, probably because the shorter clock meant more "desperation" threes were being taken.  However, I found no change in the accuracy of three-point shooting after the clock was shortened.

Scoring in the current 2016-2017 season differs hardly at all from last season.  All three measures show insignificant gains compared to last year.

This table extends the results for three-point shooting to include all games played through January 20th of this year.

Three-point attempts have continued to rise in the 2016-2017 season, but we also see an improvement in three-point accuracy.  Teams are shooting three more three-pointers every four games than they did last season, and their accuracy has improved by about half a percentage point.

This change might represent improvements in players' abilities over time, or a conscious decision by coaches to recruit better three-point shooters out of high school.  However, it may also simply be random fluctuation.  If we go back to the data for 2008-2009, the earliest year available at the NCAA's site, accuracy was 34.7 percent, hardly different from this year's figure.  Attempts in 2008-2009 were still significantly lower, at 18.9 per game.

Posted in NCAA Men's Basketball

Home Field Advantage in NFL Playoffs since 2010

Posted on January 14, 2017 by Peter Lemieux

Updated: December 18, 2017

Last year I undertook an analysis of home field advantage in the NFL playoffs, but only the wild-card games had been played when I published those results.  I've now included all the playoff games from the 2016-2017 NFL season and added some other findings.  The basic conclusions I reached a year ago remain unchanged.

Overall the home team has won about two-thirds of these games by an average margin of just under six points.

Because the teams are seeded in the playoffs, we should expect home teams to outperform their opponents.  The differences across the types of playoff games show the value of these higher seedings.  The margin is smallest in the "wild-card" games, since the teams in those games are more closely matched.  (The top two teams in each conference receive a bye in the first round, so the wild-card games pair the three seed against the six and the four against the five.)  In the later rounds, when the top seeds play, the home team's advantage is larger, running about seven to eight points compared to three in the wild-card games.

Some of the home field bonus can be attributed to the fact that higher-seeded home teams are stronger, while some may reflect the “home-field advantage.”  For comparison, I calculated the average score for home and away teams for the 41 games played during weeks seven through nine of the 2017 season.  Home teams scored an average of 22.8 points, 2.9 more than their opponents, and won 61 percent of their games.  These figures are quite consistent with the results for wild-card teams presented above.  Seedings play a greater role in the later playoff rounds.
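For readers who want to reproduce these summaries, here is a minimal Python sketch of the calculation; the games DataFrame and its column names are hypothetical stand-ins for the actual playoff results.

import pandas as pd

games = pd.DataFrame({
    "round":    ["wild-card", "wild-card", "divisional", "conference"],
    "home_pts": [26, 18, 34, 24],
    "away_pts": [20, 22, 20, 17],
})

games["margin"] = games["home_pts"] - games["away_pts"]   # home margin of victory
print(games["margin"].mean())                              # average home margin overall
print((games["margin"] > 0).mean())                        # share of games won at home
print(games.groupby("round")["margin"].mean())             # average margin by playoff round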

Posted in NFL

Handicapping the NFL Divisional Playoffs: 2017

Posted on January 14, 2017 by Peter Lemieux

Two years back I estimated the point spreads in the four divisional NFL playoff games using a simple model of each team's average margin of victory over the season based on these factors:

  • net yardage from offensive and defensive plays;
  • net sacks per game; and,
  • net turnovers per game.

I have since added the effects of three other factors to my model for point spreads:

  • net yards gained or lost during kickoff returns;
  • net yards gained or lost during punt returns; and,
  • net yards gained or lost due to penalties.

I have estimated the effects of these factors on each team’s average margin of victory using seasonal data for all 32 NFL teams between 2013 and 2016.

This table presents the results for the remaining eight teams in the playoffs.  The top half of the table shows the net difference between the home and away teams on each of our six factors.  For instance, Houston gained an average of 13.4 more yards per game compared to its opponents; for New England, the comparable figure is just 0.2.  That gives New England a net deficit in yardage of 0.2 – 13.4, or -13.2 as reported below.  The other figures in the top half of the table are similarly calculated.

Using my model, I can estimate the individual effects on the margin of victory ("point spread") for each of these six factors.  For instance, the effect for yardage is approximately 0.08 points per net yard gained, so the Texans' 13.2-yard advantage over the Patriots is worth about 0.08 × 13.2, or roughly one point, reported as 1.0 in the table.  The team with the greatest advantage in terms of yardage is Dallas, which gained on net almost 28 yards more per game against its opponents than did Green Bay.  That difference is worth about 2.2 more points for the Cowboys in tomorrow's game against the Packers.
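Mechanically, the prediction is just a weighted sum, as in this sketch.  The 0.08-points-per-yard figure comes from the text; the other per-unit effects below are placeholders, not the model's actual estimates.

# Home-minus-away differences per game (the New England vs. Houston yardage
# figure is from the text; the rest are illustrative).
net_diff = {
    "yards":      -13.2,
    "sacks":       1.0,
    "turnovers":   0.5,
    "kick_ret":    2.0,
    "punt_ret":   -1.0,
    "penalties":   5.0,
}

# Points of margin per unit of each factor (only the yardage effect is quoted
# in the text; the others are hypothetical).
effect_per_unit = {
    "yards":      0.08,
    "sacks":      1.5,
    "turnovers":  3.0,
    "kick_ret":   0.05,
    "punt_ret":   0.05,
    "penalties":  0.07,
}

predicted_spread = sum(net_diff[k] * effect_per_unit[k] for k in net_diff)
print(round(predicted_spread, 1))   # predicted home-team margin of victory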

When we add up the various effects of each of these six factors, Dallas has the greatest predicted advantage at slightly over three points compared to the Packers.  Next comes New England, whose advantage stems largely from creating more sacks and turnovers than do the Texans.  The Atlanta Falcons hold a slight advantage over the Seattle Seahawks, while the Kansas City Chiefs are predicted to lose to the Pittsburgh Steelers in their game Sunday night.

The last column on the right of the table shows the betting lines in Las Vegas for each game.  These are comparable to my predicted point spreads.  The 16-point spread for the Patriots over the Texans is outrageously high, both by historical standards and by my estimates.  Atlanta is also more favored by bettors than the teams' 2016 performances would justify.  And, despite my model's prediction that Kansas City should lose to the Steelers in Arrowhead, bettors prefer the home team by a slight margin.

Posted in NFL

Bracketology 2016: Predicted Seedings

Posted on February 24, 2016 by Peter Lemieux

Last year I published a simple model of NCAA Men's Tournament seedings based on RPI and conference membership.  To recap, higher-RPI teams received better seedings, and teams representing major and "mid-major" conferences got better seeds than teams from the other conferences even when they had identical RPI scores.  In principle we should see no differences between conferences once RPI is taken into account, because the measure relies heavily on a team's strength of schedule.  Teams in stronger conferences should have higher RPI scores because they face a more difficult schedule.

In practice, though, the NCAA Selection Committee clearly prefers teams from major and mid-major conferences and fails to give teams from other conferences a fair shake when it comes to seedings as this chart shows:

[Chart: predicted seeding by RPI for single-bid, mid-major, and major conferences]

A team with an RPI of 0.600 from a “single-bid” conference like the Colonial or the Ivy League is predicted to be seeded tenth or eleventh, while schools with identical RPI figures from the mid-major and major conferences would receive a six or a seven seed.  The advantage for both those conferences over single-bid schools grows as RPI increases, as does the advantage for major-conference teams over mid-majors.
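To make the mapping from RPI to predicted seed concrete, here is a small Python sketch.  The intercepts and slopes are not the model's estimated coefficients; they are hypothetical values chosen only to reproduce the rough pattern described above.

# Hypothetical intercepts and slopes by conference group (steeper for majors).
predicted_intercept = {"single": 29.0, "midmajor": 31.0, "major": 33.0}
predicted_slope     = {"single": -30.0, "midmajor": -40.0, "major": -45.0}

def predicted_seed(rpi: float, conference: str) -> int:
    """Linear prediction of seed from RPI, clipped to the 1-16 seeding range."""
    raw = predicted_intercept[conference] + predicted_slope[conference] * rpi
    return int(min(max(round(raw), 1), 16))

print(predicted_seed(0.600, "single"))    # roughly a 10 or 11 seed, as in the example above
print(predicted_seed(0.600, "midmajor"))  # roughly a 7 seed
print(predicted_seed(0.600, "major"))     # roughly a 6 seed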

We can use the model I estimated that underpins this chart to estimate how teams will be seeded in 2016 based on their current RPI scores.  Using RPI figures from CBS Sports through Sunday, February 21st, gives us the following predictions for the 36 teams that will make up the at-large field in this year's Tournament:

[Table: seeding model simulation for the 2016 at-large field]

Both Louisville and SMU are ineligible for Tournament play in 2016, so Seton Hall and Wisconsin have a chance to slip in at the bottom of the rankings.

Posted in NCAA Men's Basketball

Effects of the Shot-Clock Change on Three-Point Shooting

Posted on January 26, 2016 (updated February 23, 2016) by Peter Lemieux

The newly accelerated speed of play in NCAA Men's College Basketball may have had side effects beyond a simple increase in tempo and higher scores.  The faster pace may make teams change the way they play the game itself.  One place we might see such a change is in three-point shooting.  Teams often resort to hoisting a "desperation three" if their half-court offense has bogged down and the horn on the shot clock is about to sound.

I’ve compiled the statistics for three-point attempts and three-point shooting percentage for the complete 2013-2014 and 2014-2015 seasons from the NCAA’s archive. This season’s figures represent those same data through games of January 25, 2016.  Including 2013-2014 enables us to compare any change this season to “normal” seasonal change before the shot clock was shortened. Here are the results for three-point attempts:

[Chart: three-point attempts per game, 2013-14 through 2015-16]

With the shorter clock, teams have been averaging a smidgen over twenty three-point attempts per game this season, about one and a half more than in 2014-2015.  Three-point attempts grew between 2014 and 2015 as well, but the rise in 2016 is some 3.5 times greater than that earlier increase.  Even if we deduct the 0.42 growth in attempts between 2014 and 2015 from this year's rise, that still leaves roughly an additional three-point attempt per game since the clock was shortened.  "Desperation" three-point shots probably account for a lot of this growth.

[Chart: three-point shooting percentage, 2013-14 through 2015-16]

All these extra three-point shots have not affected accuracy. Teams shot 34.3 percent from outside the arc in 2014-15 and are shooting a statistically identical 34.6 percent now.  More striking is the sharp decline from the rate of 36.1 percent in 2013-2014.  While three-point accuracy rebounded slightly this season, it still remains statistically below 2013-2014.
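A claim like "statistically identical" rests on a test along these lines; the sketch below uses a two-proportion z-test with made-up shot counts rather than the NCAA's actual totals.

from statsmodels.stats.proportion import proportions_ztest

made     = [34300, 34600]      # hypothetical made threes in 2014-15 and 2015-16
attempts = [100000, 100000]    # hypothetical attempts in each season

z, p = proportions_ztest(count=made, nobs=attempts)
print(round(z, 2), round(p, 3))  # a large p-value indicates no detectable change in accuracy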

Posted in NCAA Men's Basketball

Effects of the Shot-Clock Change in Men’s College Basketball

Posted on January 25, 2016 (updated February 23, 2016) by Peter Lemieux

Most basketball teams play with a "shot clock" that limits the amount of time either team can spend holding the ball.  In professional men's basketball the clock runs for 24 seconds, while collegiate women use a 30-second clock.

Until this season collegiate men had the luxury of a 35-second clock, considerably longer than that used in the professional ranks to which many of these players aspire.  Now the men have joined their female peers and play on a 30-second clock.  Has the faster pace of play affected scoring and, if so, how?

[Table: scoring in comparable weekends, 2015 vs. 2016]

These figures come from games played during two equivalent weekends in 2016 and 2015.  Most teams were playing conference opponents so the level of competition is roughly the same. This year’s large snowfall in the Mid-Atlantic states produced a few cancellations so the number of games is slightly smaller for 2016.

Scoring has increased by nearly twelve points per game this year.  Winners score about 6.5 points more per game in 2016, while losers score an additional 5.0 points.  All of these differences are well beyond standard criteria for "statistical significance."

The margin of victory also grew by 1.5 points, but that difference doesn’t pass statistical muster. There is no statistical evidence that the faster clock has increased the margin of victory.

Reducing the clock from 35 to 30 seconds constitutes a 14 percent reduction (5/35) in time of possession.  Scoring, on the other hand, has increased by only 8.6 percent in response (11.5/133.8).

The shorter shot clock has increased the pace of play as well.  Using the enormous archives of collegiate basketball statistics available to subscribers at Ken Pomeroy’s kenpom.com, I averaged his measures of “tempo” and “efficiency” for the 351 Division I teams in his database.  These figures are based on his estimates of the number of possessions per game using a formula explained here.  I compared the entire season figure for 2015 with those for games played through Sunday, January 24th of 2016.
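Pomeroy's exact formula is linked above rather than reproduced here, but the standard possession estimate used in college basketball takes roughly this form; the 0.475 free-throw weight below is the conventional college figure and is an assumption on my part, not necessarily his exact value.

def estimated_possessions(fga: float, oreb: float, turnovers: float, fta: float) -> float:
    """Rough possession count: field-goal attempts minus offensive rebounds,
    plus turnovers, plus the share of free-throw trips that end a possession."""
    return fga - oreb + turnovers + 0.475 * fta

# Hypothetical box-score totals for one team in one game.
poss = estimated_possessions(fga=58, oreb=10, turnovers=12, fta=22)
print(round(poss, 1))             # about 70 possessions
print(round(100 * 70 / poss, 1))  # offensive efficiency if the team scored 70 points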

[Table: average tempo and offensive efficiency, 2015 vs. 2016 (kenpom.com)]

“Tempo,” the number of possessions per forty-minute game, has increased drastically since 2015, rising by well over four possessions per game.  That alone might account for the increase in scoring, but it is not the only factor.  Teams are also scoring about one point more per hundred possessions this year than last.  So not only do teams have more possessions with a shorter clock, but the faster pace appears to make those possessions slightly more productive as well.

Obviously this change will wreak havoc on historical comparisons to the 35-second era.  Identically-skilled players in 2016 should be scoring on average about nine percent more compared to the men who played in years past.

Posted in NCAA Men's Basketball

NFL Mid-Season Report Card

Posted on November 10, 2015 by Peter Lemieux

With the conclusion of week nine of the NFL season we have reached the midpoint of the 2015 regular season.  All teams have played at least eight of their sixteen games; four teams have played nine because they have not yet had their bye week.  I decided to replicate last season's analysis of the 2014 season to develop a "report card" for this year's lineup of NFL team performances.

The 2014 model included three predictors of a team's margin of victory: net yards gained per game, net turnovers per game, and net sacks per game.  The first two of these also help explain variation in a team's margin of victory in 2015, but I find no separate effect so far for sacks.  Perhaps that is just because sacks are relatively rare events, which makes their effect harder to measure across just eight or nine games.  I also included a new factor this year, net yards gained or lost due to penalties.  It turns out that penalty yardage has an effect on scoring that persists even after a team's total yards gained or lost is factored into the equation.  The effect is about the same for penalty yardage as for net yards from scrimmage: roughly seven or eight points for every hundred yards a team outgains its opponent.

Including those factors lets us rank the 32 teams by how their actual performance on the field compares to the performance my little model predicts.  At the top of the list are the Baltimore Ravens.  They have scored an average of three points fewer than their opponents, but that is almost five points more than their statistics would predict.  The next five overachievers all outscore their opponents by three or four points more than we would expect given yardage, turnovers, sacks, and penalties.
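The report card itself is just the residuals from that regression: actual margin minus predicted margin, sorted.  A minimal Python sketch follows, with a hypothetical DataFrame standing in for the 2015 team statistics.

import pandas as pd
import statsmodels.formula.api as smf

# Illustrative per-game averages for a handful of teams (not the real 2015 data).
teams = pd.DataFrame({
    "team":              ["BAL", "TB", "CHI", "NYJ", "NE", "CAR", "DEN", "HOU"],
    "margin":            [-3.0, -6.5, -7.0, 4.8, 8.0, 6.5, 5.0, -2.0],
    "net_yards":         [-30, 10, -20, 90, 60, 40, 20, -10],
    "net_turnovers":     [0.1, -0.6, -0.5, 0.9, 0.7, 0.6, 0.5, -0.2],
    "net_penalty_yards": [5, -10, -5, 10, 5, 0, 5, -5],
})

fit = smf.ols("margin ~ net_yards + net_turnovers + net_penalty_yards", data=teams).fit()
teams["over_under"] = teams["margin"] - fit.fittedvalues   # actual minus predicted margin
print(teams.sort_values("over_under", ascending=False)[["team", "over_under"]])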

[Table: over- and underachievers through week nine of the 2015 season]

Two teams stand out as dramatic underachievers, the Tampa Bay Buccaneers and the Chicago Bears.  Both score an average of six points fewer than their statistics would predict.  They are followed by the Chargers, Jets, and Texans.  Of those three only the Jets are predicted to outscore their opponents by a substantial margin, nine points per game, yet they have managed to win by an average margin of just under five.

Posted in NFL

Does Spending Bring Victories in Major League Baseball?

Posted on September 6, 2015 (updated October 16, 2015) by Peter Lemieux

Over the past five years the Boston Red Sox spent $587 million on player salaries.[1] In 2013 all that money helped buy them a World Series, but in 2012 and 2014 the big-spending Red Sox ended the season in the cellar of the American League East.  They fell to the bottom of the division again this year on Memorial Day, and only now, late in the season, do they have a chance to climb back out.  (Update: The Red Sox "surge" in September fizzled out, leaving them once again in the cellar of the AL East.)

Over that same five-year period the Dodgers spent even more, $824 million, and the team's investment has bought them a winning record but little else.  The team advanced to the championship round in 2013 with a victory over the Atlanta Braves but lost to the St. Louis Cardinals.  Last year the Cards foiled the Dodgers' chances by beating them in the first round, and this year the Dodgers fell to the Mets in the five-game divisional series.

The New York Yankees have seen no better rewards than the Dodgers when it comes to their high-priced payroll.  The Yanks also spent about $800 million on player salaries between 2011 and 2015 and averaged one more win than the Dodgers.  Like the Dodgers, though, the Yankees’ playoff performance over these years has also been pretty dismal.  They lost to the Detroit Tigers in the divisional round in 2011, and though they got past the Baltimore Orioles in that round a year later, they fell to the Tigers once more when playing for the pennant.  This year they lost to the Houston Astros in the one-game wild-card playoff.

When fans see teams spend enormous amounts in player salaries and get such mediocre results, they rightly wonder whether highly-touted expensive players are really worth the investment.  These doubts only grow stronger when they see teams with considerably smaller payrolls like Kansas City, Oakland, Pittsburgh or St. Louis routinely make the playoffs and sometimes win the Series.

As it turns out, though, focusing on these particular high-spending teams leads us to the wrong conclusion.  We can see the general relationship between payrolls and victories by simply plotting each team's total wins against its total spending on player salaries.  Here is the plot for all thirty MLB teams from 2011 to 2015:[2]

[Chart: total regular-season wins vs. total player salaries, 2011-2015]

Teams that spend more on player salaries do win more games, but the price is pretty steep.  The "slope" of the line, 0.13, tells us that it takes about $8 million to improve an average team's performance by one regular-season win, since 0.13 × 8 = 1.04.
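The dollars-per-win figure is just the reciprocal of that slope, as this short sketch shows.

slope = 0.13                   # additional wins per $1 million of player salaries
cost_per_win = 1 / slope       # millions of dollars required for one more win
print(round(cost_per_win, 1))  # about 7.7, i.e. roughly $8 million per win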

We can also use this overall relationship between salaries and winning to see whether any teams do especially better or worse compared to what the model predicts for them.  For the 30 teams in Major League Baseball, I find just nine whose performance over the past five years deviated “significantly” from the model’s predictions:

[Chart: teams whose performance deviates significantly from the salary model's predictions]

One striking result is that only the Cardinals do significantly better than we would predict based on player salaries alone, averaging another ten games per season.  No other team in baseball shows the savvy of the St. Louis front office in terms of staffing its clubhouse economically.  On the contrary, what stands out are the many more teams that significantly underperformed given their payrolls.

At the bottom of the list are the formerly hapless Houston Astros. In 2011, 2012, and 2013, the team spent about $80 million, $40 million, and just $11 million on player salaries.  According to the model those figures should have generated between 70 and 79 wins. The Astros managed just 56, 55, and 51 victories in those years. Last year Houston improved to 70 wins, and this year Houston beat the Yankees in the wild-card game before falling to the Kansas City Royals in the divisional series.

The other teams in the chart probably won't come as a great surprise to anyone who follows baseball.  The most poorly served fans are our friends from Chicago, where both the Cubs and the White Sox should be winning another eighteen games or so between them given their salary budgets.  Notice that none of the big-spending teams I talked about at the beginning, the Dodgers, Yankees, or Red Sox, makes the list, though the Sox' performance was only slightly better than Seattle's.

Finally let’s look at some simple predictions from the model.  In this chart I have reproduced the relationship between spending and winning and added two vertical bars, at about $62 million and $150 million.

[Chart: wins vs. player salaries with reference lines at $62 million and $150 million]

The first figure represents the amount an average team would have to spend to expect to win half its games, or 81 in a 162-game season.  The second figure, $150 million, is roughly what it takes to win 90 games.  Nearly every team that has made the playoffs since 2010 has won 90 games or more, so $150 million represents the "entry fee" for having a solid chance at the playoffs.  Unless you are the St. Louis Cardinals, of course.

For more details see the Technical Appendix.

[1] I use "player salaries" rather than total payroll throughout. Player salaries include monies paid to positional players and pitchers. Total payroll can include other sums like "dead" money paid to departed players to whom the club still has a contractual obligation.

[2] For 2015, I have extrapolated teams' won-lost records to the full 162-game season using their records through September 5th.

Posted in Major League Baseball

Technical Appendix: Using Player Salaries to Predict Wins

Posted on September 6, 2015 (updated March 5, 2018) by Peter Lemieux

To get some perspective on the relationship between team performance and team payrolls, I have gathered data for each season from 2011 through 2015 on player salaries from the widely cited database available at Spotrac.  These data are broken out by positional category and include a figure for "dead" money, contractual obligations to players who are no longer with the team.

Regression Results

I start with a model that includes player salaries, the sum of payments to pitchers, catchers, infielders, outfielders, and designated hitters, in millions of dollars ("PlayersMM"), and a residual "other" category that is the difference between reported total payroll and player salaries, again in millions of dollars ("OtherMM").  As these results show clearly, player salaries are what matter when it comes to winning:

Pooled OLS, using 150 observations
 Included 30 cross-sectional units
 Time-series length = 5
 Dependent variable: Regular-Season Wins

             coefficient   std. error   t-ratio   p-value
 ---------------------------------------------------------
 const       69.2663       2.22782      31.09     1.62e-66 ***
 PlayersMM    0.130358     0.0227014     5.742    5.17e-08 ***
 OtherMM      0.00928354   0.0340035     0.2730   0.7852

Mean dependent var   81.02667   S.D. dependent var   11.10653
Sum squared resid    14929.62   S.E. of regression   10.07780
R-squared            0.187720   Adjusted R-squared   0.176668

The overall explanatory power of this model is pretty low.  Just under 19% of the variance in wins can be statistically accounted for using these salary data.  That’s just another way of stating what the graph in the main article shows, that there is a lot of “scatter” around the model’s predictions.  Teams with identical payrolls can win considerably more games than the model predicts, or considerably fewer.
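For anyone who wants to replicate the estimation, here is a sketch of the pooled regression.  The data below are simulated stand-ins for the 150 team-season rows; with the real Spotrac figures, the same two formula calls would reproduce the first table above and, once team dummies are added, the basis for the second.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the 30 teams x 5 seasons panel used above.
rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "team":      np.repeat([f"T{i:02d}" for i in range(30)], 5),
    "PlayersMM": rng.uniform(40, 230, size=150),
    "OtherMM":   rng.uniform(0, 40, size=150),
})
panel["wins"] = 69 + 0.13 * panel["PlayersMM"] + rng.normal(0, 10, size=150)

# Pooled OLS of regular-season wins on player salaries and other payroll.
pooled = smf.ols("wins ~ PlayersMM + OtherMM", data=panel).fit()
print(pooled.params)

# Adding a dummy for each club gives the team-effects model discussed next.
team_effects = smf.ols("wins ~ PlayersMM + OtherMM + C(team)", data=panel).fit()
print(team_effects.params["PlayersMM"])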

Because the data contain multiple measurements on each of the thirty teams, we can exploit that feature and allow for team-specific effects.  These are the source of the second graph in the main article and come from estimating a model with "dummy" variables included for each of the teams.  After whittling the results down to the teams with "significant" effects, we're left with this model, which forms the basis for the chart comparing teams in the main article:

Pooled OLS, using 150 observations
Included 30 cross-sectional units
Time-series length = 5
Dependent variable: Regular-Season wins

            coefficient   std. error   t-ratio   p-value
---------------------------------------------------------
const        75.3935      2.20255      34.23     2.64e-69 ***
PlayersMM     0.103430    0.0207827     4.977    1.89e-06 ***
OtherMM      −0.0360856   0.0302921    −1.191    0.2356
CHC         −10.1867      3.98745      −2.555    0.0117   **
COL         −13.5862      4.02725      −3.374    0.0010   ***
CWS          −8.46620     3.98705      −2.123    0.0355   **
HOU         −16.6031      4.11470      −4.035    9.00e-05 ***
MIA         −11.2013      4.08010      −2.745    0.0069   ***
MIN         −13.8250      4.01661      −3.442    0.0008   ***
PHI          −8.65976     4.05484      −2.136    0.0345   **
SEA          −7.50076     4.00319      −1.874    0.0631   *
STL          10.1054      3.98768       2.534    0.0124   **

Mean dependent var   81.02667   S.D. dependent var   11.10653
Sum squared resid    10435.62   S.E. of regression   8.695999
R-squared            0.432227   Adjusted R-squared   0.386969

After adjusting for the extreme cases, two changes happen to the effects of spending. First, the coefficient for player salaries falls from 0.13 in the simple model without adjustments to 0.10 here. That suggests an average team needs to spend about $10 million to gain another victory, rather than the $8 million figure based on the simple model. However, a more nuanced view of the spending effect includes the coefficient's "standard error" of 0.02. That can be used to construct a "confidence interval" around the estimated effect: the actual effect of spending is 95% certain to fall somewhere between 0.06 and 0.14, the result of subtracting or adding twice the standard error. That means it costs somewhere between $7 million (7 × 0.14 ≈ 1) and $16 million (16 × 0.06 ≈ 1) to gain another win.
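Spelled out in code, the interval arithmetic looks like this.

coef, se = 0.1034, 0.0208                # wins per $1 million, and its standard error
low, high = coef - 2 * se, coef + 2 * se
print(round(low, 2), round(high, 2))     # roughly 0.06 to 0.14 wins per $1 million
print(round(1 / high), round(1 / low))   # about $7 million to $16 million per win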

Second, the portion of teams’ payrolls not devoted to players on the field now has the “proper” negative sign, though it still falls considerably short of conventional significance levels. It’s possible that teams are penalized for making bad decisions that lead to large pools of “dead” money, but the evidence is still pretty weak, and the effect quite small.

Finally we can ask whether spending on pitchers is more or less productive than spending on positional players. The answer is that it doesn’t matter. Including separate terms for both groups’ salaries adds no predictive power to the model. Spending another ten million dollars on pitching has the same effect as investing that money in the rest of the team.

Posted in Major League Baseball, Technical Notes
