Sports by the Numbers

Category: Technical Notes

Technical Appendix: 2018 Seeding Models

Posted on March 6, 2018 (updated March 11, 2018) by Peter Lemieux

Like its predecessors, the model below was estimated using the “Tobit” method, with the endpoints constrained to values of one and sixteen.

The top portion of the table restates the basic relationship between RPI and seeding and how that relationship varies by conference. Again I find that seeding depends directly on RPI, and that mid-major conference members, and even more so major conference members, receive a “bonus” when the Tournament Committee decides their seedings. Those results produce this now-familiar graph:

I’ve also reanalyzed the results for conference champions and have determined that only major-conference champions receive a seeding bonus, of about 1.3 seeding ranks. In other words, a major-conference team that wins its conference championship is promoted, on average, a bit more than one full rank in the seedings.

Finally, I address the “favoritism” question in the lower half of the table. The Committee has awarded teams belonging to the ACC and Big East conferences a bonus of about one third of a rank. The Committee has also looked especially fondly upon two teams over the years, mid-majors Gonzaga and Cincinnati. That Gonzaga is one of these should not surprise anyone who follows the Tournament: it is one of only five schools, along with Duke, Kansas, Michigan State, and Wisconsin, to have qualified for the Tournament every year since 2000. Cincinnati has the second-most appearances, twelve, of any team not in a major conference.

On the other side of the ledger, three of the mid-major conferences have fared more poorly in the seedings than the others. The Committee appears to have viewed teams from the American Athletic, Colonial, and Western Athletic Conferences with suspicion: teams from these conferences are seeded, on average, about one and a half ranks below what their performance would predict. In addition, the Committee has historically marked down teams from Brigham Young University, seeding them nearly two ranks below what their RPI and conference membership would predict.

Posted in NCAA Men's Basketball, Technical Notes

Technical Appendix: The Model for Seedings with Champions Included

Posted on March 10, 2017 by Peter Lemieux

This table presents the results of three “Tobit” estimations of the effects of RPI and conference membership on seedings in the NCAA Tournament.

The first column reports the results for a model that includes each team's RPI, its conference membership, and “interaction” terms for each combination of RPI and conference so that the slopes can differ across conferences. RPI has a strong negative relationship with seeding; that relationship is steeper for mid-major teams and steeper still for majors.

The second column tests the hypothesis that the slopes for majors and mid-majors are the same. Here I include a variable that measures RPI for all major and mid-major teams together, then include a separate measure for the major-conference teams alone. If the majors and mid-majors followed the same path, the majors-only term in model (2) should be zero. Since it is not, I maintain the distinction between majors and mid-majors in model (3).
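
To make the setup concrete, here is a minimal sketch of how the conference dummies and the RPI “interaction” terms could be constructed, using the variable names (midmaj, power, rpimid, rpipower) that appear in the seeding-model output further down this page. The data frame, its column names, and the file name are assumptions for illustration, and the sketch fits plain OLS; the estimates reported in these appendices come from Tobit.

import pandas as pd
import statsmodels.api as sm

# Hypothetical input: one row per tournament team, with its seed, its RPI,
# and a conference tier label ("single_bid", "midmajor", or "major").
df = pd.read_csv("tournament_teams.csv")

# Dummy variables for the two upper tiers; single-bid conferences are the baseline.
df["midmaj"] = (df["tier"] == "midmajor").astype(int)
df["power"] = (df["tier"] == "major").astype(int)

# Interaction terms let the RPI slope differ by tier.
df["rpimid"] = df["rpi"] * df["midmaj"]
df["rpipower"] = df["rpi"] * df["power"]

X = sm.add_constant(df[["midmaj", "power", "rpi", "rpimid", "rpipower"]])
fit = sm.OLS(df["seed"], X).fit()   # the actual estimates use Tobit, not OLS
print(fit.summary())

Testing the equal-slopes hypothesis of model (2) amounts to replacing rpimid and rpipower with a single interaction shared by all multi-bid teams and adding a separate majors-only term; if that extra term were zero, the two slopes would be the same.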

This is the model where I add in whether a team won a conference championship. Champions from single-bid conferences actually receive worse (numerically higher) seedings, because they are included in the Tournament automatically. Since many of these are among the weaker teams in the field, a tournament winner from a conference like the Colonial or the Ivy receives a worse seeding than an average at-large team with the same RPI.

When we turn our attention to the mid-majors and majors, though, conference champions receive a substantial seeding bonus. Mid-major champions have better (numerically lower) seedings by about 1.4 ranks. For major champions the effect is a whopping 2.6 ranks. Given the usual denigration of the conference championships by basketball pundits, these are surprisingly large effects indeed.


Posted in NCAA Men's Basketball, Technical Notes

Technical Appendix: Using Player Salaries to Predict Wins

Posted on September 6, 2015 (updated March 5, 2018) by Peter Lemieux

To get some perspective on the relationship between team performance and team payrolls, I have gathered data for each season from 2011 through 2015 on player salaries from the widely cited database available at Spotrac. These data are broken out by positional categories and include a figure for “dead” money, contractual obligations to players who are no longer with the team.

Regression Results

I start with a model that includes player salaries, the sum of payments to pitchers, catchers, infielders, outfielders, and designated hitters, in millions of dollars (“PlayersMM”), and a residual “other” category, the difference between reported total payroll and player salaries, again in millions of dollars (“OtherMM”). As these results show clearly, player salaries are what matter when it comes to winning:

Pooled OLS, using 150 observations
 Included 30 cross-sectional units
 Time-series length = 5
 Dependent variable: Regular-Season Wins

             coefficient   std. error   t-ratio   p-value
 ---------------------------------------------------------
 const       69.2663       2.22782      31.09     1.62e-66 ***
 PlayersMM    0.130358     0.0227014     5.742    5.17e-08 ***
 OtherMM      0.00928354   0.0340035     0.2730   0.7852

Mean dependent var   81.02667   S.D. dependent var   11.10653
Sum squared resid    14929.62   S.E. of regression   10.07780
R-squared            0.187720   Adjusted R-squared   0.176668

The overall explanatory power of this model is pretty low.  Just under 19% of the variance in wins can be statistically accounted for using these salary data.  That’s just another way of stating what the graph in the main article shows, that there is a lot of “scatter” around the model’s predictions.  Teams with identical payrolls can win considerably more games than the model predicts, or considerably fewer.
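
For readers who want to reproduce this kind of pooled regression, here is a minimal sketch using statsmodels. The file name and column names (wins, PlayersMM, OtherMM, team) are assumptions for illustration, not the actual data set.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per team-season, 2011-2015, with regular-season
# wins and the two payroll components in millions of dollars.
payroll = pd.read_csv("team_payrolls.csv")

# Pooled OLS: all 150 team-seasons treated as a single sample.
base = smf.ols("wins ~ PlayersMM + OtherMM", data=payroll).fit()
print(base.summary())

# The team "dummy" variables used in the next model can be added with C(team),
# which expands the team label into one indicator per club.
with_teams = smf.ols("wins ~ PlayersMM + OtherMM + C(team)", data=payroll).fit()
print(with_teams.summary())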

Because the data contain multiple measurements on each of the thirty teams, we can exploit that feature and allow for specific “team effects.” These are the source of the second graph in the main article and come from estimating a model with “dummy” variables included for each of the teams. After whittling the results down to the teams with “significant” effects, we’re left with this model, which forms the basis for the chart comparing teams in the main article:

Pooled OLS, using 150 observations
Included 30 cross-sectional units
Time-series length = 5
Dependent variable: Regular-Season Wins

            coefficient   std. error   t-ratio   p-value
---------------------------------------------------------
const        75.3935      2.20255      34.23     2.64e-69 ***
PlayersMM     0.103430    0.0207827     4.977    1.89e-06 ***
OtherMM      −0.0360856   0.0302921    −1.191    0.2356
CHC         −10.1867      3.98745      −2.555    0.0117   **
COL         −13.5862      4.02725      −3.374    0.0010   ***
CWS          −8.46620     3.98705      −2.123    0.0355   **
HOU         −16.6031      4.11470      −4.035    9.00e-05 ***
MIA         −11.2013      4.08010      −2.745    0.0069   ***
MIN         −13.8250      4.01661      −3.442    0.0008   ***
PHI          −8.65976     4.05484      −2.136    0.0345   **
SEA          −7.50076     4.00319      −1.874    0.0631   *
STL          10.1054      3.98768       2.534    0.0124   **

Mean dependent var   81.02667   S.D. dependent var   11.10653
Sum squared resid    10435.62   S.E. of regression   8.695999
R-squared            0.432227   Adjusted R-squared   0.386969

After adjusting for the extreme cases, two things change in the estimated effects of spending. First, the coefficient for player salaries falls from 0.13 in the simple model without adjustments to 0.10 here. That suggests an average team needs to spend about $10 million to gain another victory, rather than the $8 million figure implied by the simple model. However, a more nuanced view of the spending effect takes account of the coefficient's “standard error” of 0.02. That can be used to construct a “confidence interval” around the estimated effect: the actual effect of spending is 95% certain to fall somewhere in the range between 0.06 and 0.14, the result of subtracting or adding twice the standard error. That means it costs somewhere between roughly $7 million (since 7 x 0.14 is about one win) and $16 million (since 16 x 0.06 is about one win) to gain another win.
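
The interval arithmetic in that paragraph is easy to verify directly. A small sketch, using the coefficient and standard error from the table above:

# Wins per extra $1 million spent on players, with its standard error.
coef, se = 0.10343, 0.02078
low, high = coef - 2 * se, coef + 2 * se
print(f"95% interval for wins per $MM: {low:.2f} to {high:.2f}")   # about 0.06 to 0.14

# Inverting each endpoint gives the cost of one additional win.
print(f"Cost per win: ${1/high:.0f}MM to ${1/low:.0f}MM")          # roughly $7MM to $16MM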

Second, the portion of teams’ payrolls not devoted to players on the field now has the “proper” negative sign, though it still falls considerably short of conventional significance levels. It’s possible that teams are penalized for making bad decisions that lead to large pools of “dead” money, but the evidence is still pretty weak, and the effect quite small.

Finally, we can ask whether spending on pitchers is more or less productive than spending on position players. The answer is that it doesn't matter: including separate terms for the two groups' salaries adds no predictive power to the model. Spending another ten million dollars on pitching has the same estimated effect as investing that money in the rest of the roster.
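
That comparison can be framed as a nested-model F-test: fit one model with separate pitcher and position-player salary terms and one with a single combined term, and ask whether the split improves the fit. A minimal sketch, with hypothetical column names (PitchersMM, HittersMM) standing in for the positional breakdown:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical columns: wins, PitchersMM, HittersMM, OtherMM.
payroll = pd.read_csv("team_payrolls.csv")

# Unrestricted model: separate coefficients for pitcher and position-player salaries.
full = smf.ols("wins ~ PitchersMM + HittersMM + OtherMM", data=payroll).fit()

# Restricted model: a single combined salary term (equal coefficients imposed).
payroll["PlayersMM"] = payroll["PitchersMM"] + payroll["HittersMM"]
restricted = smf.ols("wins ~ PlayersMM + OtherMM", data=payroll).fit()

# Does splitting the salary term add any explanatory power?
f_stat, p_value, df_diff = full.compare_f_test(restricted)
print(f_stat, p_value)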

Posted in Major League Baseball, Technical Notes

Technical Appendix: Estimating Seedings from RPI

Posted on February 26, 2015 (updated September 9, 2015) by Peter Lemieux

Method using program averages for teams with at least two appearances
[Table: seeding-model-estimates]

Ordinary Least Squares applied to 832 team appearances (64 teams x 13 years):

Model 6: OLS, Appearances, 2002-2014 (832 observations)
Dependent variable: seed

             coefficient   std. error   t-ratio   p-value 
  --------------------------------------------------------
  const        36.7650      1.86680      19.69    4.38e-71 ***
  midmaj       29.1555      2.97857       9.788   1.76e-21 ***
  power        29.6810      2.63332      11.27    1.65e-27 ***
  rpi         −42.1049      3.45923     −12.17    1.82e-31 ***
  rpimid      −53.8989      5.23069     −10.30    1.66e-23 ***
  rpipower    −57.5874      4.60477     −12.51    5.46e-33 ***

Mean dependent var   8.500000   S.D. dependent var   4.612545
Sum squared resid    2389.298   S.E. of regression   1.700768
R-squared            0.864859   Adjusted R-squared   0.864041
F(5, 826)            1057.224   P-value(F)           0.000000
Log-likelihood      −1619.405   Akaike criterion     3250.809
Schwarz criterion    3279.152   Hannan-Quinn         3261.677


OLS has no special way to handle “censored” outcomes like seedings; the coefficients above predict seeds below one and above sixteen. A better alternative is Tobit, with censoring at one and sixteen. This model (with some minor adjustments to the intercept differences) generates the graph in the article.

Tobit, 2002-2014 (n = 831)
Dependent variable: seed

             coefficient   std. error      z      p-value 
  --------------------------------------------------------
  const        49.1581      2.54088      19.35    2.16e-83 ***
  midmaj       18.8101      3.59081       5.238   1.62e-07 ***
  power        25.0724      3.32994       7.529   5.10e-14 ***
  rpi         −64.1098      4.64080     −13.81    2.09e-43 ***
  rpimid      −35.3690      6.31863      −5.598   2.17e-08 ***
  rpipower    −48.6491      5.83366      −8.339   7.47e-17 ***

Chi-square(5)        4401.971   p-value              0.000000
Log-likelihood      −1517.352   Akaike criterion     3048.703
Schwarz criterion    3081.762   Hannan-Quinn         3061.380

sigma = 1.79472 (0.0469158)
Left-censored observations: 52 (seed <= 1) 
Right-censored observations: 51 (seed >= 16)

Estimates from this model show a greater difference between the majors and mid-majors than do the OLS estimates.
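
For readers curious how a two-limit Tobit works mechanically, here is a minimal sketch of the censored log-likelihood such a model maximizes. It assumes a numeric design matrix X (constant, conference dummies, and the RPI interactions) and a vector of observed seeds y; the variable names and the optimizer call are illustrative, not the actual estimation code behind the table above.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

LOW, HIGH = 1.0, 16.0   # seeds are censored at the best and worst possible values

def neg_loglik(params, y, X):
    """Negative log-likelihood of a two-limit Tobit with normal errors."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                 # keeps sigma positive during optimization
    xb = X @ beta
    left = y <= LOW                           # censored at the top seed line
    right = y >= HIGH                         # censored at the bottom seed line
    mid = ~(left | right)
    ll = np.empty(len(y))
    ll[mid] = norm.logpdf(y[mid], loc=xb[mid], scale=sigma)
    ll[left] = norm.logcdf((LOW - xb[left]) / sigma)
    ll[right] = norm.logsf((HIGH - xb[right]) / sigma)   # log(1 - CDF)
    return -ll.sum()

# Starting from the OLS estimates usually works well:
# start = np.r_[ols_betas, np.log(ols_resid_sd)]
# result = minimize(neg_loglik, start, args=(y, X), method="BFGS")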


Posted in NCAA Men's Basketball, Technical Notes

Technical Appendix: Model Results for Entire 2014 Regular Season

Posted on December 31, 2014 by Peter Lemieux

[Table: spread-model-results-2014-regular]

Teams that gain a hundred yards more on average than their opponents win by an average margin of 6.6 points.  A team that averages one more sack than its opponents increases its margin of victory by 2.1 points.  The net value of a turnover is estimated to be just under five points.
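
Putting those three coefficients together gives a quick back-of-the-envelope predictor of a team's average margin. A minimal sketch; the per-game differentials plugged in at the end are hypothetical, and any intercept in the fitted model is ignored:

# Implied point values from the paragraph above.
POINTS_PER_YARD = 6.6 / 100     # 6.6 points per 100 yards of net advantage
POINTS_PER_SACK = 2.1
POINTS_PER_TURNOVER = 5.0       # "just under five" in the fitted model

def predicted_margin(yard_diff, sack_diff, turnover_diff):
    """Predicted average margin of victory from per-game differentials."""
    return (POINTS_PER_YARD * yard_diff
            + POINTS_PER_SACK * sack_diff
            + POINTS_PER_TURNOVER * turnover_diff)

# Example: a team with +50 yards, +0.5 sacks, and +0.3 turnovers per game.
print(predicted_margin(50, 0.5, 0.3))   # roughly 5.9 points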


Posted in Technical Notes

Technical Appendix: Value of Selected Football Events

Posted on December 27, 2014 by Peter Lemieux

Point Value of Selected Football Events

Dependent variable: Average Margin of Victory, 2014

[Table: spread-model-results]

This table presents the results of a series of regressions using the average margin of victory as the dependent variable and the per-game differences in yards, turnovers, and sacks as the predictors. I found no statistical difference between yards gained rushing and yards gained passing. The effect of a fumble may be greater than the effect of an interception, but we would need more data to pin down any difference. Thus the final model includes only the net yardage advantage and treats all turnovers identically.
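
The “no statistical difference between rushing and passing yards” claim corresponds to a test that the two yardage coefficients are equal. A minimal sketch of how such a test could be run with statsmodels; the file name and column names are assumptions for illustration:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical team-level data: per-game differentials and average victory margin.
games = pd.read_csv("team_differentials_2014.csv")

# Separate terms for rushing and passing yardage differentials.
fit = smf.ols("margin ~ rush_yds_diff + pass_yds_diff + turnover_diff + sack_diff",
              data=games).fit()

# Wald test of equal coefficients; a large p-value supports collapsing the two
# yardage terms into a single net-yardage advantage, as in the final model.
print(fit.t_test("rush_yds_diff = pass_yds_diff"))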

Posted in Technical Notes
