College Football Conference Realignment — Regression | by Giovanni Malloy | Aug, 2023


Welcome to part 2 of my series on conference realignment! Last summer, when conference realignment was in full swing, Tony Altimore published a study on Twitter that inspired me to do my own conference realignment analysis. This series is organized into four parts (the full motivation is in part 1):

  1. College Football Conference Realignment — Exploratory Data Analysis in Python
  2. College Football Conference Realignment — Regression
  3. College Football Conference Realignment — Clustering
  4. College Football Conference Realignment — node2vec

Hopefully, each part of the series gives you a fresh perspective on the future of the beloved game of college football. For those of you who did not read part 1, here is a quick synopsis: I compiled my own data set from sources across the web. The data include basic information about each FBS program, a non-canonical approximation of all college football rivalries, stadium size, historical performance, the frequency of appearances in the AP Top 25 poll, whether the school is an AAU or R1 institution (historically important for membership in the Big Ten and Pac-12), the number of NFL draft picks, program revenue from 2017 to 2019, and a recent estimate of the size of each program's fan base. As it turns out, stadium capacity, 2019 revenue, and historical AP poll success correlate strongly with the estimated fan base size from Tony Altimore’s analysis.

The diagonal of the correlation matrix is 1 by construction, since every feature is perfectly correlated with itself. Off the diagonal, we see high correlation among stadium capacity, fan base size, 2019 revenue, and the percentage of weeks a program appeared in the AP Top 25 between 2001 and 2021.

Supervised Learning

So, this got me thinking: can we create a simple regression model to estimate fan base size?

Broadly, we can divide machine learning into supervised and unsupervised learning. In supervised learning, the goal is to predict a predefined target from labeled examples: either a discrete class (classification) or a continuous variable (regression). In unsupervised learning, there is no target; the goal is to discover non-obvious structure in the data. Regression, then, is the type of supervised learning where the prediction target is continuous, like fan base size. Shervine and Afshine Amidi put together a great reference guide and resource on these topics. (It has been translated into…
