July 27, 2013
The college football subreddit, r/cfb, runs a user-ranked poll during every football season. As in the AP and Coaches Polls, each voter submits their picks for the top 25 teams as of that week. However, some users base their selections on the results of an algorithm, and I’m interested in seeing if I can come up with a model that will produce similar results.
In the BCS system, team rankings are based on the AP Poll as well as several computer rankings. The mathematics behind these computer rankings are all slightly different, but most involve some variant of [] and make fairly extensive use of linear algebra. I decided to avoid that route because
Having thought about it for a plane ride from Madison to Atlanta, I’ve come up with a concept for how to create my rankings.
I’m planning on creating a data structure where IDs are mapped to a list of teams. Another data structure would hold a list of objects representing all of the games of the season, including winner, loser, score, and other data. Each ordering of the teams would then get its own “score” (separate from the game scores), equal to the number of unexpected wins and unexpected losses under the current order, that is, games won by the lower-ranked team. Each game’s contribution to that score could be tweaked according to outside factors (blowout, home/away, conference, bye week, etc.).
From there, the algorithm would select a random team and (some number) of additional teams. The additional teams would be chosen from the teams that lie in the direction the chosen team needs to move and that also need to move towards the chosen team’s position. For example, if the chosen team was underrated, the other teams would be chosen from the overrated teams ranked above it. The algorithm would then try swapping the chosen team with each selected team and keep whichever arrangement gives the lowest score. At that point, a new team and a new set of additional teams would be chosen. This would be repeated many times until, hopefully, it produces a fairly good approximation of which teams are the best.
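To make the idea concrete, here is a minimal Python sketch of the search described above. Nothing here is final: the names `Game`, `ordering_score`, `biases`, and `improve` are placeholders I made up, the `weight` field stands in for the per-game tweaks, and "underrated" is assumed to mean that a team's unexpected wins outweigh its unexpected losses. For simplicity the ranking is just a list of team names, best first, rather than the ID mapping.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Game:
    winner: str
    loser: str
    weight: float = 1.0  # hypothetical tweak for blowouts, home/away, etc.


def ordering_score(ranking, games):
    """Total weight of unexpected results: winner ranked below the loser."""
    position = {team: i for i, team in enumerate(ranking)}
    return sum(g.weight for g in games if position[g.winner] > position[g.loser])


def biases(ranking, games):
    """Per team: positive if it looks underrated, negative if overrated."""
    position = {team: i for i, team in enumerate(ranking)}
    bias = {team: 0.0 for team in ranking}
    for g in games:
        if position[g.winner] > position[g.loser]:  # an unexpected result
            bias[g.winner] += g.weight
            bias[g.loser] -= g.weight
    return bias


def improve(ranking, games, rounds=1000, sample_size=3, seed=0):
    """Repeatedly pick a random team and keep its best score-lowering swap."""
    rng = random.Random(seed)
    ranking = list(ranking)
    best = ordering_score(ranking, games)
    for _ in range(rounds):
        bias = biases(ranking, games)
        i = rng.randrange(len(ranking))
        b = bias[ranking[i]]
        if b > 0:    # underrated: look at overrated teams ranked above it
            pool = [j for j in range(i) if bias[ranking[j]] < 0]
        elif b < 0:  # overrated: look at underrated teams ranked below it
            pool = [j for j in range(i + 1, len(ranking)) if bias[ranking[j]] > 0]
        else:
            continue
        best_j = None
        for j in rng.sample(pool, min(sample_size, len(pool))):
            ranking[i], ranking[j] = ranking[j], ranking[i]  # try the swap
            score = ordering_score(ranking, games)
            ranking[i], ranking[j] = ranking[j], ranking[i]  # undo it
            if score < best:
                best, best_j = score, j
        if best_j is not None:  # keep only the swap with the lowest score
            ranking[i], ranking[best_j] = ranking[best_j], ranking[i]
    return ranking, best


games = [Game("A", "B"), Game("B", "C"), Game("C", "D"), Game("A", "C", weight=1.5)]
ranking, score = improve(["D", "C", "B", "A"], games, rounds=200)
print(ranking, score)
```

Because a swap is only kept when it lowers the score, this is a plain local search and can get stuck in an ordering that no single swap improves. That is part of why I only expect a fairly good approximation rather than the best possible order.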
This approach won’t be useful at all for at least the first few weeks of the season. However, I think that’s a common problem with algorithmic rankings, so I’d probably base my votes for those weeks on instinct and observation.