My previous article dug into spread-based models and how long-term profitability is guaranteed if a model predicts spreads correctly more than ~53% of the time. The other common bet, the “moneyline” (simply picking the winner of the game), requires a binary classification model: the predicted outcome can only fall into two categories, in this case the home team either winning or losing. Whereas a spread-based model uses some sort of regression technique, figuring out whether a moneyline model is profitable takes a few more steps. Here are the basic ones:
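As a minimal sketch of what that binary target can look like, assuming a hypothetical table of past games with home and away scores (the column names are made up for illustration):

```python
import pandas as pd

# Hypothetical table of past games -- column names are made up for illustration
games = pd.DataFrame({
    "home_score": [27, 17, 31, 20],
    "away_score": [24, 23, 10, 20],
})

# Binary target for a moneyline model: 1 if the home team won, 0 otherwise
# (the rare NFL tie simply counts as "home team did not win" here)
games["home_win"] = (games["home_score"] > games["away_score"]).astype(int)
print(games)
```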
1. First, the accuracy, AUC score, and other performance metrics of the classification model should be relatively high. A good AUC score is anything above 0.7; for accuracy, a good baseline to compare against is 66%, the accuracy of sportsbooks' NFL moneyline picks since 2002. Sidenote: this makes the machine learning techniques more intricate, requiring a different setup than a conventional model built with sklearn or tensorflow, since the expected accuracy of a machine learning model in most fields is 90%+, whereas our definition of success is likely anything over 66%.
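Here is a rough sketch of that evaluation step in Python. The features and labels are synthetic stand-ins (real pre-game features would come from the actual dataset), and logistic regression is just a placeholder for whatever classifier is actually used:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pre-game features (X) and home-win labels (y)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Placeholder classifier; swap in whatever model is actually being evaluated
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Accuracy: {acc:.3f} (sportsbook baseline ~0.66)")
print(f"AUC:      {auc:.3f} (target: above 0.7)")
```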
2. That is not the whole story, though, because games come with a wide distribution of odds on the winner, ranging from a heavy favorite at -1400, to a toss-up where the winner is +100, to a heavy underdog at +1400. So even if a model has >70% accuracy, if its correct picks are all heavy favorites and it loses all of the 50/50 games, it most likely still won't be profitable. Likewise, a model with only 50% accuracy could be profitable if the opposite is true (it gets a lot of the underdog wins right while missing the 50/50s). This is where the Kelly criterion comes in: a formula widely used for position sizing in finance and the stock market, but one that can also be applied to sports betting by comparing a model's probabilities to the odds offered by the bookies. The formula is given below:
K = (P × B − Q) / B
Here, the output K is the fraction of the total bankroll that the bet size should be (if the bet is not profitable, then K < 0, which means the bet should not be placed). For example, if I set aside $1,000 in a sportsbook account and K = 0.30 for a certain bet, the formula tells us to place a $300 bet (1,000 × 0.30). In practice, the output is generally scaled down to make smaller bets.
P = probability of the team winning, derived from the model
Q = 1 − P = probability of the team losing
B = net odds offered by the sportsbook (the profit per $1 wagered; e.g., +100 American odds corresponds to B = 1.0)
This formula will be fully derived in another article for those interested; for now, let's focus on how it is applied.
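As an illustration of how the formula might be coded up, here is a sketch in Python. It assumes the sportsbook's line comes in American odds, so it first converts the line to net odds B (profit per $1 wagered); the function names are mine, not from any particular library:

```python
def american_to_net_odds(american: int) -> float:
    """Convert American moneyline odds to net odds B (profit per $1 wagered)."""
    if american > 0:                # underdog, e.g. +140 -> 1.40
        return american / 100
    return 100 / abs(american)      # favorite, e.g. -1400 -> ~0.071


def kelly_fraction(p: float, american_odds: int) -> float:
    """Kelly criterion: K = (P * B - Q) / B, with Q = 1 - P.

    Returns the fraction of the bankroll to bet; a value <= 0 means no bet.
    """
    b = american_to_net_odds(american_odds)
    q = 1 - p
    return (p * b - q) / b


# Example: the model gives the home team a 55% chance and the book offers +110
k = kelly_fraction(0.55, +110)
bankroll = 1000
print(f"Kelly fraction: {k:.3f} -> bet ${bankroll * k:.2f} of a ${bankroll} bankroll")
# In practice a scaled-down (fractional) Kelly bet, e.g. k / 4, is often used
```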
3. Now that the Kelly criterion gives a specific dollar amount for each bet, the model can return an ROI. Simply use the model to predict probabilities for a set of data it has not been trained on but that still has truth labels (i.e., the result of each game is known). Then apply the Kelly criterion against the sportsbook's odds for those games. This allows a “simulated bet,” where the outcome of the bet is known from whether the model's pick was correct and the moneyline odds for that game. Each of these bets is simulated across the entire dataset, and the ROI is simply:
ROI = 100 × (Mf − Mi) / Mi,
where Mf is the final bankroll after all simulated bets have been placed and Mi is the starting bankroll.
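A bare-bones version of that simulation loop could look like the sketch below. It reuses the kelly_fraction and american_to_net_odds helpers from the earlier sketch, and the hold-out games (model probabilities, American odds, outcomes) are hypothetical placeholders:

```python
def simulate_roi(games, bankroll=1000.0, kelly_scale=0.25):
    """Run simulated moneyline bets over a hold-out set and return the ROI (%).

    `games` is a list of dicts with the model's home-win probability, the
    sportsbook's American odds on the home team, and the actual outcome.
    """
    start = bankroll
    for g in games:
        k = kelly_fraction(g["p_home"], g["odds_home"])
        if k <= 0:
            continue                        # Kelly says there is no edge, skip the bet
        stake = bankroll * k * kelly_scale  # scaled-down Kelly bet size
        if g["home_won"]:
            bankroll += stake * american_to_net_odds(g["odds_home"])
        else:
            bankroll -= stake
    return 100 * (bankroll - start) / start  # the ROI formula above


# Tiny hypothetical hold-out set
holdout = [
    {"p_home": 0.62, "odds_home": +120, "home_won": True},
    {"p_home": 0.48, "odds_home": +150, "home_won": False},
    {"p_home": 0.80, "odds_home": -300, "home_won": True},
]
print(f"Simulated ROI: {simulate_roi(holdout):.1f}%")
```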
For reference, the models I am currently running on data going back to 2014 are showing ROI values in the region of 70%, so stay tuned to see the results!