-
7 votes
-
Predicting the NBA MVP with Machine Learning
Thesis
Every season, basketball fans debate who deserves the MVP award, which is decided at the end of each season by a panel of voters. We built three machine learning models that attempt to answer that question using box score statistics.
Methodology
Each model is trained on every NBA season from 1974 to 2017. For each player season, it looks at nine statistics:
- Points, assists, blocks, defensive rebounds, and field goals per game: the core production numbers
- Win Shares (WS): an estimate of how many wins a player contributed to their team
- Value Over Replacement Player (VORP): how much better a player is than a replacement-level player
- Box Plus/Minus (BPM): a player's estimated net impact per 100 possessions, relative to a league-average player
- Usage Rate (USG%): what share of team plays run through that player
From those nine numbers, each model learns what a typical MVP season looks like versus a non-MVP season, then applies that knowledge to current players. Each model outputs an independent probability that a given player wins MVP, not a share of a single pool, so the values do not sum to 1. Think of it as each player's individual odds.
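To make the "independent probability" point concrete, here is a small sketch. The player labels and probability values are made up for illustration, and the `to_shares` helper is hypothetical: nothing in this post says the models normalize their outputs this way.

```python
# Hypothetical per-player outputs from one model (not real predictions).
independent_probs = {"Player A": 0.62, "Player B": 0.55, "Player C": 0.08}


def to_shares(probs):
    """Rescale independent probabilities so they sum to 1 (a single pool)."""
    total = sum(probs.values())
    return {player: p / total for player, p in probs.items()}


def ranking(probs):
    """Return players ordered from most to least likely."""
    return sorted(probs, key=probs.get, reverse=True)


shares = to_shares(independent_probs)

# The raw values can add up to more (or less) than 1 ...
print(sum(independent_probs.values()))
# ... while the rescaled shares always add up to 1.
print(sum(shares.values()))
# Rescaling does not change the order of the candidates.
print(ranking(independent_probs) == ranking(shares))
```

Either view gives the same ranking; the independent form just avoids implying that one player's odds must fall when another's rise.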
Three Models, One Question
Rather than relying on a single approach, the system runs three different models and lets you compare:
Logistic Regression
The simplest of the three. Each statistic gets a weight, and a player's score is the weighted sum of their stats, which the model then converts into a probability. In effect it draws a straight-line boundary through the data. It's easy to interpret: a larger absolute coefficient means that stat matters more.
Win Shares (WS) is by far the most influential feature, with an absolute coefficient of ~1.85, nearly double the next most important feature. Box Plus/Minus (BPM) ranks second at ~1.0, followed by Defensive Rebounds per Game (DRBPG, ~0.85) and Assists per Game (ASTPG, ~0.70). VORP and Field Goals per Game (FGPG) contribute moderately at ~0.50. Blocks per Game (BLKPG), Points per Game (PTSPG), and Usage Rate (USG%) have minimal weight, all under 0.15.
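As a rough sketch of how a weighted sum becomes a probability, the snippet below uses the four largest coefficient magnitudes quoted above. Several things here are assumptions rather than details from the trained model: the signs are taken to be positive, the intercept is invented, the inputs are assumed to be standardized (z-scores), and the remaining five features are left out.

```python
import math

# Magnitudes from the paragraph above; positive signs are an assumption.
WEIGHTS = {"WS": 1.85, "BPM": 1.0, "DRBPG": 0.85, "ASTPG": 0.70}
INTERCEPT = -6.0  # hypothetical value, chosen so an average player scores low


def mvp_probability(z_scores, weights=WEIGHTS, intercept=INTERCEPT):
    """Weighted sum of standardized stats passed through the logistic function."""
    score = intercept + sum(weights[name] * z_scores[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical players, expressed as standard deviations above the league mean.
standout = {"WS": 3.0, "BPM": 3.0, "DRBPG": 1.0, "ASTPG": 1.0}
average = {"WS": 0.0, "BPM": 0.0, "DRBPG": 0.0, "ASTPG": 0.0}

print(round(mvp_probability(standout), 3))
print(round(mvp_probability(average), 3))
```

Because WS carries the largest weight, moving a player's WS by one standard deviation shifts the score more than the same move in any other stat.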
Random Forest
Builds hundreds of decision trees, each one asking a series of "is this stat above or below X?" questions, and then averages their answers. It handles complex relationships between stats well and is less sensitive to any one unusual data point. Think of it as a large committee of simple rules voting together.
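The "committee" idea can be sketched in a few lines. This shows only the voting step: the rules and thresholds below are hand-written and hypothetical, whereas a real random forest learns deeper trees from random subsets of the training data.

```python
# Hypothetical one-question "trees": vote yes if the stat exceeds the threshold.
RULES = [
    ("WS", 12.0),
    ("BPM", 8.0),
    ("VORP", 6.0),
    ("PTSPG", 27.0),
    ("WS", 14.0),
]


def committee_vote(player, rules=RULES):
    """Fraction of rules that vote 'MVP-like' for this player."""
    votes = [1 if player[stat] > threshold else 0 for stat, threshold in rules]
    return sum(votes) / len(votes)


# Made-up stat lines for illustration.
candidate = {"WS": 15.0, "BPM": 9.5, "VORP": 7.0, "PTSPG": 25.0}
role_player = {"WS": 4.0, "BPM": 0.5, "VORP": 1.0, "PTSPG": 11.0}

print(committee_vote(candidate))    # 4 of 5 rules vote yes -> 0.8
print(committee_vote(role_player))  # no rule votes yes -> 0.0
```

A single odd rule (here, the scoring threshold) gets outvoted, which is the intuition behind the model being less sensitive to any one unusual data point.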
WS again dominates at ~0.31, accounting for roughly twice the importance of the next feature. VORP (~0.15) and BPM (~0.125) rank second and third. DRBPG (~0.10), PTSPG (~0.08), BLKPG (~0.07), FGPG (~0.065), and ASTPG (~0.06) contribute in a fairly tight mid-range band. USG% is the least important at ~0.05. Compared to logistic regression, the Random Forest spreads importance more evenly across features.
Gradient Boosting
Also uses decision trees, but builds them sequentially: each new tree focuses on correcting the mistakes the previous ones made.
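Here is a toy version of that sequential idea, under simplifying assumptions: one input feature, one-split trees ("stumps"), and squared-error loss instead of the classification loss a real MVP model would use. The data and the function names are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split on xs that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Start from the mean, then let each new stump correct the current errors."""
    predictions = [sum(ys) / len(ys)] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return predictions


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical data: a single stat versus a 0/1 "won MVP" label.
xs = [2, 4, 6, 8, 10, 12, 14, 16]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

for rounds in (0, 1, 20):
    print(rounds, mse(ys, boost(xs, ys, rounds)))
```

The training error shrinks as rounds are added, because every stump is fit to whatever the previous ones still get wrong.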
This model is heavily concentrated on just two features: BPM (~0.47) and WS (~0.41) together account for roughly 88% of total feature importance. Four others (PTSPG, VORP, ASTPG, and DRBPG) contribute ~0.02–0.03 each, and BLKPG, USG%, and FGPG are effectively unused (near zero). This suggests the gradient boosting model learned that BPM and WS alone are nearly sufficient to separate MVP candidates.
Historical Results
The models were trained on data through 2017, so every season from 2018 onward is a genuine out-of-sample test: the models have never seen these players or seasons before.

| Season | Actual MVP | LR | RF | GB |
|---|---|---|---|---|
| 2018 | James Harden | #2 | #2 | #1 ✓ |
| 2019 | Giannis Antetokounmpo | #1 ✓ | #1 ✓ | #1 ✓ |
| 2020 | Giannis Antetokounmpo | #1 ✓ | #1 ✓ | #1 ✓ |
| 2021 | Nikola Jokić | #1 ✓ | #1 ✓ | #1 ✓ |
| 2022 | Nikola Jokić | #1 ✓ | #1 ✓ | #1 ✓ |
| 2023 | Joel Embiid | #2 | #4 | #2 |
| 2024 | Nikola Jokić | #1 ✓ | #1 ✓ | #1 ✓ |
| 2025 | Shai Gilgeous-Alexander | #3 | #2 | #569 |

Top-1 accuracy: LR 5/8 · RF 5/8 · GB 6/8
Top-3 accuracy: LR 8/8 · RF 7/8 · GB 7/8
In five of the eight seasons (2019–2022 and 2024), all three models agreed on the same #1 pick, and they were right every time.
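The accuracy figures can be recomputed from the per-season ranks each model gave the actual MVP. The ranks below are copied from the results above; the helper names are just for illustration.

```python
# Rank each model assigned to the eventual MVP, per season.
RANKS = {
    2018: {"LR": 2, "RF": 2, "GB": 1},
    2019: {"LR": 1, "RF": 1, "GB": 1},
    2020: {"LR": 1, "RF": 1, "GB": 1},
    2021: {"LR": 1, "RF": 1, "GB": 1},
    2022: {"LR": 1, "RF": 1, "GB": 1},
    2023: {"LR": 2, "RF": 4, "GB": 2},
    2024: {"LR": 1, "RF": 1, "GB": 1},
    2025: {"LR": 3, "RF": 2, "GB": 569},
}


def top_k_hits(ranks, model, k):
    """Number of seasons where the model ranked the actual MVP in its top k."""
    return sum(1 for season in ranks.values() if season[model] <= k)


def unanimous_seasons(ranks):
    """Seasons where every model ranked the actual MVP #1."""
    return [year for year, season in ranks.items() if set(season.values()) == {1}]


for model in ("LR", "RF", "GB"):
    top1 = top_k_hits(RANKS, model, 1)
    top3 = top_k_hits(RANKS, model, 3)
    print(f"{model}: top-1 {top1}/{len(RANKS)}, top-3 {top3}/{len(RANKS)}")

print(unanimous_seasons(RANKS))
```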
In 2023, every model ranked Nikola Jokić #1, and by the numbers he arguably had the better season. Joel Embiid won the award anyway, the kind of outcome that may reflect voter narrative, voter fatigue, and team performance rather than pure statistics. In 2025, Gradient Boosting ranked Shai Gilgeous-Alexander #569, while Logistic Regression and Random Forest had him at #3 and #2 respectively. I have no idea why GB did this; it is likely a bug.
Future Direction
No model is perfect, and these have known blind spots. Team record is not included, even though MVP voters have historically punished players on losing teams regardless of individual stats. Injuries and narrative don't appear in a box score. And the training data skews toward an older era: the three-point revolution and the rise of players like SGA have introduced statistical profiles the 1970s–1990s data doesn't fully capture.
Current Season Predictions (2025–26)

| Rank | LR | RF | GB |
|---|---|---|---|
| #1 | Nikola Jokić | Shai Gilgeous-Alexander | Nikola Jokić |
| #2 | Shai Gilgeous-Alexander | Nikola Jokić | Victor Wembanyama |
| #3 | Victor Wembanyama | Victor Wembanyama | Giannis Antetokounmpo |
| #4 | Luka Dončić | Giannis Antetokounmpo | Kawhi Leonard |
| #5 | Jalen Johnson | Luka Dončić | Luka Dončić |

Two of the three models have Nikola Jokić as the frontrunner. Random Forest is the dissenter, putting Shai Gilgeous-Alexander ahead. Victor Wembanyama appears in all three top 3s in just his third season, which is notable. Before running the models, I expected him to be #1 for all of them, considering the way the models use advanced stats.
Conclusion
Thank you for reading. I hope you found this interesting. Basketball Reference also has its own model if you would like to see a different result. Please do not gamble on my models!
13 votes -
New search engine reveals if ancestors were in Nazi party
23 votes -
The space race (back) to the Moon: Artemis, moon bases a competition beyond orbit
7 votes -
Students develop faux but sexy robotic sage grouse to strut their stuff in an effort to move a Grand Teton National Park breeding-ground lek away from jets
25 votes -
Tesla's supervised self-driving software gets Dutch okay, first in Europe
14 votes -
What might be going on with this indie game "fansite"?
I recently came across an interesting-looking indie game, Idols of Ash. Basically, you have to use a simple grapple-and-swing mechanic to descend through an eldritch underground complex while being pursued by a dangerous "murderpede" monster.
I first played it on what I thought was the official site, idolsofash.fun. It's a pretty spiffy design, with a playable web version, extensive FAQs, strategy guides, and embedded images and video of the game. But I ran into some bugs while playing -- no sound effects, weird lighting. When I mentioned these flaws on the developer's Itch.io page, they responded that they had nothing to do with the site.
Turns out it has a disclaimer at the very bottom: "Unofficial fan site. Not affiliated with or endorsed by Leafy Games." Buying and installing the actual version solved my tech issues. And in playing the game more, I noticed that the various guides on the site were subtly wrong in a lot of ways. The About page claims it's maintained by a big fan of the game, but in hindsight the whole thing seems AI-written and full of hallucinations.
Thing is, I don't get the angle here. There's no advertising on the site. It prominently links directly to the game's official Steam and Itch pages, so they're not trying to deliver malware or intercept the developer's sales. I assume the glitches are from a poor decompilation and rehosting of the original Godot engine game, but there's nothing to be gained from that. The presence of images and video suggests some level of human involvement in the site design, meaning it's not some cheap fire-and-forget thing. The URL and content are far too specific to flip into something else after gaining SEO rank. It presents (and acts) exactly like a non-commercial labor-of-love fansite (albeit one that shares the paid game for free in a broken state).
Could this be a genuine, if misguided, attempt by an actual fan to share the game using AI tools? Or is there some kind of scam I'm not seeing? Is this sort of fake AI fansite with embedded versions of the game a widespread problem with indie titles now?
23 votes -
Finishing the toy that Nintendo abandoned -- breathing new life into Mario Kart Live: Home Circuit
34 votes -
Anthropic announces deal with Google, Broadcom, says revenue has tripled
31 votes -
Bitcoin’s creator has hidden behind the pseudonym Satoshi Nakamoto for seventeen years. But a trail of clues buried deep in crypto lore led to a 55-year-old computer scientist named Adam Back.
27 votes -
As an antidote to AI and online translation tools, a Cornell German professor gives her students a typewriter-only assignment once a semester
20 votes -
Nation's largest urban battery is being built in Daly City, California
16 votes -
Proton Meet isn't what they told you it was
25 votes -
An ED resident and developer built a free, medically accurate clinical casebook for every patient of The Pitt
20 votes -
US imports more from Taiwan than China for first time in decades
20 votes -
Denuvo DRM has been circumvented using hypervisor-based bypass
51 votes -
Enjoying reading in the age of LLMs
I used to really value the art of essay writing. There seemed to be such a richness in the different ways people would construct arguments, structure those arguments, then deliver those arguments stylistically, not just from the perspective of being persuaded as a reader but also from the perspective of seeing how a given writer thinks, relates to the living tradition of language, and understands the world conceptually. But it's basically lost most of its meaning to me in this age of LLMs. The reality is, LLMs are capable of writing texts that, if you gave them to a seasoned reader 5 years ago, they'd say were well written and indicative of a truly thoughtful mind. Even if there currently exist certain tells with LLMs, those styles certainly existed in different ways in real human writing beforehand. Now, that perfectly reasonable set of styles is verboten and we have to dedicate half our deep focus to figuring out whether, or to what extent, an essay or article was written by AI. It's difficult to enjoy, let alone care about, essay writing and the writers behind it now.
I can still find value in books, though, because they were written in the past and I don't mind never reading any non-scientific book published after 2022 if it comes down to it.
23 votes -
Why Swedish schools are bringing back books
15 votes -
YouTube gets its own FAST channels
6 votes -
Quantum computing bombshells that are not April Fools
18 votes -
Gyre
15 votes -
Inside the ‘self-driving’ lab revolution
10 votes -
AI software for smart glasses wins £1m prize for technology to help people with dementia
10 votes -
Landslide: a ghost story
8 votes -
Ageless Linux emerges to protest OS-level age verification laws
45 votes -
Norway and Iceland have signed agreements to participate in the European Union's GOVSATCOM and IRIS2 secure communications programmes
12 votes -
Meta and YouTube found liable in landmark social media addiction trial
53 votes -
US regulator bans imports of new foreign-made routers, citing security concerns
58 votes -
Opinions wanted on regular DEXA scans
I’ve gone a bit too deep down a rabbit hole after an offhand comment about protein intake and how much protein I should actually be consuming. It turns out that the 1.6g/kg of body weight figure is fairly arbitrary, and body weight itself is not a particularly good basis for an estimate if you are overweight. With that in mind I have been wondering about getting a DEXA body composition scan. It would be useful, I think, because it can also tell me about visceral fat, which is an area I am particularly concerned about.
It turns out that it’s pretty cheap to get done; about $45 if you sign up for quarterly scans with a company called BodySpec. Their whole thing is making things cheaper by having repeat visits; a quantity discount, if you will.
Before I decide to do this (and while I wait to hear back about if I can get one done for free with my health plan), I just wanted to get people’s opinions on them. Have you had one or a series done? And more importantly, how has it empowered you to improve your health?
In all honesty I’m not sure the results will encourage me to make any particular change in my lifestyle or routine that I wouldn’t have been able to figure out without it.
7 votes -
Fairphone released the industry’s first ever nature report - The impact of consumer electronics on nature and biodiversity
24 votes -
Michael Hafftka releases all of his ~3800 paintings as Creative Commons, explicitly for use in training AI
23 votes -
Quentin Tarantino and Sylvester Stallone are teaming for a 1930s-set series filming in black and white with “1930s cameras”
15 votes -
BYD claims five-minute electric vehicle charging with new battery tech
48 votes -
What do you think about putting your driver's license in your digital wallet?
I forgot my driver's license today but had my phone with me. I remembered seeing stories that Google and Apple both allow these (for some states) in the digital wallet.
Before doing this, I thought I would ask people here to weigh in on whether it is a good idea. Is it considered secure? Is it going to cause me more privacy issues than a physical card in my wallet?
This is also related to recent discussions about online age verification.
This is a related Tildes post from last year: Google Wallet adds age verification and more government ID support
20 votes -
Subnautica 2 publisher Krafton's CEO asked ChatGPT how to void $250 million contract, ignores lawyers, loses in court
72 votes -
Going to Europe this summer? Prepare for a long queue.
17 votes -
New technology promises to protect farmers from the next fertilizer shock
7 votes -
A writing professor’s new task in the age of AI: Teaching students when to struggle
20 votes -
RE//verse 2026: Hacking the Xbox One
14 votes -
AI was eroding trust in my classroom — so I got rid of typed papers and bought my students notebooks instead
37 votes -
Rescue dog Rosie’s cancer shrinks after world-first mRNA vaccine
32 votes -
NVIDIA forks Godot to add path tracing
20 votes -
Norwegian influencer buys failed property development in Spain to build ‘self-sufficient’ eco-community – Modern Eco Village plans to erect 500 homes, schools and shops
23 votes -
Last chance to watch: Where to stream every 2026 Oscar nominee before Sunday's big night
5 votes -
English language music is losing its stranglehold on the charts – sixteen different languages appeared in Spotify's Global Top 50 last year, more than double the figure from 2020
25 votes -
The secretive company filling video game sites with gambling and AI
37 votes -
Channel Surfer - Watch YouTube like it's cable tv
9 votes -
The first multi-behavior brain upload
35 votes -
US government announces pilot program for eVTOLs and ultralight aerial vehicles even without FAA certification
14 votes -
Why do AI company logos look like buttholes?
21 votes