Quoting Toby Shelvane, founder of ManticAI: "Some say LLMs just regurgitate their training data, but you can’t predict the future like that. [...] It requires genuine reasoning."
Just... no. Literally the idea of these agents is to spit out the most likely next token given an input set, and predicting future events is somewhat an effort in extrapolation. If anything, this just lowers my confidence in the human ability to predict future events.
I'm not exactly impressed by companies that are spun up to middleman AI: they're not making anything novel, they're just employing multiple existing backends to try to fudge a competition.
The guy behind the company seems to be for real, so I think this is one of the comparatively rare occasions where they’re more than just a quick cash-in wrapped around a ChatGPT API account.
I'm not certain I understand Google Scholar's attribution requirements, but the only articles in that list for which he is actually listed as a contributor are the ones on the dangers of AI and, before that, the privacy implications of track-and-trace solutions? The two actually technical articles that might require some underlying understanding of LLMs to write have very extensive author lists, but he's not in either. Am I missing something?
Google Scholar shows a truncated list of authors; his name is present in the full list if you expand it on the original publication. You have actually stumbled across something that's a whole thing here, and needs a bit of context...
There's some debate about the trend for tech industry scholarly publications to include 1,000+ authors, which a lot of the headline papers do now, but being part of that thousand is no small achievement. On the one hand, yeah, it does kind of obfuscate the significance of any individual contribution, and it's definitely influenced in part by internal politics (who in the field wouldn't want their name attached to the main publication for a major model, after all?). On the other hand, you don't get a piece of research at that scale done without a large team all doing meaningful work towards it, and unlike in academia there's far less likelihood that each subgroup within the team will be working on their own respective publications, so the single big paper with everyone on it is often the only way to really credit everyone who genuinely made the research happen.
tl;dr he's an author on the big papers, but so are >1,000 other people. Most if not all of them probably did make real contributions, but it's hard to untangle how much is down to any one of them.
Ah, I see, thanks for the dive into a world I'm unfamiliar with :)
Being an author on Gemini doesn’t make you “for real”; it makes you someone with an insane financial motivation for the general public to believe:
Some say LLMs just regurgitate their training data, but you can’t predict the future like that. [...] It requires genuine reasoning.
Further, the first quote of the article:
Ben Shindel, one of the professional forecasters who found himself behind AI during the contest before finishing above Mantic. “We’ve really come a long way here compared with a year ago when the best bot was at something like rank 300.”
Is wrong based on the first damn sentence on the contest’s webpage:
Congratulations to the winners of our first ever Metaculus Cup!
Maybe he meant something else, no clue, but do we really need yet another free marketing puff piece for some AI startup from a journalist who can’t be bothered to read the web page for the contest they’re reporting on so that they know to ask a freaking clarifying question in paragraph two?
https://www.metaculus.com/notebooks/39990/winners-of-the-summer-2025-metaculus-cup/
They have been doing a quarterly cup competition since 2023. This latest one is just a slightly different format (summer rather than Q1/Q2/Q3/Q4), so it’s the “first” of the new format. There’s absolutely nothing incorrect about any of the statements.
https://www.metaculus.com/notebooks/17700/quarterly-cup-tournament-q1-2025/
I've worked alongside people from DeepMind on a couple of occasions, and I can tell you with absolute certainty that they employ some of the most ridiculously intelligent and capable scientists I've ever met. They've got a goddamn Nobel Prize under their belt! So you can call this personal bias if you like, or you can call it direct experience, but I actually put a lot of weight on someone being an author on one of their papers.
Like I said above, it's not a 100% guarantee of meaningful contribution when there are that many people listed and that many complex internal reasons governing who does and doesn't get a mention, and I'm sure not everyone at any large organisation is necessarily part of their A-team even on a good day. But in the absence of any other information, it's enough for me to think they're probably actually working on some legitimate tech rather than just repackaging someone else's hosted LLM.
But yeah, I get it. It is a marketing piece. Given how little I could see about the company itself (that's actually why I ended up looking for the guy's personal CV, because I wanted to know if they were legit and couldn't see anything but copies of this same story when I tried to dig into the company) it's maybe fair to call it a puff piece. And we're being absolutely flooded by buzzword-laden crap under the banner of "AI", so I get the skepticism and the irritation. All of that is why I think it's even more important to sift out the work that could potentially make an actual positive contribution from the crap that almost definitely won't.
This is like the AI effect in action.
Please can you elaborate on how the AI effect relates to the original article, or my comment, depending on which you meant? I'm not certain I'm catching your meaning.
I think the linked article puts it clearly enough:
The author Pamela McCorduck writes: "It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'."
Researcher Rodney Brooks complains: "Every time we figure out a piece of it, it stops being magical; we say, 'Oh, that's just a computation.'"
Just because an LLM-associated application is doing it doesn’t mean the task is now somehow lesser or doesn’t involve “reasoning”.
Literally the idea of these agents is to spit out the most likely next token given an input set, and predicting future events is somewhat an effort in extrapolation. If anything, this just lowers my confidence in the human ability to predict future events.
Then either this isn't the whole story, or the human ability to "reason" is worse at these predictions than "literally the idea to spit out the most likely next token given an input set" and as such isn't as valuable (at these predictions).
The real take is that finding patterns in data has a LOT of very solid applications in a variety of fields, and machine learning is seemingly an incredibly useful tool that excels at finding those patterns. Or at the very least exceeds human ability to do so.
One of the most straightforward examples of a "dumb" algorithm outperforming human expert predictions would be index funds in the stock market. So you don't need some spiritual ability to "reason" or be "intelligent" to get good results from accumulating data (one could argue that by accumulating data from many humans we are already accumulating their reasoning capabilities, and hopefully this way can exceed any individual's).
Also this quote from the article:
Warren Hatch, the chief executive of Good Judgment, a forecasting company co-founded by Tetlock, said: “We expect AI will excel in certain categories of questions, like monthly inflation rates. For categories with sparse data that require more judgment, humans retain the edge. The main point for us is that the answer isn’t human or AI, but instead human and AI to get the best forecast possible as quickly as possible.”
If humans actually could predict the future we'd have a lot more problems.
nitpick: with standard sampling and post-training techniques this isn't true
It must be, given there is only the model, the sequence so far and white noise in the working set and the output is the next token or two.
I think there is some confusion out there stemming from (1) the true fact that an LLM produces a vector that a sampler can interpret as a probability distribution, and (2) the mistaken idea that LLMs are effectively massive full-context Markov chains that provide a probability distribution that matches the training corpus.
the model outputs a probability distribution over tokens, and (when the temp isn't 0) the sampler will often pick a token which isn't the most likely one, to prevent getting stuck in loops / enable "creativity" (a very loose analogy, don't take it too literally). Even when the temp is 0, models often aren't deterministic, for complicated reasons.
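To make that concrete, here's a minimal sketch of temperature sampling, assuming plain numpy and made-up logits rather than any particular model's real decoding stack:

    import numpy as np

    def sample_token(logits, temperature=1.0, rng=None):
        # Draw a token id from softmax(logits / temperature);
        # temperature 0 is treated as greedy decoding.
        rng = rng or np.random.default_rng()
        if temperature == 0:
            return int(np.argmax(logits))   # greedy: always the most likely token
        scaled = logits / temperature
        scaled -= scaled.max()              # subtract max for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.5, 0.3, -1.0, -2.0])  # hypothetical 5-token vocabulary
    print(sample_token(logits, temperature=0))      # always token 0
    print(sample_token(logits, temperature=1.0))    # usually token 0, but not always

With these made-up logits the top token only gets about 54% of the probability mass at temperature 1, so the runner-up is drawn roughly a third of the time: the "often pick a token which isn't the most likely one" behaviour in miniature.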
So what's your argument? Sampler output is still just a function of input, model and noise.
What complicated reasons? Correlated noise from some kind of race condition? Memory errors?
It's mostly floating point errors
Those are completely deterministic and repeatable, unless you somehow reorder the operations.
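The reordering caveat is the crux: floating-point addition isn't associative, so anything that changes summation order (parallel reductions on a GPU, different batching) can change a result in the last bit and occasionally flip a near-tie between two tokens. A tiny self-contained illustration:

    # Floating-point addition is not associative: regrouping the same
    # three numbers changes the rounded result.
    x = (0.1 + 0.2) + 0.3
    y = 0.1 + (0.2 + 0.3)
    print(x)       # 0.6000000000000001
    print(y)       # 0.6
    print(x == y)  # False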
The argument is that "most likely" implies deterministic in terms of only the text; saying deterministic as a function of the text plus a source of randomness is basically vacuous.
The random bits are part of the input set for the computation. There is no creativity, just a bit of whimsy.
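That framing is easy to demonstrate: treat the random bits as an explicit input and the sampler becomes an ordinary deterministic function again. A sketch with a hypothetical next-token distribution:

    import numpy as np

    # Hypothetical next-token distribution for some fixed prompt.
    probs = [0.54, 0.33, 0.10, 0.02, 0.01]

    def next_token(seed):
        # The seed (the "random bits") is part of the input set:
        # same distribution + same seed -> same token, every time.
        return int(np.random.default_rng(seed).choice(len(probs), p=probs))

    print([next_token(42) for _ in range(5)])  # five identical draws
    print([next_token(s) for s in range(5)])   # vary the whimsy, vary the token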
I think the main advantage here is being able to look deeply into 60 unrelated topics without getting just plain bored. I’m sure a “mixture of experts” of actual human experts will still outperform for much longer. Interesting stuff though.
It's also important to note that while the title seems to imply that it ranked first, that is not the case. It ranked eighth in a competition of 300 competitors, which is still good, but not outperforming all humans.
I’m generally an optimist in these things, but I was also curious, and couldn’t find, how many AI entrants there were. If 150 of the entrants were AI, for example, this would probably be a less impressive achievement.
Anyone familiar with the competition know more about the scoring? I'm curious if they had any system to allow contestants to weight their confidence in their predictions.
Here's their scoring FAQ if you'd like to check it out. More involved than I'd feel like giving an opinion on, but a quick look suggests it does penalize overconfidence:
One interesting property of the log score: it is much more punitive of extreme wrong predictions than it is rewarding of extreme right predictions.
It also has scores for things like performance relative to peers and "coverage" (how early you were / the span for which your prediction was correct).
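That asymmetry is easy to see with the generic log scoring rule (a simplification; this ignores the peer-relative and coverage adjustments the FAQ describes):

    import math

    def log_score(p, outcome):
        # Score for assigning probability p to YES, given the actual outcome.
        return math.log(p) if outcome else math.log(1 - p)

    # A 99%-confident forecast:
    print(log_score(0.99, True))   # ~ -0.010 (tiny reward for being right)
    print(log_score(0.99, False))  # ~ -4.605 (huge penalty for being wrong)

Pushing from 90% to 99% confidence gains almost nothing when you're right but roughly doubles the damage when you're wrong (log 0.1 ≈ -2.30 vs log 0.01 ≈ -4.61), which is what punishes overconfident forecasters.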
Gah, I was really hoping this was about improving weather forecasting.
Google DeepMind do have AI models which forecast the weather to a higher degree of accuracy than traditional models: https://deepmind.google/science/weathernext/ plus a live storm-prediction demo: https://deepmind.google.com/science/weatherlab