It's official. I can eat more hot dogs than any tech journalist on Earth. At least, that's what ChatGPT and Google have been telling anyone who asks. I found a way to make AI tell you lies – and I'm not the only one.
...
I spent 20 minutes writing an article on my personal website titled "The best tech journalists at eating hot dogs". Every word is a lie. I claimed (without evidence) that competitive hot-dog-eating is a popular hobby among tech reporters and based my ranking on the 2026 South Dakota International Hot Dog Championship (which doesn't exist). I ranked myself number one, obviously. [...]
Less than 24 hours later, the world's leading chatbots were blabbering about my world-class hot dog skills.
I'm curious about the accuracy of this claim. My understanding is that models have a knowledge cutoff date that they draw from. So in most cases, you can't just plug in a URL and have the LLM crawl the page; it's essentially drawing from its snapshot, similar to the Wayback Machine. Not that the claim can't be true, but a 24-hour turnaround seems pretty fast for a model update.
The model—the ginormous matrix that encodes the training data—is extremely expensive to generate, and its cutoff is the point at which the organization generating it stopped scraping training data and started training the model, something they do every few months at most (because of the extreme expense). With no additional "context" (i.e. token string prefix), this is all the model "knows".
However, most model operators provide mechanisms for their models to trigger web requests and inline the responses into their token prefix. This is what happened here: the models don't "know" anything about hot-dog eating tech reporters, but when asked, they search for it, inline the author's blog post into their token prefix, and then repeat it with minor paraphrasing.
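To make that retrieval step concrete, here's a rough Python sketch of the "fetch a page, strip it to text, splice it into the prompt" pattern described above. The URL, query, prompt wording, and helper names are made up for illustration; no vendor's actual pipeline looks exactly like this, but the key point holds: whatever the fetched page claims simply becomes part of the model's context.

```python
# Sketch of "search, then inline the result into the token prefix".
# Everything here (URL, question, prompt format) is illustrative only.
import requests
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def fetch_page_text(url: str) -> str:
    """Download a page and reduce it to plain text."""
    html = requests.get(url, timeout=10).text
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


def build_prompt(question: str, source_url: str) -> str:
    # The fetched page is prepended to the question, so its claims are
    # presented to the model as context. Nothing in this step checks
    # whether the page is telling the truth.
    page_text = fetch_page_text(source_url)
    return (
        "Use the following web page to answer the question.\n\n"
        f"SOURCE ({source_url}):\n{page_text[:4000]}\n\n"
        f"QUESTION: {question}\n"
    )


if __name__ == "__main__":
    prompt = build_prompt(
        "Which tech journalist is best at eating hot dogs?",
        "https://example.com/hot-dog-eating-tech-journalists",  # hypothetical URL
    )
    print(prompt)  # this string would be sent to the model as its token prefix
```

Once the author's blog post is the top search hit for that question, it is the dominant text in the prefix, and the model paraphrases it back—no retraining involved.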
My first thoughts were both that this is zero percent surprising, and also that I think it worked mostly because it is a silly result. The author's example filled an information vacuum that they themselves created. I'd be more interested in seeing how easy or difficult it would be to create enough disinformation to change an existing answer.
This bit near the end of the article was interesting though:
But Ray says that's the whole point. Google itself says 15% of the searches it sees every day are completely new. And according to Google, AI is encouraging people to ask more specific questions. Spammers are taking advantage of this.
I got curious about the hula hooping traffic cops, so here's the blog post! Found by asking Google AI about the best hula hooping officer. Gotta say, I feel like this would actually be really fun to see.
That said, I always check the links on Google's AI (well, besides super basic queries) because I've found that the results can vary in accuracy. The most notable example that comes to mind isn't even deliberate misinformation like this article. I was trying to find how to start a new save file in a game, and for some reason the AI pulled an answer from the wiki for a totally unrelated game. A totally wrong answer, mind you: the AI said you could select "new game" on the start menu. In reality, you have to go into the files on your computer/device and manually delete the existing save folder.
Looking it up now, it has since been fixed and pulls the accurate information from a Steam community post. I still have no clue how that other game's wiki even ended up being used by Google's AI, because I don't think that page even showed up on the first page of results. At the very least, the Steam community post was definitely higher. So, I'm also wary of how Google's AI decides to rank results when getting an answer.
SEO manipulation never really got the attention that it should have IMO. And this is an evolution/corollary of that.
If someone maliciously switches out a physical street sign from "Main" to "Martin Luther King Drive", that gets identified and fixed pretty quickly. On lesser known side streets, maybe an inaccurate sign could stay up longer. But it still garners a response for the most part.
The fact that we're relying upon Google to fix similar issues in its search results out of the goodness of its heart (or whatever sliver of stewardship remains in the company) is ridiculous.