It's official. I can eat more hot dogs than any tech journalist on Earth. At least, that's what ChatGPT and Google have been telling anyone who asks. I found a way to make AI tell you lies – and I'm not the only one.
...
I spent 20 minutes writing an article on my personal website titled "The best tech journalists at eating hot dogs". Every word is a lie. I claimed (without evidence) that competitive hot-dog-eating is a popular hobby among tech reporters and based my ranking on the 2026 South Dakota International Hot Dog Championship (which doesn't exist). I ranked myself number one, obviously. [...]
Less than 24 hours later, the world's leading chatbots were blabbering about my world-class hot dog skills.
I'm curious about the accuracy of this claim. My understanding is that models have a knowledge cutoff date that they draw from. So in most cases, you can't just plug in a URL and have the LLM crawl the page; it's essentially drawing from its snapshot, similar to the Wayback Machine. Not that the claim can't be true, but a 24-hour turnaround seems pretty fast for a model update.
The model—the ginormous matrix that encodes the training data—is extremely expensive to generate, and has a cut-off as of the point at which the organization generating it stopped scraping training data and started training the model, something which they do, idk, every few months at most (because of the extreme expense). With no additional "context" (i.e. token string prefix), this is all the model "knows".
However, most model operators provide mechanisms for their models to trigger web requests and inline the responses into their token prefix. This is what happened here: the models don't "know" anything about hot-dog eating tech reporters, but when asked, they search for it, inline the author's blog post into their token prefix, and then repeat it with minor paraphrasing.
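Roughly, that flow looks like this (a minimal sketch with made-up function names, not any particular vendor's API; only the overall shape is the point):

```python
# Sketch of the "search, then inline into the prompt" flow described above.
# search, fetch_page, and generate are placeholders for whatever the
# operator actually uses.

def answer_with_search(question, search, fetch_page, generate):
    results = search(question, max_results=3)          # ad hoc web search
    pages = [fetch_page(r["url"]) for r in results]    # fetch the live pages
    context = "\n\n".join(pages)                       # inline into the token prefix
    prompt = (
        "Answer the question using the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
    # The frozen model just continues this prefix; anything in `context`,
    # including a planted hot-dog ranking, gets treated as ground truth.
    return generate(prompt)
```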
Sure, but still, overnight?
Looking at the details page for Google's Gemini v3, it lists the model cutoff date as January 2025, with its last knowledge update being November 2025: https://ai.google.dev/gemini-api/docs/models/gemini-3-pro-preview
Even with periodic updates from scraping, that feels suspiciously fast. When I ask about what happened in my local township last week, it references a mayoral election that occurred back in October. And that was widely covered (by local news, anyway).
Not overnight, on demand. When you ask Claude or ChatGPT about a real-world fact that isn't in their training data, they often make an ad hoc web search and try to summarize the results for you. You can see this happening: there will be a visual indicator (Claude shows a little loading indicator that says something like "Web search...").
It's complex. Yes, you can ask an AI to crawl a page and it will. AI agents will do this scarily well and in a much more complex way.
I asked the ChatGPT agent today to find the email of the VP of an organization I was trying to get a sales lead on, and it ended up finding an obscure permitting request document posted on a public-facing server by a business in Alaska that had a couple of saved emails with this organization attached to their request, including the email address I needed AND an email signature with a direct phone line and extension, all from like a week ago.
I got the sales lead at least!
These models also have an updated general memory that contains recent events, and they will do a web search for more information in a lot of cases.
Models haven't really done that whole "My training data only extends to this date" thing in a while because they integrate different technologies to keep information fresh.
And some of these memory vectors can be updated by the models themselves in some cases. Because why wouldn't an AI automation company use AI automation to automate updating their automation! Hahah
Yes, you can ask an AI to crawl a page and it will.
See, this is where my experience differs. I explicitly asked it to crawl a page on my webserver to answer a question, as I was curious to see what UA and IP address it would come from.
Despite asking a few ways and sending a few different unique URLs, nothing came up in my nginx logs. In hindsight, perhaps I should have paid attention weeks or months later to see if the unique URLs I sent to the LLM ever did pop up in my nginx logs, but that was a while ago.
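(For anyone who wants to repeat that experiment, here's a quick sketch of scanning an nginx access log for AI-crawler user agents. The substrings below are examples of tokens the big operators have published, e.g. GPTBot and ClaudeBot; they change, so check each vendor's docs for the current list.)

```python
# Scan a default ("combined") format nginx access log for requests whose
# user agent looks like an AI crawler/fetcher. Illustrative, not exhaustive.

AI_UA_HINTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

def ai_hits(logfile="/var/log/nginx/access.log"):
    hits = []
    with open(logfile, encoding="utf-8", errors="replace") as f:
        for line in f:
            if any(hint in line for hint in AI_UA_HINTS):
                ip = line.split(" ", 1)[0]  # client IP is the first field
                hits.append((ip, line.rstrip()))
    return hits

if __name__ == "__main__":
    for ip, line in ai_hits():
        print(ip, "->", line[:120])
```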
It's all very "black box", I think; nobody wants to explain how the inner workings actually function. Sure, they're data vacuums, but how they interact with the outside world seems a mystery for now.
What is "it" in this cases? People have been replying about "models" to you but that isn't entirely correct. It is the combination of the model and the service it is tied into. If you run a local...
What is "it" in this cases? People have been replying about "models" to you but that isn't entirely correct. It is the combination of the model and the service it is tied into. If you run a local model with a barebones chat client it will not search for example. But if the chat interface allows the model to use web search or browse the internet then the model will be able to utilize this. It often is something you need to explicitly enable
Yes, large language models have a cutoff date, but they are often given additional information to work with. In the case of Google search, it's doing a search and giving the model the contents of the web pages that were found, which it is then a bit too gullible about repeating.
You can think of the AI as a fancy web search tool.
This matches my experience (Perplexity, Gemini, Claude, and others). When using web search, the models are instructed to believe what they find. So if bad/malicious results pop up in their discovery, they can't tell the difference between them and legit results.
I had already assumed that this was being abused somehow, but it's still funny/worrying/depressing/more-adjectives seeing it in action in such an absurd way
My first thoughts were that this is zero percent surprising, and also that it worked mostly because it is a silly result. The author's example filled an information vacuum that they themselves created. I'd be more interested in seeing how easy or difficult it would be to create enough disinformation to change an existing answer.
This bit near the end of the article was interesting though:
But Ray says that's the whole point. Google itself says 15% of the searches it sees every day are completely new. And according to Google, AI is encouraging people to ask more specific questions. Spammers are taking advantage of this.
Yeah, not to give anyone ideas, but one of my AI-related concerns, should it become good enough at programming, is that scammers can implement real-time monitoring of online search trends and spit out websites tailored to fit them at the other end. Once perfectly efficient, the system could even produce a tailor-made website for each individual web search (in my imagination at least, but I'm not an IT pro and I hope I'm wrong about this).
ETA: And not just scammers but propagandists and other nefarious actors too.
SEO manipulation never really got the attention that it should have IMO. And this is an evolution/corollary of that.
If someone maliciously switches out a physical street sign from "Main" to "Martin Luther King Drive", that gets identified and fixed pretty quickly. On lesser known side streets, maybe an inaccurate sign could stay up longer. But it still garners a response for the most part.
The fact that we're relying upon Google to fix similar issues in its search results out of the goodness of its heart (or whatever sliver of stewardship remains in the company) is ridiculous.
Did you know that anyone can buy a printer that will print whatever dangerous nonsense they want to write? They can even put a fake letterhead on top that says "super legit trustworthy source." They can then give that real, actual paper to anyone and act as if it's the truth! Damn your hubris, Gutenberg!
We should all know by now that you can't trust anything an AI says. You can't trust anything. That should be our default at this point, right? Truth is hard.
AI is useful, flawed, and complicated. But at this point, "you can't trust it" isn't that interesting beyond educating people who are just starting to use it, I guess.
Expect this advice to be as routinely ignored as "don't cite Wikipedia."
I got curious about the hula hooping traffic cops, so here's the blog post! Found by asking Google AI about the best hula hooping officer. Gotta say, I feel like this would actually be really fun to see.
That said, I always check the links on Google's AI (well, besides super basic queries) because I've found that the results can vary in accuracy. The most notable example that comes to mind isn't even deliberate misinformation like this article. I was trying to find how to start a new save file in a game, and for some reason the AI pulled an answer from the wiki for a totally unrelated game. A totally wrong answer, mind you: the AI said you could select "new game" on the start menu. In reality, you have to go into the files on your computer/device and manually delete the existing save folder.
Looking it up now, it has since been fixed and pulls the accurate information from a Steam community post. I still have no clue how that other game's wiki even ended up being used by Google's AI, because I don't think that page even showed up on the first page of results. At the very least, the Steam community post was definitely higher. So, I'm also wary of how Google's AI decides to rank results when getting an answer.