- The dangers of vibe coding
  26 votes
- Nintendo President on the new Switch 2, tariffs and what's next for the company
  17 votes
- Anubis works
  35 votes
- AI 2027
  29 votes
- Why do AI company logos look like buttholes?
  58 votes
- The art of poison-pilling music files
  15 votes
- I'm tired of dismissive anti-AI bias
  60 votes
- Fintech founder charged with fraud after ‘AI’ shopping app found to be powered by humans in the Philippines
  39 votes
- An image of an archeologist adventurer who wears a hat and uses a bullwhip
  43 votes
- Microsoft launches generative AI-powered, Quake II “inspired” tech demo
  19 votes
- Google AI search shift leaves website makers feeling “betrayed”
  36 votes
- Blackhat hacker 'EncryptHub' behind vibe-coded ransomware unmasked due to opsec mistakes in ChatGPT-created infrastructure
  20 votes
- How AI is powering the Boston Red Sox on the field and across operations
  4 votes
- Young Chinese reimagine the last goodbye - new, personalised funerals in China struggle to break through culture
  4 votes
- The ARC-AGI-2 benchmark could help reframe the conversation about AI performance in a more constructive way
The popular online discourse on Large Language Models’ (LLMs’) capabilities is often polarized in a way I find annoying and tiresome.
On one end of the spectrum, there is nearly complete dismissal of LLMs: an LLM is just a slightly fancier version of the autocomplete on your phone’s keyboard, there’s nothing to see here, move on (dot org).
This dismissive perspective overlooks some genuinely interesting novel capabilities of LLMs. For example, I can come up with a new joke and ask ChatGPT to explain why it’s funny or come up with a new reasoning problem and ask ChatGPT to solve it. My phone’s keyboard can’t do that.
On the other end of the spectrum, there are eschatological predictions: human-level or superhuman artificial general intelligence (AGI) will likely be developed within 10 years or even within 5 years, and skepticism toward such predictions is “AI denialism”, analogous to climate change denial. Just listen to the experts!
This narrative has to contend with some inconvenient facts, such as that the majority of AI experts, when asked in surveys, give much more conservative timelines for AGI and disagree with the idea that scaling up LLMs could lead to AGI.
The ARC Prize is an attempt by prominent AI researcher François Chollet (with help from Mike Knoop, who apparently does AI stuff at Zapier) to introduce some scientific rigour into the conversation. There is a monetary prize for open source AI systems that can perform well on a benchmark called ARC-AGI-2, which recently superseded the ARC-AGI benchmark. (“ARC” stands for “Abstraction and Reasoning Corpus”.)
ARC-AGI-2 is not a test of whether an AI is an AGI or not. It’s intended to test whether AI systems are making incremental progress toward AGI. The tasks the AI is asked to complete are colour-coded visual puzzles like you might find in a tricky puzzle game. (Example.) The intention is to design tasks that are easy for humans to solve and hard for AI to solve.
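For a concrete sense of what these puzzles look like to a machine: ARC-style tasks are distributed as JSON, each giving a few input/output grid pairs that demonstrate a hidden rule, plus test inputs the solver must transform. Grids are small 2-D arrays of integers 0–9, one integer per colour. Here is a minimal sketch; the toy mirror-image rule is mine for illustration, real ARC-AGI-2 rules are far harder to induce:

```python
import json

# ARC-style task: grids are 2-D arrays of integers 0-9, each integer a colour.
# The "train" pairs demonstrate a hidden rule; the solver must apply that rule
# to the "test" inputs. This toy task's rule is a horizontal mirror.
toy_task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[0, 1], [2, 0]]},
        {"input": [[3, 0, 0], [0, 0, 4]], "output": [[0, 0, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0], [0, 6]]}],
}

def mirror(grid):
    """Horizontally mirror a grid (the hidden rule of this toy task)."""
    return [list(reversed(row)) for row in grid]

# Check the candidate rule against every training pair before applying it.
assert all(mirror(p["input"]) == p["output"] for p in toy_task["train"])
print(json.dumps(mirror(toy_task["test"][0]["input"])))  # [[0, 5], [6, 0]]
```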
The current frontier AI models score less than 5% on ARC-AGI-2. Humans score 60% on average, and 100% of tasks have been solved by at least two humans in two attempts or fewer.
For me, this helps the conversation about AI capabilities because it gives a rigorous test and quantitative measure to my casual, subjective observations that LLMs routinely fail at tasks that are easy for humans.
François Chollet was impressed when OpenAI’s o3 model scored 75.7% on ARC-AGI (the older version of the benchmark). He emphasizes the concept of “fluid intelligence”, which he seems to define as the ability to adapt to new situations and solve novel problems. Chollet thinks that o3 is the first AI system to demonstrate fluid intelligence, although it’s still a low level of fluid intelligence. (o3 also required thousands of dollars’ worth of computation to achieve this result.)
This is the sort of distinction that can’t be teased out by the polarized popular discourse. It’s the sort of nuanced analysis I’ve been seeking out, but which has been drowned out by extreme positions on LLMs that ignore inconvenient facts.
I would like to see more benchmarks that try to do what ARC-AGI-2 does: find problems that humans can easily solve and frontier AI models can’t. These sorts of benchmarks can help us measure AGI progress much more usefully than the typical benchmarks, which play to LLMs’ strengths (e.g. massive-scale memorization) and don’t challenge them on their weaknesses (e.g. reasoning).
I long to see AGI within my lifetime. But the super short timeframes given by some people in the AI industry feel to me like they border on mania or psychosis. The discussion is unrigorous, with people pulling numbers out of thin air based on gut feeling.
It’s clear that there are many things humans are good at doing that AI can’t do at all (where the humans vs. AI success rate is ~100% vs. ~0%). It serves no constructive purpose to ignore this truth and it may serve AI research to develop rigorous benchmarks around it.
Such benchmarks will at least improve the quality of discussion around AI capabilities, insofar as people pay attention to them.
Update (2025-04-11 at 19:16 UTC): François Chollet has a new 20-minute talk on YouTube that I recommend. I've watched a few videos of Chollet talking about ARC-AGI or ARC-AGI-2, and this one is beautifully succinct: https://www.youtube.com/watch?v=TWHezX43I-4
  10 votes
- Immune ‘fingerprints’ aid diagnosis of complex diseases in Stanford Medicine study
  6 votes
- US scientists are using machine learning to find new treatments among thousands of old medicines
  12 votes
- Using Claude and undocumented Google Calendar features to automate event creation
  4 votes
- Swedish fashion retailer H&M will use AI doppelgangers in some social media posts and marketing in the place of humans, if given permission by models
  10 votes
- Vibe coding on Apple Shortcuts
  5 votes
- New breakthrough in AI cancer detection is pushing accuracy levels to an unprecedented 99%
  23 votes
- A summary of my bot defence systems
  11 votes
- Review: Cræft, by Alexander Langlands
  4 votes
- Please stop externalizing your costs directly into my face
  121 votes
- Enough with the bullshit (a letter to fellow bullshit sufferers)
  56 votes
- Trapping misbehaving bots in an AI Labyrinth
  40 votes
- eBay privacy policy update and AI opt-out
eBay is updating its privacy policy, effective next month (2025-04-27). The major change is a new section about AI processing, accompanied by a new user setting with an opt-out checkbox for having your personal data feed their models.
While that page specifically references European areas, the privacy selection appears to be active and remembered between visits for non-European customers. It may not do anything for us at all. On the other hand, the page seems nearly impossible to find from within account settings, so I thought I'd post a direct link.
I'm well aware that I'm anomalous for having read this to begin with, much less diffed it against the previous version. But since I already know that I'm weird, and this wouldn't be much of a discussion post without questions:
- How do you stay up to date with contract changes that might affect you, outside of widespread Internet outrage (such as recent Firefox news)?
- What's your threshold -- if any -- for deciding whether to quit a company over contract changes? Alternatively, have you ever walked away from a purchase, service, or other acquisition over the terms of the contracts?
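On the diffing point: if you save dated snapshots of a policy page as plain text, Python's standard difflib will surface just the changed sections, which makes this kind of monitoring fairly painless. A minimal sketch, with hypothetical snapshot filenames:

```python
import difflib
from pathlib import Path

# Hypothetical snapshots of the same policy saved on two different dates.
old = Path("ebay-privacy-2025-03.txt").read_text().splitlines()
new = Path("ebay-privacy-2025-04.txt").read_text().splitlines()

# unified_diff emits only changed hunks plus a few lines of context, so a
# long policy with one new AI section yields a short, readable report.
for line in difflib.unified_diff(old, new, fromfile="previous", tofile="current", lineterm=""):
    print(line)
```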
  46 votes
- Norwegian man has filed a complaint with the Norwegian Data Protection Authority after ChatGPT falsely told him he had killed two of his sons and been jailed
  22 votes
- Claude can now search the web
  17 votes
- Block AI scrapers with Anubis
  27 votes
- FOSS infrastructure is under attack by AI companies
  39 votes
- Generative AI tool marks a milestone in biology - Evo 2 can predict the form and function of proteins in the DNA of all domains of life
  29 votes
- LLM crawlers continue to DDoS SourceHut
  11 votes
- The lo-fi art and human tools era
  10 votes
- Professional writer endorses short story written by OpenAI's new creative writing model
  18 votes
- What one Finnish church learned from creating a service almost entirely with AI – tools wrote the sermons and some of the songs, composed the music and created some of the visuals
  11 votes
- (715) 999-7483 - A phone-powered multiplayer website builder
  32 votes
- Mayo Clinic's secret weapon against AI hallucinations: Reverse RAG in action
  8 votes
- Factorio Learning Environment – a benchmark that tests agents in long-term planning, program synthesis, and resource optimization
  13 votes
- Nexus: A Brief History of Information Networks from the Stone Age to AI by Yuval Noah Harari
  3 votes
- Show Tildes: we built the world's first legal AI API
  22 votes
- I used to teach students. Now I catch ChatGPT cheats.
  53 votes
- Bartosz Milewski - Understanding Attention in LLMs
  6 votes
- Is it wrong to use AI to fact check and combat the spread of misinformation?
I’ve been wondering about this lately.
Recently, I made a post about Ukraine on another social media site, and someone jumped in with the usual "Ukraine isn't a democracy" right-wing talking point. I wrote out a long, thoughtful reply, only to get the predictable one-liner propaganda responses back. You probably know the type: regurgitated stuff with no real engagement.
After that, I didn’t really feel like spending my time and energy writing out detailed replies to every canned response. But I also didn’t want to just let it sit there and have people who might be reading the exchange assume there’s no pushback or correction.
So instead, I tried leveraging AI to help me write a fact-checking reply. Not for the person I was arguing with, really, but more as an FYI for anyone else following along. I made sure it stayed factual and based in reality, avoided name-calling, and kept the tone above the usual mudslinging. And of course, I double-checked what it wrote to make sure it matched my understanding and wasn’t just spitting out garbage or hallucinations.
It got me thinking: there’s a lot of fear about AI being used to create and spread misinformation. But do you think there’s also an opportunity to use it as a tool to counter misinformation, without burning ourselves out in the process?
Curious how others see it.
  16 votes
- Melbourne start-up launches 'biological computer' made of human brain cells
  9 votes
- The NotaGen sheet music generator
  8 votes
- Is there one AI product you would recommend over another to a complete newbie? The primary task is writing.
So I have heard/read that LLMs available to the public can be useful for generating tailored cover letters more quickly. Up to now, I've avoided using artificial intelligence. What recommendations do you have, and do you have any advice for getting up to speed?
Thank you.
  11 votes
- MIT’s new AI-powered tool accelerates startup ambitions
  6 votes
- AI chatbots are people, too. (Except they’re not.)
  10 votes