Megathread #8 for news/updates/discussion of AI chatbots and image generators
The hype seems to be dying down a bit? But I still find things to post. Here is the previous thread.
The hype seems to be dying down a bit? But I still find things to post. Here is the previous thread.
Google (researcher): "We Have No Moat, And Neither Does OpenAI"
An allegedly leaked internal document from a Google researcher talking about how open source models are eating their (and OpenAIs) lunch. And because this is all being done on top of the ‘leaked’ LLaMA, Meta stands to benefit the most.
I'm curious if Facebook will ultimately relicense LLaMA to make sure their platform is the universal one. They must be aware that the OSS competitors are catching up quickly. They also have a history of relicensing under public pressure, as happened with React.
They relicensed EnCodec the other day, which is a dependency for a decent number of ML audio projects, so there's definitely recent precedent too.
That's led to bark switching to an MIT license this week, and I think it opens up at least one of the two VALL-E implementations I'm aware of as well - I have no idea how much is long term business 4D chess and how much is devs being devs, but it's good to see either way.
I’ll repost my Hacker News comment:
This gets attention due to being a leak, but it’s still just one Googler’s opinion and it has signs of being overstated for rhetorical effect.
In particular, demos aren’t the same as products. Running a demo on one person’s phone is an important milestone, but if the device overheats and/or gets throttled then it’s not really something you’d want to run on your phone.
It’s easy to claim that a problem is “solved” with a link to a demo when actually there’s more to do. People can link to projects they didn’t actually investigate. They can claim “parity” because they tried one thing and were impressed. Figuring out if something works well takes more effort. Could you write a product review, or did you just hear about it, or try it once?
I haven’t investigated most projects either so I don’t know, but consider that things may not be moving quite as fast as demo-based hype indicates.
Yep. For example, the paper says "Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening." That links to the Alpaca-LoRA repository. Having a personalized AI sounds awesome, but I have no idea how to get there from here, or even what specifically a "personalized AI" means. Like, if I could train an AI on the particular programming languages and packages I use without having to shell out for GPT-4, that would be great, but just linking to the repository is a long way from proving their claims.
If you ask a chatbot why it wrote what it did, it has no idea, so it makes something up. It turns out there's no guarantee you will see its real thought process if you ask it to "think out loud" either. (This is called "chain-of-thought" reasoning.)
The researchers tested this using multiple choice questions where they bias the model. For example, they might bias it to believe that the answer is always A. The bot would never say that it noticed the pattern and that's why it picks A. It would pretend to think out loud in a way that results in picking A.
They tested with GPT-3.5 and Claude.
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
(disclosure, only read the abstract and skimmed the paper)
That pretty cool, as this is not dissimilar to how humans work. We often rationalize an explanation after we've already made it. This likely keeps cognitive dissonance down, but I think can also be explained because we grow the net of possible explanations once we elevate something from the unconscious to the conscious.
It's so interesting that this current trend of AI research can be studied not only through a computer science lens, but also a psychological one. It's going to be really interesting once we start putting them in large groups, and seeing what sociological principles emerge...
A Catalog of “AI” Art Analogies From photography to stochastic parrot. The list also includes the usefulness and limitation of each analogy.
It’s nicely done, thanks for sharing.
Other metaphors to be added: “bullshit” (disregard for the truth, often for self-serving reasons) versus “brainstorming” (disregard for the truth, done intentionally for creative reasons).
This company adopted AI. Here's what happened to its human workers (Planet Money)
Midjourney is testing version 5.1 and I've been playing around with it. They say it's more opinionated, though there's a way to turn that off. I tried it out a bit, and I'm finding that it gives impressive results but tends to ignore the style you give it.
Here's an example.
I've been wanting to try out MidJourney, but the last time I looked into it, it required you to join a Discord server and interact with a bot. I found that workflow very clunky. For a service you need to pay for, I expected something a little better.
The results I see posted online are very impressive though. MJ does seem to have a more opinionated style, even before this update. The renditions are more cohesive than what I've seen in other models.
Yes, it’s a bit clunky, but you can use direct messages with MidJourney’s bot and it’s not so bad. It’s basically a command-line interface that can display pictures and links and buttons.
MidJourney has a website with a gallery of the images you generated, but it’s buggy. I need to log in twice since the first login fails. New images often don’t show up there and I think it’s due to caching, but also, some images never seem to show up there.
Still, the results are good enough that I don’t bother to do comparisons with Dall-E or stable diffusion (dream studio) much.
It’s not really getting better along some dimensions. One test I do is drawing a piano keyboard, and it still gets the number of black keys wrong. It’s still rubbish at drawing accordions. The pictures are much nicer when it it works, though.
I hate the interface but boy is it addicting!
Here's a somewhat overly-excited blog post from someone who has early access to GPT-4 with plugins:
Has anyone used GPT-4? On a whim, I started using the free ChatGPT to learn something at work that I’m wholly unfamiliar with (Apache/SQLAlchemy/Pandoc) to create a customized report for my nightly CDash builds, and it’s been… mostly useful? Like, … maybe more useful than a Google search?
I feel as if it does a good job interpreting my lack of contextual knowledge (I.e., why Google searches would be hard: I don’t know enough of the terminology), but anytime I try to really drill down for specific examples, it falls apart. That plus it hallucinated >5 open-source libraries doing what I am building.
Wondering if GPT-4 is a significant upgrade / worth the subscription?
It might be a little better, but you should still expect GPT-4 to hallucinate when you drill down into specifics. If your question can't reasonably be found in the training data (because it's too specific, too obscure, or too recent), then the LLM will make more of an effort to fill in the gaps.
Understanding how to use LLMs effectively is knowing when they're at their limit. They're great for breadth, and if the topic is well-trodden enough, they can often handle depth. But more skepticism is needed once you start drilling down. You need to verify anything it spits out at you.
That said, the tools you listed are very well-known, and should have lots of historical information, so I would expect pretty good coverage. I'm actually a little surprised it's hallucinating as often as it is for you.
I have not used it myself, but this paper is a) quite interesting and b) also does a good job of highlighting the difference in capability between GPT 3.5 and
(a non-powered down version of) 4.
I’ve tried it a bit. It’s significantly slower and for simple queries you probably won’t notice a difference. I’ve found it better at writing code, but waiting for it to rewrite code with a bugfix is tedious.
But I don’t use ChatGPT day to day and haven’t tried it that much.
I've been able to solve programming problems with GPT-4 that 3.5 got stuck on. It's better at reasoning, plans out its responses better, and it seems to hallucinate less. I use it as a programming assistant almost every day and I think it's worth the money.
And you're correct: ChatGPT is most useful when you're a beginner and you don't know what questions to Google. The more information there is out there about the topic you're asking about, the better answers ChatGPT will give and the less likely it is to hallucinate. Once you really start to get into the weeds, ChatGPT will be less useful. But it will largely help you skip over the phase of being a total newbie who doesn't know how anything works.
For specifics I would recommend giving it more context to avoid hallucination. GPT-4 can handle a lot of context, so you should be able to give it an entire file and then ask for a suggestion on how to implement a change. And whenever possible tell it what libraries you'll planning to pull in if they're not already referenced in your code. There is still a level where you have very specific requirements and you're doing something that hasn't already been done a million times before where GPT just can not be of any use.
Jsonformer: A Bulletproof Way to Generate Structured JSON from Language Models
It looks like it even goes further than that? When generating a token, the LLM calculates the probability for the next token to come up. But sometimes only some tokens are allowed syntactically. So, set the probability to zero for those, and it will never generate an illegal token. (The code uses logits, so zero probability maps to -Infinity.)
This reminds me a bit of randomly generating test data using quickcheck-style testing, where you give a set of items to pick randomly from. Except that it's not picking randomly, but based on the LLM's calculated probabilities.
(Via Simon Willison)