Return of the AI Megathread (#13) - news of chatbots, image generators, etc
I haven't done one of these since early July, but it seems like there's an uptick in news. Here's the previous one.
There's been lots to talk about!
July was a great month. Facebook released Llama 2 under a permissive license, with many fine-tunes already coming out. Stability AI also released Stable Diffusion XL. It's a two-pass system (a base model plus a refiner), so it might make sense to set it up with ComfyUI to automate running both models. It's quite a bit slower than 1.x and 2.x for me, but the results are extremely good.
In August, llama.cpp finally standardized on the GGUF format, so that should mean less downloading and converting of model formats. It allows metadata for suggested parameters and even the prompt format. Should really simplify things for those running llama2 and co at home. Facebook also released Code Llama, their own fine-tune on top of llama2.
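One nice property of GGUF is that it's trivial to recognize programmatically. Here's a minimal sketch of a header probe in Python; it only checks the four-byte magic and the version field, and I'm assuming the rest of the header (metadata key-value pairs, tensor info) is handled elsewhere:

```python
import struct

GGUF_MAGIC = 0x46554747  # the ASCII bytes "GGUF", read as a little-endian uint32


def read_gguf_header(path):
    """Return (is_gguf, version) by probing the first 8 bytes of a file."""
    with open(path, "rb") as f:
        data = f.read(8)
    if len(data) < 8:
        return False, None
    magic, version = struct.unpack("<II", data)
    if magic != GGUF_MAGIC:
        return False, None
    return True, version
```

Handy for sorting a downloads folder full of mystery `.bin` and `.gguf` files.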
Last week the Falcon team released a 180B model (that is, 180 billion parameters). It reaches ChatGPT 3.5 levels on benchmarks, but is prohibitively expensive to run. Think of it more as a proof of concept, though it could prove interesting with quantization. If nothing else, it's a great benchmark.
Last night Stability AI released Stable Audio, a text-to-audio generator. I've not had a chance to play around with it yet, but the samples are pretty impressive for how early that technology still is.
The media hype may have died down some, but the technology continues to evolve. Context lengths are continually growing, and the tooling is also improving, so it's becoming easier and more accessible to run your own AI tools on your home PC. This is a trend I'm really glad to see. It means home assistants, coding aids, writing prompts, etc. can all be run locally, on-device, and not reliant on "the cloud" to host these services for us.
e: Typo
Also, the current top performing model for code is a fine-tune of codellama by Phind: https://www.phind.com/blog/code-llama-beats-gpt4
The cheapest (not cheap: $4800+) way to run falcon-180B at speed right now on local hardware is a Mac Studio with llama.cpp. You need 128GB RAM to run the Q3_K_M quantization (80GB) or 192GB RAM to run up to maybe Q6_K (148GB).
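Those file sizes line up with a simple back-of-the-envelope calculation: size ≈ parameters × bits-per-weight ÷ 8. A quick sketch (the bits-per-weight figures here are my own approximations, derived backwards from the quoted file sizes, not official numbers):

```python
def quant_size_gb(params_billions, bits_per_weight):
    """Approximate quantized model file size in GB: params * bpw / 8 bits per byte."""
    return params_billions * bits_per_weight / 8


# Working backwards from the falcon-180B sizes quoted above:
#   80 GB  -> 80 * 8 / 180  ~ 3.6 bits/weight (Q3_K_M)
#   148 GB -> 148 * 8 / 180 ~ 6.6 bits/weight (Q6_K)
print(f"Q3_K_M: ~{quant_size_gb(180, 3.56):.0f} GB")
print(f"Q6_K:   ~{quant_size_gb(180, 6.58):.0f} GB")
```

The same arithmetic is a decent sanity check before buying hardware for any model: parameter count times bits per weight, plus headroom for the KV cache and the OS.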
You can't use all the RAM for Metal (the Apple Silicon GPU backend), but it is possible to patch the VRAM split to raise the allocation limit. In theory, you could run Q4_K_M on the 128GB Mac, but I haven't tried this on mine.
My impression of falcon-180b-chat is that it definitely seems smarter than LLaMA2 on language and knowledge tasks. It can go deeper on topics. However, the chat instruct is not that good, and it tends to get confused and generate outputs for "Falcon:", "Assistant:", "User:", etc. as well as sometimes start the Alpaca instruction-response format, ultimately talking to itself for a number of conversational turns before finally generating a stop token.
I am hopeful for some new fine-tunes to choose from, but it may take a while because of the model size.
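Until better fine-tunes land, a common workaround for the role-marker bleed described above is to trim the generation client-side at the first spurious marker. This is purely my own sketch (the marker list is illustrative, not anything falcon-specific):

```python
# Role markers that leak into falcon-180b-chat outputs, per the comment above.
# Extend this list for whatever prompt format your model was tuned on.
ROLE_MARKERS = ["\nUser:", "\nFalcon:", "\nAssistant:", "\n### Instruction:"]


def trim_at_role_markers(text, markers=ROLE_MARKERS):
    """Cut generated text at the earliest role marker, if any appears."""
    cut = len(text)
    for m in markers:
        i = text.find(m)
        if i != -1:
            cut = min(cut, i)
    return text[:cut].rstrip()
```

Most inference servers also accept stop sequences directly (llama.cpp takes repeated `--reverse-prompt`/`-r` flags), which stops generation earlier and saves tokens; the client-side trim is just a backstop.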
Wait. Why is the cheapest option an Apple computer?
128GB Mac Studio gives you 96GB VRAM for metal and costs $4800. In comparison, an 80GB Nvidia A100 costs ~$13000 just for the GPU.
It's not an apples-to-apples comparison (pun intended), but for the purpose of running inference, the Mac is better value and more available.
Conveniently, I just came across this article on Mastodon.
I think music is particularly susceptible to being overtaken by generative algorithms because it’s inherently pattern based, and has an even tighter set of rules than language.
It’s unfortunate from a perspective of losing the humanity in arts… but I also can’t help but feel that generative music is more fair game than generative text and scripts. Replacing writers with LLMs feels lazy and insidious. Replacing musicians, especially for stock music, feels… almost practical? Certainly more inevitable in my mind.
As an amateur musician, generating music from a text description alone doesn't interest me at all. I want to give it a melody, a chord progression, a MIDI track, and/or some previous tracks, and have it generate another track that goes with them. (And yes, a text description is a good input, too, just not alone.)
Also, how about generating a sampled instrument from a text description?
My guess is that someday there will be a new generation of audio tools that does this.
I'd be willing to bet the best they can do is generic library muzak, in a sense. It's all copies of existing generalities in the music - this is a probability machine. It has a severe handicap - algorithms have no emotional context from which to appreciate the music they process. We are about to get buried in remixes that are on another level, like Johnny Cash covering Barbie Girl.
Wherever the algorithm falls down, the musician will be there to take it forward. Honestly it could help a lot in the composition process, generating bits of instrumentation under human guidance. One would be able to try out a lot of things musically that way, like having a pocket orchestra where you are the conductor.
That's how I view the idealized future of AI art. I'm a director or museum curator, not an artist. However, it is my tastes that influence which generations are worth sharing or saving for manual cleanup and which are tossed back to sea.
Here's my own news: I wrote a VS Code extension called Bot Typist that lets you chat with GPT-4 (and other AI bots) in a Jupyter notebook. I consider it a good replacement for ChatGPT's Code Interpreter (now called "Advanced Data Analysis").
This is built on Simon Willison's llm command line tool. It's a way to use chatbots from the command line that connects to several different bots.
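For anyone who hasn't tried it, basic `llm` usage looks something like this. A sketch from memory, so treat the exact flags as approximate and check `llm --help`:

```shell
# Install the tool and configure an API key
pip install llm
llm keys set openai

# One-shot prompt against the default model
llm "Ten fun names for a pet pelican"

# Pick a specific model, or list what's available
llm -m gpt-4 "Explain GGUF in one paragraph"
llm models

# Plugins add local and alternative backends
llm install llm-gpt4all
```

Responses are also logged to a local SQLite database, which is handy for searching old conversations later.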
There's an app called "HeyGen" that's currently overloaded, but a friend got through and created videos of himself speaking in languages he doesn't know. Here's an article about it:
How to create your own personal deepfake (Axios)
It's an unsubstantiated claim on Twitter from a VC who is probably talking her book, but it seems plausible, so for what it's worth:
I'm skeptical of her claim.
AI character bots artificially fulfill an unmet social need, and are not directly comparable to fan-fiction or self-insert stories. It's plausible that the population is balanced, or even tilted towards cishet men. Consider replika.ai.
We're all social creatures.
Well, it’s basically a rumor, but I think it might depend on which website or app it is? For example, character.ai is text-only.
It’s a new form of interactive fiction, so I see a close connection to non-interactive fiction like romance novels.
An older article from May. He's the author of spaCy, a previous-generation AI tool, so this is biased, but I thought it was pretty good:
Against LLM maximalism (Matthew Honnibal, explosion.ai)
[...]
Has anyone used Waldo or other LLM-augmented search engine tools? I’ve been playing around with rolling my own for a bit now, using Claude. Curious what experiences anyone else has had with intelligent search agents.
Local LLMs continued to improve through the month of October.
Google nears release of AI software Gemini, The Information reports
Not much to this, but I guess we will find out more soon.