28 votes

Return of the AI Megathread (#13) - news of chatbots, image generators, etc

I haven't done one of these since early July, but it seems like there's an uptick in news. Here's the previous one.

18 comments

  1. [5]
    Wes

    There's been lots to talk about!

    July was a great month. Facebook released Llama 2 under a permissive license, with many finetunes already coming out. Stability AI also released Stable Diffusion XL. It's a two-pass system (a base model plus a refiner), so it might make sense to set it up with ComfyUI to automate running both models. It's quite a bit slower than 1.x and 2.x for me, but the results are extremely good.
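
    For anyone who'd rather script the two passes than build a ComfyUI graph, here's a minimal sketch using the Hugging Face diffusers library (the model names are the public Stability AI checkpoints; assumes a CUDA GPU with enough VRAM):

    ```python
    # Two-pass SDXL: the base model produces latents, the refiner finishes them.
    import torch
    from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

    base = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "a lighthouse on a cliff at dusk, oil painting"

    # Pass 1: the base model outputs latents instead of a decoded image.
    latents = base(prompt=prompt, output_type="latent").images
    # Pass 2: the refiner denoises those latents into the final image.
    image = refiner(prompt=prompt, image=latents).images[0]
    image.save("lighthouse.png")
    ```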

    In August, llama.cpp finally standardized on the GGUF format, so there should be less downloading and converting of model formats. GGUF files carry metadata for suggested parameters and even the prompt format, which should really simplify things for those running llama2 and co. at home. Facebook also released Code Llama, their own fine-tune on top of llama2.
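
    As a sketch of how simple that's become with the llama-cpp-python bindings (the model filename here is hypothetical; any llama2-family GGUF file should work):

    ```python
    # Load a GGUF model and run a completion. The GGUF file carries its own
    # metadata (architecture, tokenizer, suggested parameters), so no separate
    # conversion or config step is needed.
    from llama_cpp import Llama

    llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)
    out = llm("Q: What is the GGUF file format? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])
    ```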

    Last week TII released Falcon-180B (that is, a model with 180 billion parameters). It reaches ChatGPT 3.5 levels of intelligence on benchmarks, but is prohibitively expensive to run. Think of it more as a proof of concept, though it could prove interesting with quantization. If nothing else, it's a great benchmark.

    Last night Stability AI released Stable Audio, a text-to-audio generator. I've not had a chance to play around with it yet, but the samples are pretty impressive for how early that technology still is.

    The media hype may have died down some, but the technology continues to evolve. Context lengths are continually growing, and the tooling is improving too, so it's becoming easier and more accessible to run your own AI tools on your home PC. This is a trend I'm really glad to see: it means home assistants, coding aids, writing prompts, etc. can all be run locally, on-device, and not be reliant on "the cloud" to host these services for us.

    e: Typo

    13 votes
    1. vczf

      Also, the current top performing model for code is a fine-tune of codellama by Phind: https://www.phind.com/blog/code-llama-beats-gpt4

      3 votes
    2. [3]
      vczf

      The cheapest (not cheap: $4800+) way to run falcon-180B at speed right now on local hardware is a Mac Studio with llama.cpp. You need 128GB RAM to run the Q3_K_M quantization (80GB) or 192GB RAM to run up to maybe Q6_K (148GB).
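
      Napkin math behind those file sizes (the bits-per-weight figures are my rough estimates for each quant; this ignores the KV cache and runtime overhead):

      ```python
      # Approximate weight-only memory for a 180B-parameter model at various
      # llama.cpp quantization levels.
      params = 180e9
      quants = {"F16": 16.0, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3_K_M": 3.6}

      for name, bits in quants.items():
          gb = params * bits / 8 / 1e9
          print(f"{name:7s} ~{gb:4.0f} GB")
      # F16 ~360 GB, Q6_K ~148 GB, Q4_K_M ~108 GB, Q3_K_M ~81 GB,
      # consistent with the file sizes quoted above.
      ```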

      You can't use all the RAM for Metal (the Apple Silicon GPU backend), but it is possible to patch the VRAM split to increase the allocation limit. In theory, you could run Q4_K_M on the 128GB Mac, but I haven't tried this on mine.

      My impression of falcon-180b-chat is that it definitely seems smarter than LLaMA2 on language and knowledge tasks. It can go deeper on topics. However, the chat instruction tuning is not that good: it tends to get confused and generate outputs for "Falcon:", "Assistant:", "User:", etc., and sometimes starts the Alpaca instruction-response format, ultimately talking to itself for a number of conversational turns before finally generating a stop token.
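
      A partial workaround (my sketch with llama-cpp-python; the filename is hypothetical) is to pass the speaker labels as stop sequences, so generation halts instead of rambling through extra turns:

      ```python
      # Stop generation as soon as the model tries to speak for another role.
      from llama_cpp import Llama

      llm = Llama(model_path="./falcon-180b-chat.Q3_K_M.gguf", n_ctx=2048)
      out = llm(
          "User: What causes ocean tides?\nFalcon:",
          max_tokens=256,
          stop=["User:", "Falcon:", "Assistant:", "### Instruction:"],
      )
      print(out["choices"][0]["text"].strip())
      ```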

      I am hopeful for some new fine-tunes to choose from, but it may take a while because of the model size.

      2 votes
      1. [2]
        flowerdance

        Wait. Why is the cheapest option an Apple computer?

        1 vote
        1. vczf

          A 128GB Mac Studio gives you 96GB of VRAM for Metal and costs $4800. In comparison, an 80GB Nvidia A100 costs ~$13,000 just for the GPU.

          It's not an apples-to-apples comparison (pun intended), but for the purpose of running inference, the Mac is better value and more available.
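
          The value gap in dollars per GB of usable inference memory (rough, list prices only):

          ```python
          # $/GB of memory available for inference, ignoring the host system
          # you'd still need to build around the A100.
          mac_studio = 4800 / 96   # ~$50 per GB of Metal-visible RAM
          a100 = 13000 / 80        # ~$162 per GB of HBM
          print(f"Mac Studio: ${mac_studio:.0f}/GB, A100: ${a100:.0f}/GB")
          ```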

          3 votes
  2. [4]
    The_Ejj

    Conveniently, I just came across this article on Mastodon.

    I think music is particularly susceptible to being overtaken by generative algorithms because it’s inherently pattern based, and has an even tighter set of rules than language.

    It’s unfortunate from a perspective of losing the humanity in arts… but I also can’t help but feel that generative music is more fair game than generative text and scripts. Replacing writers with LLMs feels lazy and insidious. Replacing musicians, especially for stock music, feels… almost practical? Certainly more inevitable in my mind.

    8 votes
    1. skybrian

      As an amateur musician, generating music from a text description alone doesn't interest me at all. I want to give it a melody, a chord progression, a MIDI track, and/or some previous tracks, and have it generate another track that goes with them. (And yes, a text description is a good input, too, just not alone.)

      Also, how about generating a sampled instrument from a text description?

      My guess is that someday there will be a new generation of audio tools that does this.

      11 votes
    2. [2]
      Amarok

      I'd be willing to bet the best they can do is generic library muzak, in a sense. It's all copies of existing generalities in the music - this is a probability machine. It has a severe handicap - algorithms have no emotional context from which to appreciate the music they process. We are about to get buried in remixes that are on another level, like Johnny Cash covering Barbie Girl.

      Wherever the algorithm falls down, the musician will be there to take it forward. Honestly it could help a lot in the composition process, generating bits of instrumentation under human guidance. One would be able to try out a lot of things musically that way, like having a pocket orchestra where you are the conductor.

      8 votes
      1. public

        That's how I view the idealized future of AI art. I'm a director or museum curator, not an artist. However, it is my tastes that influence which generations are worth sharing or saving for manual cleanup and which are tossed back to sea.

        1 vote
  3. skybrian

    Here's my own news: I wrote a VS Code extension called Bot Typist that lets you chat with GPT4 (and other AI bots) in a Jupyter notebook. I consider it a good replacement for ChatGPT's Code Interpreter (now called "Advanced Data Analysis").

    It's built on Simon Willison's llm command line tool, which lets you use chatbots from the command line and connects to several different bots.
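
    If you haven't tried llm, its Python API is about this simple (a sketch; the model alias assumes you've configured an OpenAI key with `llm keys set openai`):

    ```python
    # Minimal use of Simon Willison's llm library from Python.
    import llm

    model = llm.get_model("gpt-4")
    response = model.prompt("Write a haiku about Jupyter notebooks.")
    print(response.text())
    ```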

    8 votes
  4. skybrian

    There's an app called "HeyGen" that's currently overloaded, but a friend got through and created videos of himself speaking in languages he doesn't know. Here's an article about it:

    How to create your own personal deepfake (Axios)

    To create a personalized avatar, customers have to send HeyGen a two-minute video of themselves speaking into a camera (your smartphone is fine) — along with another video giving consent for the company to do its thing.

    HeyGen returns a digital avatar that you can use to generate videos by typing the words you want to speak into a text box. A content filter blocks explicit or violent content.

    In addition to creating avatars from customers' own images, HeyGen offers a range of ready-to-use, generic-faced avatars and voices (they come in various genders and races).

    7 votes
  5. [3]
    skybrian

    It's an unsubstantiated claim on Twitter from a VC who is probably talking her book, but it seems plausible, so for what it's worth:

    The biggest secret about "AI girlfriends"?

    The majority of users are female (at least for chat-based products). It mimics fan fiction, where ~80% of readers are women.

    This does not hold true for image generation, where the ratio flips...

    5 votes
    1. [2]
      vczf

      I'm skeptical of her claim.

      AI character bots artificially fulfill an unmet social need, and are not directly comparable to fan-fiction or self-insert stories. It's plausible that the population is balanced, or even tilted towards cishet men. Consider replika.ai.

      We're all social creatures.

      3 votes
      1. skybrian

        Well, it’s basically a rumor, but I think it might depend on which website or app it is? For example, character.ai is text-only.

        It’s a new form of interactive fiction, so I see a close connection to non-interactive fiction like romance novels.

        3 votes
  6. skybrian

    An older article from May. He's the author of spaCy, a previous-generation AI tool, so this is biased, but I thought it was pretty good:

    Against LLM maximalism (Matthew Honnibal, explosion.ai)

    One vision for how LLMs can be used is what I’ll term LLM maximalist. If you have some task, you try to ask the LLM to do it as directly as possible. Need the data in some format? Ask for it in the prompt. Avoid breaking down your task into several steps, as that will prevent the LLM from working through your problem end-to-end. It also introduces extra calls, and can introduce errors in the intermediate processing steps.

    [...]

    There are two big problems with this approach. One is that “working around the system’s limitations” is often going to be outright impossible. Most systems need to be much faster than LLMs are today, and on current trends of efficiency and hardware improvements, will be for the next several years. Users are pretty tolerant of latency in chat applications, but in almost any other type of user interface, you can’t wait multiple seconds for a single prediction. [...]

    The second problem is that the LLM maximalist approach is fundamentally not modular. [...]

    [...]

    What makes a good program? It’s not only how efficiently and accurately it solves a single set of requirements, but also how reliably it can be understood, changed and improved. Programs written with the LLM maximalist approach are not good under these criteria.

    [...]

    Before you can improve any statistical component, you need to be able to evaluate it. It’s important to have some evaluation over your whole pipeline, and if you have nothing else, you can use that to judge whether some change to a component is making things better or worse (this is called “extrinsic evaluation”). But you should also evaluate your components in isolation (“intrinsic evaluation”). [...]

    Intrinsic evaluation is like a unit test, while extrinsic evaluation is like an integration test. You do need both. It’s very common to start building an evaluation set, and find that your ideas about how you expect the component to behave are much vaguer than you realized. You need a clear specification of the component to improve it, and to improve the system as a whole. Otherwise, you’ll end up in a local maximum: changes to one component will seem to make sense in themselves, but you’ll see worse results overall, because the previous behavior was compensating for problems elsewhere. Systems like that are very difficult to improve.
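
    To make the unit-test analogy concrete, an intrinsic evaluation might look something like this toy example (mine, not from the article):

    ```python
    # Intrinsic evaluation: score one pipeline component against labeled
    # examples, independent of the rest of the pipeline. The extractor is a
    # stand-in; in practice it would be a spaCy model or an LLM call.
    def extract_entities(text: str) -> set[str]:
        return {tok for tok in text.split() if tok.istitle()}

    labeled = [
        ("Alice met Bob in Paris", {"Alice", "Bob", "Paris"}),
        ("the meeting was postponed", set()),
    ]

    correct = sum(extract_entities(text) == gold for text, gold in labeled)
    print(f"intrinsic accuracy: {correct}/{len(labeled)}")
    ```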

    3 votes
  7. unkz

    Has anyone used Waldo or other LLM-augmented search engine tools? I’ve been playing around with rolling my own for a bit now, using Claude. Curious what experiences anyone else has had with intelligent search agents.
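
    The pattern I've been playing with looks roughly like this (a sketch; search_web is a placeholder for whatever search API you use, and the Claude call goes through Anthropic's completions SDK):

    ```python
    # Retrieval-then-answer: run a search, stuff the top results into the
    # prompt, and ask Claude to answer with numbered citations.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def search_web(query: str) -> list[dict]:
        raise NotImplementedError("plug in your search API here")

    def answer(query: str) -> str:
        results = search_web(query)[:5]
        context = "\n\n".join(
            f"[{i + 1}] {r['title']}\n{r['snippet']}" for i, r in enumerate(results)
        )
        prompt = (
            f"{anthropic.HUMAN_PROMPT} Answer the question using only these "
            f"search results, citing them by number.\n\n{context}\n\n"
            f"Question: {query}{anthropic.AI_PROMPT}"
        )
        resp = client.completions.create(
            model="claude-2", prompt=prompt, max_tokens_to_sample=512
        )
        return resp.completion
    ```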

    1 vote
  8. Wes

    Local LLMs continued to improve through the month of October. Mistral 7B was released, and it's done very well on benchmarks, even beating out llama2 13B. HuggingFace released Zephyr, a finetune of Mistral 7B.

    1 vote