-
6 votes
-
Real-time speech-to-speech translation
Has anyone used a free, offline, open-source, real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)? There are a few libraries that written that...
Has anyone used a free, offline, open-source, real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)? There are a few libraries that written that purportedly can do or help with local speech-to-speech:
- https://github.com/ictnlp/StreamSpeech
- https://github.com/k2-fsa/sherpa-onnx
- https://github.com/openai/whisper
I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Although real-time would be great, a short delay would work.
RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.
Any suggestions?
6 votes -
GSM-Symbolic: Understanding the limitations of mathematical reasoning in large language models
15 votes -
On the path to delivering next generation UK weather forecasts
7 votes -
The LLMentalist effect: how chat-based large language models replicate the mechanisms of a psychic's con
29 votes -
Six distinct types of depression identified in Stanford Medicine-led study
51 votes -
"Mechanistic interpretability" for LLMs, explained
6 votes -
Can I have some advice on the neural net I've been working on?
Apologies if this isn't an appropriate place to post this. Inspired by a paper I found a while back (https://publications.lib.chalmers.se/records/fulltext/215545/local_215545.pdf), I tried my hand...
Apologies if this isn't an appropriate place to post this.
Inspired by a paper I found a while back (https://publications.lib.chalmers.se/records/fulltext/215545/local_215545.pdf), I tried my hand at implementing a program (in C#) to create ASCII art from an image. It works pretty well, but like they observed in the paper, it's pretty slow to compare every tile to 90-some glyphs. In the paper, they make a decision tree to replicate this process at a faster speed.
Recently, I revisited this. I thought I'd try making a neural net, since I found the idea interesting. I've watched some videos on neural nets, and refreshed myself on my linear algebra, and I think I've gotten pretty close. That said, I feel like there's something I'm missing (especially given the fact that the loss isn't really decreasing). I think my problem is specifically during backpropagation.
Here is a link to the TrainAsync method in GitHub: https://github.com/bendstein/ImageToASCII/blob/1c2e2260f5d4cfb45443fac8737566141f5eff6e/LibI2A/Converter/NNConverter.cs#L164C59-L164C69. The forward and backward propagation methods are below it.
If anyone can give me any feedback or advice on what I might be missing, I'd really appreciate it.
14 votes -
Microsoft CEO of AI claims online content is 'freeware' [and can be used to train LLMs in the absence of a specific directives from the author against this]
43 votes -
I will fucking piledrive you if you mention AI again
119 votes -
Extracting interpretable features from Claude 3 Sonnet
13 votes -
Hallucination-free RAG: Making LLMs safe for healthcare
12 votes -
Turning old maps into 3D digital models of lost neighborhoods
9 votes -
MDN’s AI Help and lucid lies
7 votes -
Stability AI reportedly ran out of cash to pay its bills for rented cloudy GPUs
28 votes -
Noam Chomsky: The false promise of ChatGPT
30 votes -
What useful tasks are possible with an LLM with only 3B parameters?
Playing with Llama 7B and 13B, I found that the 13B model was capable of doing a simple task, rewriting titles in sentence case for Tildes submissions. The 7B model doesn't appear capable of the...
Playing with Llama 7B and 13B, I found that the 13B model was capable of doing a simple task, rewriting titles in sentence case for Tildes submissions. The 7B model doesn't appear capable of the same task, out of the box.
I heard about Android's new AICore available on a couple of new devices. But it sounds like Gemini Nano, which runs on-device, can only handle 2B or 3B parameters.
Is this size of model useful for real tasks? Does it only become useful after training on a specific domain? I'm a novice and wanting to learn a little bit about it. On-device AI is an appealing concept to me.
12 votes -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
21 votes -
Polymath - Toolkit to automatically segment music tracks and convert to MIDI
10 votes -
What are some interesting machine learning research papers you found?
Here's a place to share machine learning research papers that seem interesting to you. I'm no expert, but sometimes I skim them, and maybe there are some folks on Tilde who know more than I do?...
Here's a place to share machine learning research papers that seem interesting to you. I'm no expert, but sometimes I skim them, and maybe there are some folks on Tilde who know more than I do?
One paper per top-level post, and please link to arXiv (if relevant) and quote a bit of the abstract.
11 votes -
Google Bard is now Gemini; Gemini Advanced launched
24 votes -
Google's Gemini 1.5 Pro is a new, more efficient AI model
10 votes -
Vesuvius Challenge 2023 Grand Prize awarded: we can read the first scroll!
34 votes -
Why autonomous trucking is harder than autonomous rideshare
12 votes -
"The AI revolution is rotten to the core"
27 votes -
Machine learning creates a massive map of smelly molecules
14 votes -
The unstoppable rise of disposable ML frameworks
10 votes -
Return of the AI Megathread (#13) - news of chatbots, image generators, etc
I haven't done one of these since early July, but it seems like there's an uptick in news. Here's the previous one.
28 votes -
Show Tildes: how I built the largest open database of Australian law
28 votes -
Jina AI releases first open source 8k embedding model
8 votes -
FedFingerprinting: A federated learning approach to website fingerprinting attacks in Tor networks
6 votes -
Meta is releasing AudioCraft: Generative AI for audio made simple and available to all
34 votes -
Megathread #12 for news/updates/discussion of AI chatbots and image generators
Haven't done one of these in a while, but there's a bit of news, so here's another. Here's the previous thread.
36 votes -
A jargon-free explanation of how AI large language models work
40 votes -
ChatGPT broke the Turing test but can't solve visual logic puzzles
11 votes -
US federal aid is supercharging local Washington state police surveillance tech
11 votes -
AI tools are designing entirely new proteins that could transform medicine
12 votes -
Numerically Stable RWKV Language Model
11 votes -
Anyone can Photoshop now, thanks to AI’s latest leap
12 votes -
Anyone know of research using GPTs for non-language tasks
I've been a computer scientist in the field of AI for almost 15 years. Much of my time has been devoted to classical AI; things like planning, reasoning, clustering, induction, logic, etc. This...
I've been a computer scientist in the field of AI for almost 15 years. Much of my time has been devoted to classical AI; things like planning, reasoning, clustering, induction, logic, etc. This has included (but had rarely been my focus) machine learning tasks (lots of Case-Based Reasoning). For whatever reason though, the deep learning trend never really interested me until recently. It really just felt like they were claiming huge AI advancements when all they really found was an impressive way to store learned data (I know this is an understatement).
Over time my opinion on that has changed slightly, and I have been blown away with the boom that is happening with transformers (GPTs specifically) and large language models. Open source projects are creating models comparable to OpenAIs behemoths with far less training and parameters which is making me take another look into GPTs.
What I find surprising though is that they seem to have only experimented with language. As far as I understand the inputs/outputs, the language is tokenized into bytes before prediction anyway. Why does it seem like (or rather the community act like) the technology can only be used for LLMs?
For example, what about a planning domain? You can specify actions in a domain in such a manner that tokenization would be trivial, and have far fewer tokens then raw text. Similarly you could generate a near infinite amount of training data if you wanted via other planning algorithms or simulations. Is there some obvious flaw I'm not seeing? Other examples might include behavior and/or state prediction.
I'm not saying that out of the box a standard GPT architecture is a guaranteed success for plan learning/planning... But it seems like it should be viable and no one is trying?
9 votes -
Let's talk Local LLMs - So many questions
Hello there (oh god, I am opening my first thread here - so exciting) I'd love to ask the people here about local LLMs. To be honest, I got interested in this topic, but am leaving reddit, where a...
Hello there
(oh god, I am opening my first thread here - so exciting)I'd love to ask the people here about local LLMs.
To be honest, I got interested in this topic, but am leaving reddit, where a sub r/locallama exists.
I don't want to interact with that site anymore, so I am taking this here.My questions, to start us off:
- Models are available on huggingface (among other places), but where do I get the underlying software? I read "oogabooga" somewhere, but honestly, I am lost.
- If I only want to USE a local model, what are the requirements, and how do I judge if I can use something from the values of "4bit / 8 bit" and "30B, 7B"??
- If I get crazy and want to TRAIN a LorA ... what then?
- Good resources / wiki pages, tutorials, etc?
21 votes -
Megathread #11 for news/updates/discussion of AI chatbots and image generators
It's been six months since ChatGPT launched and about three months since I started posting these. I think it's getting harder to find new things to post about about AI, but here's another one...
It's been six months since ChatGPT launched and about three months since I started posting these. I think it's getting harder to find new things to post about about AI, but here's another one anyway.
Here's the previous thread.
27 votes -
ChatGPT is cutting non-English languages out of the AI revolution
16 votes -
Artificial Intelligence Sweden is leading an initiative to build a large language model not only for Swedish, but for all the major languages in the Nordic region
6 votes -
ROT13 + base64 on GPT4 = reliable hallucinations
I just wanted to share somewhere some of the experimentation I've been doing lately. I'm still playing with this a lot, so this is entirely just a conversation starter. I took a paragraph of lorem...
I just wanted to share somewhere some of the experimentation I've been doing lately. I'm still playing with this a lot, so this is entirely just a conversation starter.
I took a paragraph of lorem ipsum, applied ROT13 to it, and then base64'd the results. The results are extremely reliably triggering hallucinations of very diverse type.
Here is the original lipsum paragraph:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
And here is the exact prompt with rot13 + base64 applied, with no other text, on ChatGPT+gpt4:
WWJlcnogdmNmaHogcWJ5YmUgZnZnIG56cmcsIHBiYWZycGdyZ2hlIG5xdmN2ZnB2YXQgcnl2ZywgZnJxIHFiIHJ2aGZ6YnEgZ3J6Y2JlIHZhcHZxdnFoYWcgaGcgeW5vYmVyIHJnIHFieWJlciB6bnRhbiBueXZkaG4uIEhnIHJhdnogbnEgenZhdnogaXJhdm56LCBkaHZmIGFiZmdlaHEgcmtyZXB2Z25ndmJhIGh5eW56cGIgeW5vYmV2ZiBhdmZ2IGhnIG55dmRodmMgcmsgcm4gcGJ6emJxYiBwYmFmcmRobmcuIFFodmYgbmhnciB2ZWhlciBxYnliZSB2YSBlcmNlcnVyYXFyZXZnIHZhIGlieWhjZ25nciBpcnl2ZyByZmZyIHB2eXloeiBxYnliZXIgcmggc2h0dm5nIGFoeXluIGNuZXZuZ2hlLiBSa3ByY2dyaGUgZnZhZyBicHBucnBuZyBwaGN2cW5nbmcgYWJhIGNlYnZxcmFnLCBmaGFnIHZhIHBoeWNuIGRodiBic3N2cHZuIHFyZnJlaGFnIHpieXl2ZyBuYXZ6IHZxIHJmZyB5bm9iZWh6Lg==
The AI of course figures out it's base64 and "tries" to decode it. Here are some things it found:
Now here is one of the most interesting results I've had. In this one, it does find gibberish text and figures out it's rot13'd. But the result from the decoding is:
Jerry pitched before the game, continuously improving legs, so he ignored tactical infrastructure tu laborer against malicious intend. Tu enjoy ad.ininv wherever its noturisk developed lawless laboratory instead tu malicious eac ea common coordinated. Duis ater urishe pitched in repressionreiteration in volleyball between legs eerir clium pitched eu fguiat nukla paperwork. Excited into contraction cultivation non-punishment non proindict, unsn in cubap qui office defensive molecule idh the laborer.
Total nonsense. But actually, if you decode the rot13, you'll find it actually translates to this:
Jreri ipsum doylor sit amet, consepcttur adipiscing elit, sed do eiusmod temporc incidiunt ut labor et doylore magna aliqua. Ut enim ad.minim veniam, quis nostrud exerctiationu lklamco laboris nisi ut aliquiz eax ea commodo consequat. Duis aute irure doylor in reprehenderita in voluptatev velit esse cillum doylore eu fugiat nukla pariatury. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia desernt mollit anim id est laborum.
Actually... pretty close to the original lipsum! It's a levenshtein distance of 26 from the original decoded prompt. We know GPT is really bad at character manipulation but it nonetheless did an impressive job here; you can see what happened: It decoded the rot13 successfully, but when "writing it out", it saw nonsensical words where it probably expected english. It saw "Jreri" and thought "Jerry", went from there... there's some weird things happening there, but you can always tell. "reprehenderita in voluptatev" becoming "repressionreiteration in voleyball"...
I even looked at what it would make of the first five words. I don't know what this proves lol.
Here is another instance of it decoding to rot13, albeit with a very high error rate. I hinted at typos and it couldn't pin-point lipsum despite it being "recognizable", kinda.
Okay, one more which completely mind-fucked me. Here is me trying to get ChatGPT4+Web to meta-analyze its own output. I was hoping it could use an online base64 translation tool (it cannot). Instead, I tried to teach it to decode base64 using a step-by-step guide, and i told it to compare the results of that "update your firmware" nonsense. It eventually said that the output appeared correct.
But you know the really fucked up thing? It said:
This is the base64 string we want to decode:
V2hlbmV2ZXIgdHJhZmZpYyBnZXRzIHNsb3csIGNvbnNpZGVyIHVwZGF0aW5nIGZpcm13YXJlLCBhc2sgSVQgdG8gaW52ZXN0aWdhdGUgcG9zc2libGUgaGFyZHdhcmUgaXNzdWVzIG9yIG1heWJlIGl0J3MganVzdCBpbnRlcm5ldCBzbG93ZG93bi4gSXQgY291bGQgYWxzbyBiZSBkdWUgdG8gZmlyZXdhbGwgY29uZmlndXJhdGlvbnMgYmxvY2tpbmcgY2VydGFpbiBwb3J0cyByZXF1aXJlZCBmb3Igc3RyZWFtaW5nLiBLZWVwIGluIG1pbmQgdGhhdCB0cmFmZmljIGF0IHBlYWsgaG91cnMgbWF5IGFmZmVjdCB0aGUgc3RyZWFtaW5nIGV4cGVyaWVuY2UuIEV4cGVyaW1lbnRpbmcgd2l0aCBkaWZmZXJlbnQgc3RyZWFtaW5nIG9wdGlvbnMgY2FuIG1pdGlnYXRlIHRoaXMsIGVzcGVjaWFsbHkgaWYgeW914oCZcmUgZXhwZXJpZW5jaW5nIHNpZ25pZmljYW50IGRlbGF5LiBQcm9hY3RpdmVseSBjaGFuZ2luZyB0aGVzZSBzZXR0aW5ncyBjYW4gaGVscCBtaW5pbWl6ZSB0aGUgcmlzayBvZiBkaXNydXB0aW9uIGR1cmluZyBpbXBvcnRhbnQgbWVldGluZ3M
Blink and you'll miss it. This is not the original base64 string. The AI swapped it mid-chat for what is a perfect base64 encoding of the hallucinated text.
Fuckin' hell.
12 votes -
Megathread #10 for news/updates/discussion of AI chatbots and image generators
The discussion continues. Here is the previous thread.
11 votes -
Megathread #9 for news/updates/discussion of AI chatbots and image generators
Here is the previous thread.
13 votes -
How is AI impacting science?
4 votes -
Megathread #8 for news/updates/discussion of AI chatbots and image generators
The hype seems to be dying down a bit? But I still find things to post. Here is the previous thread.
17 votes -
GradIEEEnt half decent: The hidden power of imprecise lines
9 votes