Pretty neat to see in motion. It's one of those things where individually, none of this is particularly new - it's just slapping an LLM + generative voice onto one of Boston Dynamics' robots. But put together it's quite magical and surreal: somewhere there is a walking robot dog with a British accent that you can have a coherent conversation with. It reminds me of something out of one of the Bethesda Fallout games.
Your comment didn’t prepare me for that video. It’s exactly as you say: none of this is new. With how much time I have spent with GPT, none of the responses surprised me. But that video actually was surreal. I can see so many uses for this already. I am sure they have customers banging down their doors for this integration.
The video really made me think of VASCO from Starfield. I'm really looking forward to seeing where this goes even in the next 5 years!
Verbal instruction of robots is going to be paramount to what they will actually be able to understand and do.
The parcel offloading robot requires human interaction to set it in place and start the load. The next step will be telling it to go to bay 9 and unload truck 45 to the conveyor, rather than needing a remote control.
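To make that concrete, here's a rough sketch in plain Python of how a spoken instruction like that could be reduced to a structured task. The field names and regexes are made up for illustration; this isn't any real warehouse or robot API:

    # A sketch only; "bay", "truck", and "conveyor" come straight from the
    # example above, and nothing here is a real warehouse or robot API.
    import json
    import re
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class UnloadTask:
        bay: int
        truck: int
        destination: str  # e.g. "conveyor"

    def parse_instruction(transcript: str) -> Optional[UnloadTask]:
        """Toy rule-based extraction; an LLM prompted to emit JSON matching
        this schema would handle far messier phrasings."""
        bay = re.search(r"bay\s+(\d+)", transcript, re.I)
        truck = re.search(r"truck\s+(\d+)", transcript, re.I)
        dest = re.search(r"to the (\w+)", transcript, re.I)
        if not (bay and truck and dest):
            return None  # unparseable instructions get rejected, not guessed at
        return UnloadTask(int(bay.group(1)), int(truck.group(1)), dest.group(1))

    task = parse_instruction("Go to bay 9 and unload truck 45 to the conveyor")
    print(json.dumps(task.__dict__))  # {"bay": 9, "truck": 45, "destination": "conveyor"}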
I'm excited for the future as long as the laws of robotics are included. LLMs will allow old folks to have personal nurses that won't scare them, because they can chat. Tour guides. Butlers and waiters. Information points. Librarians. The list just goes on and on.
I, Robot is just around the corner and I may get to see it in my lifetime. Not the scary consciousness-takes-over-and-kills-off-humans bit, the first part where robots are helpful.
After reading the comments here, I was rather disappointed with the video. It was just ChatGPT with very simple prompts. No long, cohesive conversation with context was shown, or how large any delays are.
The most exciting thing was the quality of the voices used.
I agree… but it’s remarkable how quickly we’ve arrived at “it’s just ChatGPT.” A robot interacting with people in this way was considered far-out sci-fi just a couple years ago. Now it seems like we’re already getting bored with it. I think the radical breakthrough of this tech warrants more appreciation than it’s getting.
Also a couple things to note about this implementation…
The video starts with a human introducing himself and someone else to the robot. One of the people has an uncommon name that was processed correctly. The model was probably fine-tuned on specific names of Boston Dynamics employees but I was still impressed when the bot repeated back both of them without missing a beat.
The video is short on details but the robot appears to have visual processing too. Apparently it recognizes people and museum exhibits by sight. If that’s true it’s also a significant feat.
I’m interested to know more about the interface between the LLM and the robot’s physical controls. The video mentions that someone asked about its parents and it led them to the exhibit of an earlier-model robot. If that’s true I’m impressed at its ability to convert the LLM output to valid navigation instructions.
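My guess at the pattern, and it is only a guess since the video doesn't say: prompt the LLM to answer from a fixed vocabulary of known waypoints, and validate whatever it emits before it ever reaches the navigation stack. Something like this, with hypothetical waypoint names:

    from typing import Optional

    # Hypothetical waypoint names; a guess at the pattern, not Boston Dynamics'
    # actual interface. The model would be prompted to reply only with
    # "GOTO <waypoint>" when it decides to lead someone somewhere.
    KNOWN_WAYPOINTS = {"lobby", "spot_v1_exhibit", "charging_dock", "workshop"}

    def to_nav_command(llm_output: str) -> Optional[dict]:
        """Map free-form LLM output to a validated navigation command."""
        parts = llm_output.strip().split()
        if len(parts) == 2 and parts[0].upper() == "GOTO":
            waypoint = parts[1].lower()
            if waypoint in KNOWN_WAYPOINTS:
                return {"action": "navigate_to", "waypoint": waypoint}
        return None  # anything unrecognised is dropped rather than executed

    print(to_nav_command("GOTO spot_v1_exhibit"))  # valid -> structured command
    print(to_nav_command("GOTO the moon"))         # invalid -> None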
I suppose it comes down to how much you can suspend disbelief and enjoy the token optimisation as something with intent or understanding.
At some point I guess we get to a place where it’s indistinguishable from real understanding but we’re pretty far from there yet at the edge cases (where it matters for novel situations).
But the current LLM tech cannot really understand or have intent of its own.
I wonder if some kind of recursive system that can generate its own prompts is next. I know LLMs are already being given the task of generating prompts for other LLMs, so how sophisticated can you make that, and how quickly can you translate environmental data (robot sensors) into prompts so the model can update and “think” in real time?
For example, an obstacle course is currently handled via image reconstruction and spatial reasoning, but could it be done in a tight feedback loop between sensor readings, LLM prompt, and motor movement output?
Imagine a model trying to navigate a movement procedure; the prompts would be similar to our stream of consciousness, thoughts like “that ledge is high, better lift my leg up”.
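Something like this toy loop is what I have in mind; call_llm is just a trivial stand-in for a real model call, and nothing here reflects how Spot actually works:

    import time

    # Toy version of the loop described above: sensors -> short text prompt ->
    # model picks an action -> motors.
    ACTIONS = ["step_forward", "lift_leg", "turn_left", "turn_right", "stop"]

    def read_sensors() -> dict:
        # placeholder: a real robot would fuse depth cameras, IMU, joint state, ...
        return {"ledge_height_cm": 18, "obstacle_ahead": True}

    def call_llm(prompt: str) -> str:
        # stand-in for an actual LLM call
        return "lift_leg" if "ledge" in prompt else "step_forward"

    def control_loop(steps: int = 3) -> None:
        for _ in range(steps):
            s = read_sensors()
            prompt = (f"Obstacle ahead: {s['obstacle_ahead']}. "
                      f"Ledge height: {s['ledge_height_cm']} cm. "
                      f"Pick one action from {ACTIONS}.")
            action = call_llm(prompt)
            if action in ACTIONS:
                print("executing:", action)  # would dispatch to the motor controller
            time.sleep(0.1)  # real control loops need far lower latency than an LLM offers

    control_loop()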
I agree that it’s fun to string together ChatGPT, text-to-speech, image-to-text, etc. and see what kind of experiences you can build. You can add this to an R/C vehicle, some kid's toy, your phone, etc.
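The glue really is minimal; here's a rough skeleton of one conversational turn, with dummy stand-ins for whatever speech-to-text, chat, and TTS services you pick:

    # Dummy stand-ins for the three services; the point is just how little
    # glue the pipeline needs.
    def speech_to_text(audio: bytes) -> str:
        return "hello robot"             # stand-in for a transcription API call

    def chat(history: list) -> str:
        return "Hello! How can I help?"  # stand-in for a chat-completions call

    def text_to_speech(text: str) -> bytes:
        return text.encode()             # stand-in for a TTS voice

    def handle_turn(audio_in: bytes, history: list) -> bytes:
        history.append({"role": "user", "content": speech_to_text(audio_in)})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        return text_to_speech(reply)

    print(handle_turn(b"", []))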
It was just a little underwhelming coming from Boston Dynamics.
That's extremely surreal to watch. I was initially skeptical this would be all that new, but it's wild how much "character" emerged from the prompts, and seeing a robot walk around and respond verbally to context (like someone holding a camera) was simultaneously awe-inspiring and terrifying.
It reminds me a lot of Codsworth from Fallout 4.
Sorry BD, but they're already replacing tour guides. They just expect you to have a smartphone rather than integrate it into an expensive dog.
We really are in an innovation drought, tech-wise.