33 votes

My personal AI assistant project

Posted February 23 by post_below

Tags: ask.discussion, artificial intelligence, language models.large, assistants.personal, openclaw, vibecoding, shopping

Let me start off by saying that I'm exhausted by AI hype. Being interested in LLM agent technology (AI agent hereafter for brevity) means skimming over a lot of hype for one or two useful, semi-reality-based bits of information. Maybe the part that I find the most frustrating is how effective the hype is. I don't know if there's ever been a hype cycle like this. Probably a big part of the reason is that the internet has already proven, within living memory for most people, that technological revolutions really can change everything. Or mess everything up. Either way they generate a lot of economic activity.

So this post is not that. I'm not going to tell you about how AI agents are the second coming of Christ. I'm not selling anything.

Fairly early into learning about AI agents I wanted a way to connect to the agent remotely without hosting it somewhere or exposing ports to the internet. I settled on Tailscale and a remote terminal and moved on, but I rarely used it. Somehow the tiny friction of "Turn on Tailscale, open terminal app, connect, run agent" was enough to make it not feel worth it.

I know I'm far from the only person who had the same "I want it remote" thought. The best evidence: OpenClaw. It's just one of those things that everyone naturally converges on.

If you're not familiar with OpenClaw, the TLDR is: Former founder with more money than he'll ever need vibecodes a bridge between instant messenger apps and LLM APIs. Nothing about it is technically challenging or requires solving any particularly hard problems. It almost immediately becomes the fastest growing GitHub repo of all time and is currently at number 14 for number of stars. It blew up the (tech) internet like very few things ever have. Within months he was hired by OpenAI.

OpenClaw now does more than just connect messaging and agents, but I believe that one piece is the killer feature. My Tailscale terminal solution, combined with a scheduled task or a cron job and some context files, could already do all of the things that OpenClaw can do, and countless people had already implemented similar solutions. But I think it was the tiny bit of friction OpenClaw removed that was responsible for a lot of its popularity.

I thought that was interesting but I have no interest in the security nightmare that is OpenClaw, or the "sentience" vibe for that matter, so I built my own tool.

Essentially it's just a light secondary harness combined with a bridge between Signal and Claude Code. It does some other things too, things I wished existing harnesses did: some memory and guidelines, automated prompts and reminders to wake the agent up and have it do stuff, and some context to give the agent some level of persistence and make it less LLMy and less annoying. None of that is particularly interesting though.
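
For anyone curious what a bridge like this boils down to, here's a minimal sketch in Python. It is not the actual tool, just an illustration under assumptions: `Message`, `Bridge`, `receive`, `send`, `run_agent`, `allowed_senders` and `context` are all hypothetical names, and the three callables stand in for whatever Signal client and agent CLI you would really wrap.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Message:
    sender: str  # e.g. a phone number or chat ID (hypothetical shape)
    text: str


@dataclass
class Bridge:
    # All three callables are assumptions; in practice they would wrap
    # a messaging client and an agent process.
    receive: Callable[[], Iterable[Message]]  # fetch new chat messages
    send: Callable[[str, str], None]          # (recipient, text)
    run_agent: Callable[[str], str]           # prompt in, reply out
    allowed_senders: set = field(default_factory=set)
    context: str = ""                         # standing memory/guidelines

    def build_prompt(self, msg: Message) -> str:
        # Prepend the persistent context so the agent keeps its guidelines.
        return f"{self.context}\n\nMessage from {msg.sender}:\n{msg.text}".strip()

    def poll_once(self) -> int:
        """Handle every pending message once; return how many were answered."""
        handled = 0
        for msg in self.receive():
            if msg.sender not in self.allowed_senders:
                continue  # ignore anyone not explicitly allowed
            reply = self.run_agent(self.build_prompt(msg))
            self.send(msg.sender, reply)
            handled += 1
        return handled
```

Calling `poll_once()` in a loop (or from a scheduled task) is the whole bridge in this sketch. Keeping the transport and the agent as injected functions means the logic can be tested with fakes before anything touches a real chat.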

Once I got it working (MVP took less than a day) and started playing with it, the OpenClaw phenomenon made a lot more sense. Somehow having the agent in a chat interface, with almost zero friction (just open the chat and send something) was cooler than it had any reason to be.

I can't explain it any better than that at the moment. Not only was it kinda fun, it lent itself to a whole range of "what ifs". What if it could do X? What if I wrote a tool that gave it Y capability? I've been experiencing that for some time, but somehow "agent in your pocket" has a different feeling.

Here's an example of a "what if". What if it could do our grocery shopping? I definitely want that. I already had a custom browser tool that I built for agent coding assistance so I was most of the way there. It was just a matter of teaching the agent to log in and navigate a website, something they're already trained to do. Some hand holding, a few helper scripts, and an evening's worth of hours later, I had it working. The agent can respond to a shopping request by building a shopping list based on our most recent orders, presenting it to us for approval/edits in a Signal group chat, doing searches for any additional product requests and adding the finalized order to the cart. It could also check out the order and schedule the delivery time but I'm doing the last 2 clicks manually for the time being. It's an idiot savant, so it seems like a bad idea to give it access to my credit card. Maybe eventually.
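
The list-building step can be sketched roughly like this. It's only an illustration: the post doesn't say how the list is derived from recent orders, so the "appears in at least N recent orders" heuristic, the function names `draft_list` and `apply_edits`, and the data shapes are all assumptions.

```python
from collections import Counter


def draft_list(recent_orders, min_orders=2):
    """Suggest items that appear in at least `min_orders` of the recent orders.

    `recent_orders` is assumed to be a list of orders, each a list of item names.
    """
    # Count each item once per order so duplicates within an order don't inflate it.
    counts = Counter(item for order in recent_orders for item in set(order))
    return sorted(item for item, n in counts.items() if n >= min_orders)


def apply_edits(draft, add=(), remove=()):
    """Apply approval-round edits from the chat: drop removals, append additions."""
    removed = set(remove)
    result = [item for item in draft if item not in removed]
    for item in add:
        if item not in result:
            result.append(item)
    return result


orders = [
    ["milk", "eggs", "bread"],
    ["milk", "eggs", "coffee"],
    ["milk", "apples"],
]
draft = draft_list(orders)
final = apply_edits(draft, add=["oat milk"], remove=["eggs"])
```

Here `draft` is what would get posted to the group chat for approval and `final` is what would go into the cart. Checkout stays outside the sketch on purpose, matching the manual last two clicks.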

The fact that I can handle shopping with a couple of Signal messages feels effortless in a way that handling shopping by connecting to my PC terminal remotely via Tailscale wouldn't have. Especially when I can include people in the loop who have no interest in tailscaling anywhere. Everyone can use messaging apps.

I imagine before long solutions like this will be built in, either in the grocery websites and apps, or into the frontier harnesses themselves. There will probably be agents everywhere, for better or worse. Probably I'll wish that the agents would all fuck off. In the meantime it's exciting how easy it is to get these tools to do useful things.

13 comments

[2]
first-must-burn
February 23
Honestly, this is the kind of thing I've been looking for. A smarter (smartish) personal assistant with some persistent memory seems like a great use case for LLMs.

Hooking it up to a messaging interface is an interesting idea, since it would open up the collaborative aspect (anyone in the chat can request an addition, and see others' requests).

The extra piece I want is a "live mode" (I have not used it, but I think Gemini Pro is this) that I can activate through Android Auto or my phone for a conversational interface. Use cases:
- Advanced calendar management - looking at all our calendars for context so I can say things like "add an appointment for me to take my daughter to X" and have it say, "you have basketball/chess/whatever scheduled for time X, which only leaves you X time to get there/eat dinner/etc."
- General knowledge lookup with conversational followup
- List / notes management - being able to enumerate existing notes, find the relevant one, add or edit it, etc. (basically a generalization of the grocery list use case)
- Multichannel messaging - more advanced capability to write and revise messages and approve them to be sent to the appropriate text group, email, etc.
Archiving the live mode as chat history and being able to go back and forth between the live and chat modes for the same conversations seems like a great way to manage the interface.

I just need a week (or seven) where the world is on pause. Also, while I'm asking, a warm bed, a kind word, and unlimited power.
7 votes
1. post_below (OP)
  February 23
  
  If you ever manage to find the time, I'll be curious to hear what you come up with.
  
  Alternatively the world of "claws" (OpenClaw alternatives) is growing fast, some of them even think about security. Maybe one of those would do what you want.
  
  4 votes
[2]
vord
February 23

I very much like the theory of being able to have random people cobble together personal interoperability tools which would allow them to abstract from the underlying service providers.

Especially if it can be done in a way that means said providers will have to think twice about making this hard for people, or deal with the PR backlash.

5 votes
1. post_below (OP)
  February 23
  
  Yeah the group chat collaborative bit ended up being another part that was cooler than I expected.
  
  I really like your abstraction angle, it implies a less dystopian future than I've been imagining.
  
  3 votes
[2]
unkz
February 23

I gotta say, this post made me want to start running OpenClaw.

4 votes
1. post_below (OP)
  February 24
  
  My apologies.
  
  As I mentioned in a different reply, there are a lot of open source OpenClaw alternatives now that look more like security bad dreams than security nightmares. One upside of OpenClaw though: now that OpenAI is sponsoring it, there's a fair chance you'll be able to keep using a subscription with it, as opposed to paying API prices. The other big model companies have locked that down in the last couple of weeks.
  
  If you have the time though, the exercise of rolling your own gives you great insights into model behavior.
  
  1 vote
[2]
umbrae
February 24
Interesting! Two quick thoughts from it:
1. Out of curiosity, in your grocery shopping example, how do you hand off the session from your system to you? Is it running in a VM or just on your home PC where there is a browser it’s using?
2. Purely out of curiosity, have you checked out exe.dev? It does not scratch the OpenClaw itch exactly, but to me it has been incredibly cool for prototyping. It’s also made by the founder of Tailscale, and it feels like a really nice confluence of technologies.
2 votes
1. post_below (OP)
  February 24
  
  Do you mean how messages are passed between my system and Signal? A bridge runs on the PC side and passes messages back and forth. It also keeps the agent in line. The session is never handed off, at least not the way I'd define it. Everything except the remote Signal app runs on my PC, sandboxed where appropriate.
  
  Maybe you're asking about the shopping cart session? If so, I just log in from my phone.
  
  I took a look at exe.dev when it launched, it didn't do anything I found useful. I can run all the VMs I want for free on my PC or on a colocated web server. But really I find that with my existing setup and scaffolding I rarely want a VM.
  
  However, I've read enough posts like yours about exe.dev to realize that it's much more useful for some people than I imagined it would be when it launched. Last I heard you could use Opus for free in it, but I'm guessing that with the recent Anthropic OAuth developments that's no longer the case?
[5]
nic
February 28

Ha! You inspired me to do the thing you did not do.

I went and installed OpenClaw.

It's sitting on a dedicated box that only has Telegram and OpenClaw running in a Docker container.

No personal data.

You are absolutely right about the frictionless access being a game changer. Next up I am going to give it some data to crunch.

After using it for a day, I can't help but feel that Apple is asleep at the wheel on this one.

1 vote
1. [4]
  post_below (OP)
  February 28
  
  Any plans to teach it to do anything specific? Are you using the default context files?
  
  1 vote
  1. nic
    February 28
    
    It's mostly teaching me. I plan to give it some data. I am curious how well it navigates structured data.
    
    2 votes
  2. [2]
    nic
    March 1
    
    OK, I have created a couple of skills.
    
    One is to query a local database with 25 years of financial data.
    
    Another is to go download the latest financial data from the SEC.
    
    Currently the highly structured data in the database produces much more accurate results.
    
    1 vote
    
    post_below (OP)
    March 1
    
    That makes sense, especially if you mean 10-K filings. That's about as unstructured as you can get. Even their structured APIs are challenging with mix-and-match XBRL concepts.
    
    That said, agents can be prodded into being useful in that context.
    
    1 vote