My personal AI assistant project
Let me start off by saying that I'm exhausted by AI hype. Being interested in LLM agent technology (AI agent hereafter for brevity) means skimming over a lot of hype for one or two useful, semi-reality-based bits of information. Maybe the part that I find the most frustrating is how effective the hype is. I don't know if there's ever been a hype cycle like this. Probably a big part of the reason is that the internet has already proven, within living memory for most people, that technological revolutions really can change everything. Or mess everything up. Either way they generate a lot of economic activity.
So this post is not that. I'm not going to tell you about how AI agents are the second coming of Christ. I'm not selling anything.
Fairly early into learning about AI agents I wanted a way to connect to the agent remotely without hosting it somewhere or exposing ports to the internet. I settled on Tailscale and a remote terminal and moved on, but I rarely used it. Somehow the tiny friction of "Turn on Tailscale, open terminal app, connect, run agent" was enough to make it not feel worth it.
I know I'm far from the only person who had the same "I want it remote" thought. The best evidence is OpenClaw. It's just one of those things that everyone naturally converges on.
If you're not familiar with OpenClaw, the TLDR is: Former founder with more money than he'll ever need vibecodes a bridge between instant messenger apps and LLM APIs. Nothing about it is technically challenging or requires solving any particularly hard problems. It almost immediately becomes the fastest growing GitHub repo of all time and is currently at number 14 for number of stars. It blew up the (tech) internet like very few things ever have. Within months he was hired by OpenAI.
OpenClaw now does more than just connect messaging and agents, but I believe that one piece is the killer feature. My Tailscale terminal solution, combined with a scheduled task or a cron job and some context files, could already do all of the things that OpenClaw can do, and countless people had already implemented similar solutions. But I think it was the tiny bit of friction OpenClaw removed that was responsible for a lot of its popularity.
I thought that was interesting but I have no interest in the security nightmare that is OpenClaw, or the "sentience" vibe for that matter, so I built my own tool.
Essentially it's just a light secondary harness combined with a bridge between Signal and Claude Code. It does some other things too, things I wished existing harnesses did, some memory and guidelines, automated prompts and reminders to wake the agent up and have it do stuff, some context to give the agent some level of persistence, make it less LLMy, less annoying. None of that is particularly interesting though.
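For anyone curious what "a bridge" amounts to, here is a minimal sketch of the core loop, not my actual code. It assumes the Claude Code CLI is on the PATH and that `claude -p` runs a single prompt non-interactively. The Signal side is deliberately left out: `handle_message`, `ALLOWED_SENDERS` and the phone number are hypothetical names for illustration, and whatever Signal client you use would call `handle_message` for each incoming message and send back the returned text.

```python
import subprocess
from typing import Callable, Optional

# Hypothetical allow-list: only these Signal numbers may talk to the agent.
ALLOWED_SENDERS = {"+15550100"}


def run_agent(prompt: str) -> str:
    """Run one prompt through the agent CLI and return its reply.

    Assumption: the Claude Code CLI is installed and `claude -p`
    prints the response to stdout and exits.
    """
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        timeout=600,
    )
    return result.stdout.strip()


def handle_message(
    sender: str,
    text: str,
    agent: Callable[[str], str] = run_agent,
) -> Optional[str]:
    """Return the agent's reply, or None if the message should be ignored."""
    if sender not in ALLOWED_SENDERS:
        return None  # ignore strangers entirely
    text = text.strip()
    if not text:
        return None  # nothing to forward
    return agent(text)
```

Passing the agent in as a function keeps the message handling testable without actually calling the CLI.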
Once I got it working (MVP took less than a day) and started playing with it, the OpenClaw phenomenon made a lot more sense. Somehow having the agent in a chat interface, with almost zero friction (just open the chat and send something) was cooler than it had any reason to be.
I can't explain it any better than that at the moment. Not only was it kinda fun, it lent itself to a whole range of "what ifs". What if it could do X? What if I wrote a tool that gave it Y capability? I've been experiencing that for some time, but somehow agent in your pocket has a different feeling.
Here's an example of a "what if". What if it could do our grocery shopping? I definitely want that. I already had a custom browser tool that I built for agent coding assistance so I was most of the way there. It was just a matter of teaching the agent to log in to and navigate a website, something they're already trained to do. Some hand holding, a few helper scripts, and an evening's worth of hours later and I had it working. The agent can respond to a shopping request by building a shopping list based on our most recent orders, presenting it to us for approval/edits in a Signal group chat, doing searches for any additional product requests and adding the finalized order to the cart. It could also check out the order and schedule the delivery time but I'm doing the last 2 clicks manually for the time being. It's an idiot savant, it seems like a bad idea to give it access to my credit card. Maybe eventually.
The fact that I can handle shopping with a couple of Signal messages feels effortless in a way that handling shopping by connecting to my PC terminal remotely via a Tailscale terminal wouldn't have. Especially when I can include people in the loop who have no interest in tailscaling anywhere. Everyone can use messaging apps.
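The list-building step is simple enough to sketch. This is an illustration of the idea rather than the helper scripts I actually use: it assumes recent orders are available as plain lists of item names, and `draft_list`, `apply_edits` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def draft_list(recent_orders, min_count=2):
    """Propose items that appear in at least `min_count` recent orders."""
    # set(order) so an item counts once per order, however many were bought.
    counts = Counter(item for order in recent_orders for item in set(order))
    return sorted(item for item, n in counts.items() if n >= min_count)


def apply_edits(draft, add=(), remove=()):
    """Apply the additions and removals requested in the group chat."""
    removed = {name.lower() for name in remove}
    final = [item for item in draft if item.lower() not in removed]
    for item in add:
        # Skip additions that are already on the list (case-insensitive).
        if item.lower() not in {existing.lower() for existing in final}:
            final.append(item)
    return final
```

The draft goes to the group chat, people reply with additions and removals, and only the finalized list is handed to the browser tool.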
I imagine before long solutions like this will be built in, either in the grocery websites and apps, or into the frontier harnesses themselves. There will probably be agents everywhere, for better or worse. Probably I'll wish that the agents would all fuck off. In the meantime it's exciting how easy it is to get these tools to do useful things.
Honestly, this is the kind of thing I've been looking for. A smarter (smartish) personal assistant with some persistent memory seems like a great use case for LLMs.
Hooking it up to a messaging interface is an interesting idea, since it would open up the collaborative aspect (anyone in the chat can request an addition, and see others' requests).
The extra piece I want is a "live mode" (I have not used it, but I think Gemini Pro is this) that I can activate through Android Auto or my phone for a conversational interface. Use cases:
Archiving the live mode as chat history and being able to go back and forth between the live and chat modes for the same conversations seems like a great way to manage the interface.
I just need a week (or seven) where the world is on pause. Also, while I'm asking, a warm bed, a kind word, and unlimited power.
If you ever manage to find the time, I'll be curious to hear what you come up with.
Alternatively the world of "claws" (OpenClaw alternatives) is growing fast, some of them even think about security. Maybe one of those would do what you want.
I very much like the theory of being able to have random people cobble together personal interoperability tools which would allow them to abstract from the underlying service providers.
Especially if it can be done in a way that means said providers will have to think twice before having to deal with PR backlash against making this hard for people.
Yeah the group chat collaborative bit ended up being another part that was cooler than I expected.
I really like your abstraction angle, it implies a less dystopian future than I've been imagining.
I gotta say, this post made me want to start running OpenClaw.
My apologies.
As I mentioned in a different reply, there are a lot of open source OpenClaw alternatives now that look more like security bad dreams than security nightmares. One upside of OpenClaw though: now that OpenAI is sponsoring it, there's a fair chance you'll be able to keep using a subscription with it, as opposed to paying API prices. The other big model companies have locked that down in the last couple of weeks.
If you have the time though, the exercise of rolling your own gives you great insights into model behavior.
Interesting! Two quick thoughts from it:
Out of curiosity, in your grocery shopping example, how do you hand off the session from your system to you? Is it running in a VM or just on your home PC where there is a browser it’s using?
Purely out of curiosity, have you checked out exe.dev? It does not scratch the OpenClaw itch exactly, but to me it has been incredibly cool for prototyping. It’s also made by the founder of Tailscale, and it feels like a really nice confluence of technologies.
Maybe you're asking about the shopping cart session? If so I just login from my phone.
However, I've read enough posts like yours about exe.dev to realize that it's much more useful for some people than I imagined it would be when it launched. Last I heard you could use Opus for free in it, but I'm guessing that with the recent Anthropic OAuth developments that's no longer the case?
Ha! You inspired me to do the thing you did not do.
I went and installed OpenClaw.
It's sitting on a dedicated box that only has Telegram and OpenClaw running in a Docker container.
No personal data.
You are absolutely right about the frictionless access being a game changer. Next up I am going to give it some data to crunch.
After using it for a day, I can't help but feel that Apple is asleep at the wheel on this one.
Any plans to teach it to do anything specific? Are you using the default context files?
It's mostly teaching me. I plan to give it some data. I am curious how well it navigates structured data.
OK, I have created a couple of skills.
One is to query a local database with 25 years of financial data.
Another is to go download the latest financial data from SEC.
Currently the highly structured data in the database produces much more accurate results.
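For readers wondering what a database skill like that might boil down to, here is a rough sketch, not the actual skill. It assumes the data sits in SQLite. The function name, the row cap, and the `financials` table in the usage note are all hypothetical.

```python
import sqlite3


def run_readonly_query(conn, sql, params=(), max_rows=50):
    """Run a SELECT for the agent and return rows as dictionaries.

    The SELECT check is a crude guard, not a real security boundary;
    opening the database read-only would be the stronger option.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    # Cap the rows so a broad query cannot flood the agent's context.
    return [dict(zip(columns, row)) for row in cur.fetchmany(max_rows)]
```

With a hypothetical `financials(ticker, year, revenue)` table, the agent would call it with something like `SELECT year, revenue FROM financials WHERE ticker = ? ORDER BY year`. Returning named columns is part of why structured data tends to be easier for an agent to get right.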
That makes sense, especially if you mean 10-K filings. That's about as unstructured as you can get. Even their structured APIs are challenging with mix and match XBRL concepts.
That said, agents can be prodded into being useful in that context.