skybrian's recent activity
-
Comment on What programming/technical projects have you been working on? in ~comp
-
Comment on What programming/technical projects have you been working on? in ~comp
skybrian (edited) Link
I am sorry but I've somehow been turned into a superfan of exe.dev, Shelley (their coding agent), and Claude Opus 4.5. I'm going to gush a bit.
After dabbling with running Claude Code in a devcontainer on my laptop, I decided I'd rather have my coding agent in the cloud. Here's a blog post about my setup.
Coding with AI feels absurdly productive, but some of that could be an illusion because the denominator (time spent) is large. This is more fun than a lot of video games I've played; there is a lot of "just one more turn" going on and the hours go by quickly. I browse the Internet after giving Shelley something to do, but I'm still doing it less, which seems like a good thing.
The first VM I created using exe.dev now has an uptime of 16 days. Time flies! It's just a dumb little link-sharing website, like a personal Hacker News. I asked Shelley to generate it because it was the first thing I thought of, just to try it out; I didn't even plan to use it. After that I mostly ignored it for a while, but I had some links I wanted to keep handy and not post on Tildes (yet), so I saved them there. Then I wanted to close a few browser tabs with long articles that I hadn't finished reading yet, so I asked Shelley to create an unread section to save those. (Private because I don't want to recommend anything unread.) Today I thought maybe my saved links should have tags. So I came up with a short list of tags and asked Shelley to classify the links (all 15 of them) and add the UI for filtering and editing them. It just goes on and on; I have a bunch of features that I could decide to add and it would be easy.
If you're worried about code quality, you can work on that too. It's up to you. In my other VM I'm working on a Chrome extension, and it has 256 tests currently (I asked Shelley to count). Some of them are property tests written using repeat-test, the library I stopped working on about a year ago. I don't know what the code coverage is but at some point I'll ask Shelley to implement a command to compute it and we'll work together on driving it up. One of these days I should ask Shelley to write some unit tests for skybrian-links as well; currently it has none.
There are two developers I know of at exe.dev (maybe more, but I haven't met them). They seemed to like my blog post (since I said nice things about exe.dev) and followed me back. I posted feedback in the Discord and they're quite responsive. They open sourced Shelley a couple days ago and I might fix a bug or two, since it will be a lot easier with Shelley's help. As relationships with Internet service providers go, it's pretty sweet.
Today I saw an error at the end of one of the conversations I had with Shelley. Reloading the page didn't fix it, so it was frozen. I started writing a bug, but then I thought, "Shelley can do this." So I started another conversation, pasted in the error message and asked it to write a bug report. It did an investigation, which it could do easily enough because Shelley's conversations are just stored in a local SQLite database in the same VM it has full access to. It wrote a great bug report, figured out the SQL command to run to fix it locally, then offered to fix the stuck conversation. I said 'yes' so it did.
I assume that if I bothered to clone Shelley's repo, it could fix the bug too, and I could run the fixed version, and I could ask Shelley for any other features I want. This is a practical form of the free software ideal - user empowerment like the Free Software Foundation wanted everyone to have, but which in practice is limited to programmers, and most of the time we don't fix the software we use even if we could in principle, because it's very complicated to fix and who has the time?
Claude Code is closed source. It's very popular, but it doesn't have this.
-
Comment on What programming/technical projects have you been working on? in ~comp
skybrian Link Parent
I decided that I like a web UI better and I don't need so many agent features, so I abandoned Claude and went back to using exe.dev. I blogged about my setup here.
Still a big fan of exe.dev and Shelley and Claude Opus 4.5. This quasi-vibe-coding stuff is addictive like a good video game.
-
Comment on exe.dev, a service for creating Linux virtual machines and vibe-coding in them in ~comp
skybrian Link
The exe.dev folks have open sourced their coding agent. It's a Go program and looks like it would run in any Linux VM.
Note that it assumes you're running in a VM for security; the agent itself doesn't have any restrictions on what shell commands it can run. It also doesn't do authorization itself, so I assume you would need to put an HTTP proxy in front if you want to access it remotely. It doesn't have any install instructions, though I imagine a coding agent wouldn't find it hard to write some.
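For what it's worth, here is a minimal sketch of the kind of authenticating proxy I have in mind, written in Python using only the standard library. Everything specific in it is an assumption for illustration: the upstream address, the listening port, and the credentials are placeholders, not anything taken from Shelley or exe.dev. It only handles plain GET and POST and buffers each response, so it would not cope with streaming output, and Basic auth is only reasonable behind TLS.

```python
import base64
import hmac
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder values for illustration only; none of these come from Shelley or exe.dev.
UPSTREAM = "http://127.0.0.1:8000"  # hypothetical address the agent listens on
USERNAME = "me"
PASSWORD = "change-me"


def is_authorized(header, username=USERNAME, password=PASSWORD):
    """Return True if an Authorization header carries the expected Basic credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    expected = f"{username}:{password}"
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(decoded.encode("utf-8"), expected.encode("utf-8"))


class AuthProxy(BaseHTTPRequestHandler):
    def _forward(self):
        # Reject anything without valid credentials before touching the agent.
        if not is_authorized(self.headers.get("Authorization")):
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="agent"')
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length) if length else None
        request = urllib.request.Request(UPSTREAM + self.path, data=body, method=self.command)
        content_type = self.headers.get("Content-Type")
        if content_type:
            request.add_header("Content-Type", content_type)
        try:
            with urllib.request.urlopen(request) as upstream:
                status, payload = upstream.status, upstream.read()
                upstream_type = upstream.headers.get("Content-Type")
        except urllib.error.HTTPError as err:
            # Pass upstream error responses through unchanged.
            status, payload = err.code, err.read()
            upstream_type = err.headers.get("Content-Type")
        except urllib.error.URLError:
            self.send_error(502)
            return
        self.send_response(status)
        if upstream_type:
            self.send_header("Content-Type", upstream_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    do_GET = do_POST = _forward


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), AuthProxy).serve_forever()
```

In practice an off-the-shelf reverse proxy would probably be a better choice; the point is only that the authorization check has to live somewhere outside the agent.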
-
Comment on One regulation E, two very different regimes in ~finance
skybrian Link
The U.S. is often maligned as being customer-hostile compared to other comparable nations, particularly those in Europe. One striking counterexample is that the government, by regulation, outsources to the financial industry an effective, virtually comprehensive, and extremely costly consumer protection apparatus covering a huge swath of the economy. It does this by strictly regulating the usage of what were once called “electronic” payment methods, which you now just call “payment” methods, in Regulation E.
Reg E is not uniformly loved in the financial industry. In particular, there has been a concerted effort by banks to renegotiate the terms of it with respect to Zelle in particular. This is principally because Zelle has been anomalously expensive, as Reg E embeds a strong, intentionally bank-funded anti-fraud regime, but Zelle does not monetize sufficiently to pay for it.
[...]
[...] Congress decided that unsophisticated Americans might be conned into using these newfangled electronic devices in ways that might cost them money, and this was unacceptable. Fraudulent use of an electronic fund transfer mechanism was considered an error as grave as the financial institution simply making up transactions. It had the same remedy: the financial institution corrects their bug at their cost.
[...]
Reg E provided for two caps on consumer liability for unauthorized electronic fund transfer: $50 in the case of timely notice to the financial institution, as sort of a deductible (Congress didn’t want to encourage moral hazard), and $500 for those customers who didn’t organize themselves sufficiently. Above those thresholds, it was the bank’s problem.
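As a rough illustration of the two caps described in that paragraph, here is a small Python sketch. It is a simplification based only on the quoted summary: the function names are made up for this example, and the actual regulation attaches timing conditions that are ignored here.

```python
def consumer_liability(loss, timely_notice):
    """Consumer's share of an unauthorized transfer under the two caps quoted above.

    Simplified: $50 cap with timely notice, $500 cap without it.
    """
    if loss < 0:
        raise ValueError("loss must be non-negative")
    cap = 50 if timely_notice else 500
    return min(loss, cap)


def bank_liability(loss, timely_notice):
    """Whatever exceeds the consumer's cap is the bank's problem."""
    return loss - consumer_liability(loss, timely_notice)


# A $4,000 unauthorized charge, reported promptly versus not.
print(consumer_liability(4000, timely_notice=True), bank_liability(4000, timely_notice=True))
print(consumer_liability(4000, timely_notice=False), bank_liability(4000, timely_notice=False))
```

With these simplified rules, a promptly reported $4,000 loss leaves the consumer with $50 and the bank with $3,950; losses below the cap stay entirely with the consumer.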
[...]
In this privately-administered court system, the bank is the prosecutor, the defendant, and the judge simultaneously, and the default judgment is “guilty.” It can exonerate itself only by, at its own expense and peril, producing a written record of the evidence examined. This procedural hurdle is designed to simplify review by the United States’ actual legal system, regulators, and consumer advocates.
[...]
Having done informal consumer advocacy for people with banking and debt issues for a few years, I cannot overstate the degree to which this prong of Reg E is a gift to consumer advocates. Many consumers are not impressively detail-oriented, and Reg E allows an advocate to conscript a financial institution’s Operations department to backfill the customer’s files about a transaction they do not have contemporaneous records of. In the case that the Operations department itself isn’t organized, great, at least from my perspective. Reg E says the bank just ate the loss. And indeed, several times over the years, the prototypical grandmother in Kansas received a letter from a bank vice president of consumer lending explaining that the bank was in receipt of her Reg E complaint, had credited her checking account, and considered the matter closed. It felt like a magic spell to me at the time.
[...]
There are on the order of 5 million criminal cases in the formal U.S. legal system every year. There are more than 100 million complaints to banks, some of them alleging a simple disagreement (undercooked eggs) and very many alleging crime (fraud). It costs banks billions of dollars to adjudicate them.
The typical physical form of an adjudication is not a weeks-long trial with multiple highly-educated representatives debating in front of a more-senior finder of fact. It is a CSR clicking a button on their web app’s interface after 3 minutes of consideration, and the entire evidentiary record often fits in a tweet.
[...]
Zelle processes enormous volumes. It crowed recently that it did $600 billion in volume in the first half of 2025. Zelle is much larger than the upstarts like Venmo (about $250 billion in annual volume) and Cash App (about $300 billion in customer inflows annually). This is not nearly in the same league as card payments (~$10 trillion annually) or ACH transfers (almost $100 trillion annually), but it is quite considerable.
All of it is essentially free to the transacting customers, unlike credit cards, which are extremely well-monetized. And there is the rub.
[...]
So, in sum and in scaled practice at call centers, the bank wants to quickly get customers to admit their fingers were on their phone when defrauded. If so, no reimbursement.
This rationale is new and is against our standard practice, for decades. If you are defrauded via a skimming device attached to an ATM, the bank is absolutely liable, and will almost always come to the correct conclusion immediately. It would be absurdly cynical to say that you intended to transact with the skimming device and demonstrated your assent by physically dipping your card past it.
[...]
Having for the moment renegotiated their Reg E obligations by asserting they don’t exist, and mostly getting away with it, some banks might attempt to feel their oats a bit and assert that customers bear fraud risks more generally.
For example, in my hometown of Chicago, there has been a recent spate of tap-to-pay donation fraud. The fraudster gets a processing account, in their own name or that of a confederate/dupe, to collect donations for a local charitable cause. (This is not in itself improper; the financial industry understands that the parent in charge of a church bake sale will not necessarily be able to show paperwork to that effect before the cookies go stale.) Bad actors purporting to be informal charities accost Chicagoans on the street and ask for a donation via tap-to-pay, but the actual charged donation was absurdly larger than what the donor expected to donate; $4,000 versus $10, for example. The bad actor then exits the scene quickly.
[...]
But Reg E doesn’t care about the safety of city streets, in Chicago or anywhere else. It assumes that payment instruments will continue to be used in an imperfect world. This case has a very clear designed outcome: customer calls bank, bank credits customer $4,000 because the customer was defrauded and therefore the “charity” lacked actual authority for the charge, bank pulls $4,000 from credit card processor, credit card processor attempts to pull $4,000 from the “charity”, card processor fails in doing so, card processor chalks it up to tuition to improve its fraud models in the future.
Except at least some banks, per the Chicago Tribune’s reporting, have adopted specious rationales to deny these claims. Some victims surrender physical control of their device, and banks argue that that means they authorized the transaction. Some banks asserted the manufactured-out-of-their-hindquarters rationale that Reg E only triggers when there is a physical receipt. (This inverts the Act’s responsibility graph, where banks were required to provide physical hardcopy receipts to avoid an accountability sink swallowing customer funds.)
[...]
With a limited number of carveouts (e.g. wire transfers), Reg E is intentionally drafted to be future-proof against changes in how Americans transact. This is why, when banks argue that some new payments rail is exempt because it is “different,” the correct legal response is usually some variation of: doesn’t matter—that’s Reg E.
-
One regulation E, two very different regimes
14 votes
-
Comment on Agentic AI can change campaign operations in ~society
skybrian Link
From the blog post:
Here’s what political people know but rarely say out loud: campaigns are giant, deadline-driven startups of knowledge work.
Not “knowledge work” in the TED Talk sense. Knowledge work in the gritty sense:
[...]
A frightening amount of campaign labor is the same pattern repeated: take a messy pile of inputs, turn it into something legible, then turn that into action.
That is exactly the pattern these agentic tools are getting good at. And campaigns have endless messy piles of inputs.
[...]
For Democratic political staff, this is an acute risk. Our coalition depends on down-ballot races, state parties, and grassroots organizations that have never had the technical resources of presidential campaigns. If agentic AI remains “for engineers only,” the productivity gap between well-resourced and under-resourced campaigns will widen dramatically, and it’ll widen fastest in exactly the races where we can least afford it.
[...]
There’s an obvious failure mode here: everyone quietly experimenting, pasting sensitive data into whatever tool is easiest, building brittle automations with no discernible schema, accidentally recreating the worst parts of shadow IT but with higher stakes.
[...]
The goal is not to slow people down. The goal is to prevent the inevitable “we moved fast and broke trust” moment that causes a backlash and sets adoption back a cycle. Campaigns can’t hold institutional memory about what went wrong because they dissolve. The enduring organizations have to own governance, because they’re the ones who’ll still be around to learn from the mistakes.
-
Agentic AI can change campaign operations
5 votes
-
Comment on Fascist, thus inefficient in ~movies
skybrian Link
A bit of Star Wars fan fiction. It's readable without a Tumblr account, but just barely.
-
Fascist, thus inefficient
24 votes
-
Comment on Reading Lolita in the barracks in ~life
skybrian Link
From the article:
What every South Korean man agrees on is that serving in the military is a dreadful experience. The chief agony reported by draftees isn’t the plutonium-happy neighbor to the north but the hazing and abuse — physical, mental, and sexual — that have long defined military life.
[...]
The hierarchy was absolute, based on ranks that were determined strictly by time served. [...]
[...] Even within the same rank, your month of enlistment mattered. An August recruit (me) was forever junior to a July recruit of the same year; it was common to call someone by their enlistment month. I was, for a time, simply “August.”
It is easy to mistake the military for an unimaginative institution, but a glance at South Korean hazing culture reveals that creativity is alive and well in these unlikely places. By the time I enlisted, the most brutal forms of physical hazing [...] were officially banned. Even so, there were creative offerings that could teach American frat bros a thing or two.
The more innocuous ones involved forcing new recruits to dance or sing on command. On the gastronomical front, a marine once forced a private to eat an entire box of chocolate pies (1,980 kcal). There was simulated solitary confinement, where a person could be denied all communication with the outside world — no calls, no visits, no leave.
For the low-ranking, many forms of “self-improvement” were forbidden. Going to the gym was out of the question. Lying down was considered too comfortable; one had to sit with a perfectly straight back. The privilege of changing the TV channel or adjusting the fan was reserved for seniors.
The simplest chores were inflated into laborious rituals. Every night, the most junior private from each platoon would line up with tissues. A senior would then make them squat and, with the tissue, pick up every single pubic hair from the communal bathroom floor.
[...]
Strangely, the whole ordeal was aggravating but not exactly humiliating. The philosopher Sidney Morgenbesser captured this psychology aptly. After being beaten by police during a campus protest, he was asked if he had been treated unfairly or unjustly. He responded: “Unjust, but not unfair. It was unjust because they hit me over the head, but not unfair because they hit everyone else over the head.”
[...]
There was also a general sentiment that a private had not yet “earned” the right to study, meaning that during the first year of service, even using an available carrel would draw unwanted attention. This wasn’t so much anti-intellectualism as a form of deprivation grounded in a clear understanding of education's value — they knew exactly what was being withheld.
My solution: night watch duty. [...] A universally hated task, as you can imagine, since your sleep was interrupted by shifts that came around every two or three days. But it also meant an hour of solitude, an hour to read unwatched.
One night, loath to put my book down halfway through, instead of waking the next person, I just kept on reading. I figured I would get chewed out for the screw-up later. But the men whose shifts I covered were only too glad not to be woken up. Eureka.
I started covering others’ shifts — often three hours from 1 to 4 a.m., sometimes two from 4 to 6 a.m., when loudspeakers blared the start of the day. A fair trade. More sleep for them, more reading for me. My late nights were made possible only by military-grade instant coffee and the kick I got from my own insufferable self-romanticization as a reader by night, soldier by day.
[...]
For all its byzantine rituals, the governance itself was simple: two platoon leaders formed a duopoly, issuing rules and diktats. Every few months, new platoon leaders were selected according to some mysterious criteria set by the officers. (One clear requirement was that they had to be at least corporals.) The cliché that power corrupts seemed true. It was as if the green shoulder patch identifying the platoon leader, once sewn onto his uniform, became a kind of radioactive implant that initiated a decay of character.
[...]
As for why we never reported anything to the officers, as you can now see, it was because they were apathetic to our welfare. Besides, the system exacts a vow of omertà from its members. To snitch to the officers was to commit the ultimate taboo, guaranteeing retribution beyond imagination — what exactly that was, we didn’t know, because I never saw anyone try.
I had trouble understanding the logic that sustained the dramas of the barracks — how violence seemed to obey a transitive property, with each man inflicting what he had once endured. A cluster of privates who had enlisted in close succession — e.g., July, August, September, October — would form a natural cohort. And time and again, once a new platoon leader was selected, the man we had hoped would be our Jesus Christ turned into a Grand Inquisitor. We underlings nursed the same fantasy: when one of us became a platoon leader, we would finally bring about reform.
-
Reading Lolita in the barracks
20 votes
-
Comment on US withdraws from sixty-six international organisations in ~society
skybrian (edited) Link Parent
It's not much comfort, but here's something that might result in some restraint: there are stricter US laws governing what the military can do in the US than outside the US. Example: the drone strikes during the Obama administration would be illegal domestically.
The Trump administration might ignore laws anyway (for example, killing people who have surrendered during the attacks against boats near Venezuela), but legally, it does unfortunately have more freedom of action outside the US.
Capturing Maduro sure seems like it ought to be illegal, but I couldn't really say what US laws it breaks. It would be good to see what lawyers have written about it.
-
Comment on US withdraws from sixty-six international organisations in ~society
skybrian Link Parent
It seems like the Civil War was about as extreme as a stress test could get?
-
Comment on Feeling weird about my career with respect to AI in ~life
skybrian (edited) Link
I want to write a blog post about this, but briefly, I think video games are a useful metaphor for speculating about the future of programming.
Traditional programming is like a first-person shooter. Sure, you might have nice tools, like maybe an auto-aimer or a really big gun, but you’re fundamentally driving one character. Or if you’re multi-tasking then it’s like a turn-based RPG where you directly control all the members of your party.
There are also games like Lemmings or the Sims or RimWorld where you have somewhat indirect control over multiple characters, by giving them tasks or controlling their environment. They might interfere with each other and won’t do quite what you want and that’s part of the challenge. Fortunately you can restart the level if you need to. This is what it’s like to write software using coding agents. I am writing software with one coding agent and I can report that it’s fun and educational. It probably helps that it’s a personal side project. I’m still wary about running more than one at a time; it seems like running around spinning plates?
There are also RTS games where you control a small army in real time and frantically scroll around giving them orders. People are trying to write orchestrators to make software development like an RTS, but this is currently a crazy science project. Maybe it will be practical in a year or two. It seems stressful to me; I prefer turn-based games.
Zooming out a bit more, there are games like SimCity or strategy games where you’re managing large populations of NPCs (which may or may not be explicitly modeled). There’s no equivalent to this yet, but maybe it will happen if they can get the coding agents to coordinate well enough?
Writing code in assembly is still needed in certain niches. I wrote a bit of assembly as a kid and took a course in college where we wrote assembly. Even back then, it was taught as something you should understand rather than something you’re likely to do much of at work.
Similarly, I expect that hobbyist programmers will be able to play the programming game at whichever zoom level they like and people will do some software development at each level as part of their education. Commercially, I expect that there will be lots of demand for people who are comfortable managing coding agents and cleaning up their messes. It’s a different game, but it’s still software development. You are giving the coding agents tasks by giving orders (essentially, writing bug reports) and attempting to control how they do it by editing AGENTS.md and other documents that the agents refer to.
-
Comment on Dell's Consumer Electronics Show 2026 chat was the most pleasingly un-AI briefing I've had in maybe five years in ~tech
skybrian Link Parent
This reminds me of how Google used machine learning in its many services before LLMs. Sure, it was there and powered many features (including search), but the user didn’t need to know or care how the algorithms work. Similarly, these NPU chips could be in the background.
For developers to care about the NPUs and use them in their apps, though, they need to be exposed in an API, hopefully in a standard, portable way. As we’ve seen with GPUs, that can take some doing.
For web developers, I see that Chrome has an experimental Prompt API that’s hidden behind a flag. It looks like there’s been little progress coming up with a web standard.
-
nullschool earth: a visualization of global weather conditions
19 votes
-
Comment on What are some stories of progressivism gone wrong in implementation? in ~society
skybrian Link Parent
It seems like privilege and discrimination can be more complex than is often acknowledged. I don’t know enough of the history to make confident claims, but there’s a story about police forces in US cities having a lot of Irish back when the Irish were also discriminated against. Presumably being Irish was helpful for getting certain jobs, but not generally.
Similarly, in certain businesses or certain neighborhoods, perhaps being Asian actually is an advantage?
The presumptions we make at a nationwide level about others having unearned advantages or disadvantages are often a simplified version of the actual situation any given person faces.
-
Comment on Mystery trader garners $400,000-plus windfall on Nicolas Maduro's capture in ~society
skybrian Link Parent
This is all justified as a way to get more useful public information, but I think that information is pretty sparse and low-value because it’s opaque. It’s a Ouija board. You get out changes to a number on a graph and nobody really knows why. Confident whales can lose and there’s no way to check their work.
Perhaps the conversation around a prediction market could have some value, like someone betting and then posting evidence to try to convince others.
-
Comment on Grok AI generates images of ‘minors in minimal clothing’ in ~tech
skybrian Link
X blames users for Grok-generated CSAM; no fixes announced
On Saturday, X Safety finally posted an official response after nearly a week of backlash over Grok outputs that sexualized real people without consent. Offering no apology for Grok’s functionality, X Safety blamed users for prompting Grok to produce CSAM while reminding them that such prompts can trigger account suspensions and possible legal consequences.
“We take action against illegal content on X, including Child Sexual Abuse Material (CSAM), by removing it, permanently suspending accounts, and working with local governments and law enforcement as necessary,” X Safety said. “Anyone using or prompting Grok to make illegal content will suffer the same consequences as if they upload illegal content.”
[...]
X did not immediately respond to Ars’ request to clarify if any updates were made to Grok following the CSAM controversy. Many media outlets weirdly took Grok at its word when the chatbot responded to prompts demanding an apology by claiming that X would be improving its safeguards. But X Safety’s response now seems to contradict the chatbot, which, as Ars noted last week, should never be considered reliable as a spokesperson.
[...]
While some users are focused on how X can hold users responsible for Grok’s outputs when X is the one training the model, others are questioning how exactly X plans to moderate illegal content that Grok seems capable of generating.
-
Yep, it's a wrapper. The UI is bare-bones but it has what I need. There's a pulldown where you choose which LLM you want to use. I haven't changed it away from the default, Claude Opus 4.5, and it's using their API key, installed when you create the VM.
My understanding is that when they have billing working, they will be including some AI usage with the subscription. For now, it's free and apparently unlimited. I shudder to think of what their bills must be, but supposedly they're keeping an eye on it. It can't last.
I assume you'll be able to use your own API key.