skybrian's recent activity

  1. Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech

    skybrian
    Link Parent

    We're not going to get dramatic efficiency improvements for airplanes or cars because they're limited by the laws of physics. However, a Prius does have pretty good gas mileage and many electric cars have pretty good range nowadays.

    Meanwhile, there is no law of physics or mathematical proof preventing some researcher from coming up with a new AI algorithm that's either much more capable or equally capable but much cheaper to run. For all we know, there's some undiscovered algorithm like the Fast Fourier Transform that will dramatically improve things. More likely, it will be a slog with steady improvements coming over the years.

    5 votes
  2. Comment on Designing an agent reading test in ~comp

    skybrian
    Link

    From the article:

    Agents are unreliable self-reporters. Not because they're dishonest, but because LLMs are agreeable pattern-matchers. Show them a rubric and ask if they met it, and they'll find a way to say yes. Self-assessment requires the model to contradict its own prior output, which goes against the grain of how these systems work.

    [...]

    The observer changes the observation. Every time we found a way to get more accurate reporting, we discovered a new layer where accuracy was lost. Score inflation led to qualitative reporting led to reframing as diagnostic led to task-based design led to the discovery of summarization layers led to human-side comparison. Each fix revealed the next problem. At some point you have to accept that agent self-reporting has a fundamental accuracy ceiling and design around it rather than through it.

    Evaluation framing changes agent effort, not just agent reporting. The Hawthorne effect is different from score inflation or unreliable narration. Those are reporting problems; the agent claims something it didn't do. The Hawthorne effect is a behavioral problem: the agent actually does something it wouldn't normally do (retry harder, try fallback approaches, scan more carefully) because it recognizes the evaluation context. You can't fix this by improving the reporting mechanism. You fix it by making the evaluation context less visible.

    Usability testing principles apply directly. The framing "we're testing the documentation, not you" is borrowed from human usability testing, where it reduces performance anxiety and produces more natural behavior. For agents, it serves a similar purpose: an agent that thinks the documentation is being evaluated has less reason to try heroic workarounds than one that thinks its own capabilities are being measured.

    [...]

    Separation of concerns is the most reliable pattern. The final design works because each participant does what they're good at. The agent does what agents do: read web pages and answer questions. The scoring form does what static analysis does: compare strings against a known set. The human does what humans do: judge whether "pool_size = 10" in the agent's summary matches "pool_size = 10" in the reference. No single participant is asked to do something it's structurally bad at.

    [...]

    The Agent Reading Test has 20 points across 10 tasks: 16 from canary tokens and 4 from qualitative assessment of task responses. A perfect score is unlikely for any current agent. The tasks are calibrated so that each failure mode affects at least some agents in practice.

    [...]

    The most interesting results come not from the score itself but from the pattern of what's missing. An agent that scores 12 with all truncation canaries but no tabbed content tells a different story than one that scores 12 with tabs but no content past 40K. The scoring form's implications text tries to make these patterns readable for anyone running the benchmark, not just people who designed it.
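    The "compare strings against a known set" step could look something like the sketch below. The canary names and token strings are hypothetical (the excerpt doesn't list the real ones), and one point per canary is an assumption, not the test's actual weighting.

```python
def score_canaries(response: str, canaries: dict[str, str]) -> dict[str, bool]:
    """Report which canary tokens appear verbatim in the agent's response."""
    return {name: token in response for name, token in canaries.items()}


def summarize(found: dict[str, bool]) -> tuple[int, list[str]]:
    """Return (points, missing canary names); one point per canary is assumed."""
    points = sum(found.values())
    missing = sorted(name for name, hit in found.items() if not hit)
    return points, missing


# Hypothetical canaries, one per failure mode mentioned in the excerpt.
CANARIES = {
    "truncation_40k": "CANARY-TRUNC-40K",
    "tabbed_content": "CANARY-TABS",
}

response = "The page mentions CANARY-TRUNC-40K near the end."
found = score_canaries(response, CANARIES)
points, missing = summarize(found)
print(points, missing)
```

    The list of missing names is the useful part here: it is the "pattern of what's missing" rather than the raw score.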

    2 votes
  3. Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech

    skybrian
    Link Parent

    We don't have a Moore's law for LLMs like we do for chips, but my guess is that something similar will happen: it's going to get more efficient and usage will grow at the same time.

    5 votes
  4. Comment on Nearly half of the US data centers planned for 2026 are getting delayed or canceled because nobody stockpiled enough transformers and circuit breakers in ~tech

    skybrian
    Link

    Moving a bit slower doesn't seem like a bad thing since it will encourage efficiency improvements. Just since the start of the year there have been some interesting research papers about improving inference efficiency.

    15 votes
  5. Comment on Can a country get too rich? Norway shows the potential pitfalls of uncommon prosperity. in ~finance

    skybrian
    Link

    The article talks about inefficiency in terms of wasted money. The question is what are the real-world impacts of that? It might be increased environmental impact or wasting people's time on "bullshit jobs" compared to a more efficient system.

    A more in-depth investigation would answer questions like that.

    8 votes
  6. Comment on AI Coding agents are the opposite of what I want in ~comp

    skybrian
    Link Parent

    Yes. The coding agent I use is more like a command-line interface, so that’s “traditional” in an old-school sense. It’s not “as you type.” One way to think of it is that I can imagine a command that does what I want, and then I can ask the AI to do that (in English). It will use the appropriate tools.

    But I can turn the tables by asking it if it has any questions or suggestions for improvements. I commonly ask it to review design docs.

    1 vote
  7. Comment on AI Coding agents are the opposite of what I want in ~comp

    skybrian
    Link

    Are you using a tool that limits what you can ask for? At a prompt, I can ask the coding agent to do whatever task I want.

    5 votes
  8. Comment on Competence is lonely. Nobody talks about why. in ~health.mental

    skybrian
    Link Parent

    It seems like a workplace has to be pretty dysfunctional for things like promotions and raises to be entirely independent of effort? Sure, there are places like that, but it seems pretty cynical.

    3 votes
  9. Comment on Colorado passes first law in the US to ban arrests based solely on colorimetric drug tests in ~society

    skybrian
    Link

    From the article:

    Colorado just enacted the nation’s first law banning arrests based solely on the results of colorimetric drug tests – a field test widely used by law enforcement across the country.

    The tests are popular because they’re cheap, portable and can screen for drugs in mere minutes. It’s just not feasible to send all suspected drug samples to state laboratories, which would be far more expensive and could take days or weeks to return results.

    But these inexpensive tests also lead to false positives at alarming rates, researchers from the University of Pennsylvania found.

    While the actual error rate nationwide is unknown, previous studies by manufacturers have put it around 4%. But the UPenn researchers believe the actual rate is much higher, from 15% to 38%. And a study by the New York City Department of Investigation showed test error rates from 79% to 91% in some correctional settings.

    10 votes
  10. Comment on Nation's largest urban battery is being built in Daly City, California in ~enviro

    skybrian
    Link Parent

    I don't think this sort of comparison can be made without doing detailed calculations. For example, there are limits on how much power transmission lines can carry at once, so time of day might matter?

    Some power is generated locally (for example, residential solar), so it seems like having some local storage might be useful?

    Pumped storage is good where available, but it's pretty limited geographically.

    4 votes
  11. Comment on Evacuation of US troops from Mideast base sends community groups scrambling to help in ~society

    skybrian
    Link

    From the article:

    Bahrain is the home of the Navy's 5th Fleet, making it a central hub for providing maritime security in the Middle East region, including protecting commercial shipping. The country is an island in the Persian Gulf that sits roughly 124 nautical miles away from the coast of Iran, which makes Bahrain well within range of Iranian drone and missile strikes.

    [...]

    On the opening day of the war, the base, known as Naval Support Activity (NSA) Bahrain, was struck multiple times. Posts on social media showed a ballistic missile and Iranian drones slamming into the base. Satellite imagery from the company Planet shows that at least seven buildings in and around the base were struck between Feb. 28 and March 6.

    In response to an NPR request, a Navy spokesman acknowledged that 1,500 sailors, their families and several hundred pets were relocated back to the U.S. from NSA Bahrain.

    Sailors have been arriving in Norfolk, Va., home to the world's biggest naval base, since at least the middle of March. Several groups that provide aid to military personnel say that the sailors arrived with very little. A call went out to community groups, asking for basic supplies like hygiene products.

    [...]

    "They literally told them, 'Get what you can get in the backpack. You've got to go,'" he said. "They came with no uniforms, nothing. The three we met first, they came with the clothes on their back, what they could fit in that backpack."

    [...]

    When troops move overseas, they don't keep a home in the United States. The military requires service members to designate a safe haven where they will be relocated to in an emergency. Some of the sailors have gone to stay with relatives, while others remain on bases in the United States. MacDill Air Force Base in Tampa, Fla., and Joint Base Charleston in South Carolina have also been hubs for returning flights.

    On April 1, the Navy released updated guidance for sailors and families who were evacuated. The service has worked out how people can be reimbursed for living in hotel rooms, including families who were temporarily relocated to Italy and Germany before being transported back to the United States.

    8 votes
  12. Comment on Surf Social (from the makers of Flipboard) in ~tech

    skybrian
    Link Parent

    That protocol does exist. It's an RSS or Atom feed.

    Looks like surf.social does import RSS feeds. I don't know if it will export them as well.
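    For anyone who hasn't looked inside one, here is a minimal sketch of reading an RSS 2.0 feed with Python's standard library. The sample feed and its URLs are made up for illustration, and this says nothing about how surf.social itself handles feeds.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document; a real feed would be fetched over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def read_rss_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


items = read_rss_items(SAMPLE_RSS)
for title, link in items:
    print(title, link)
```

    Atom uses namespaced `<entry>` elements instead of `<item>`, so it would need a slightly different lookup.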

    4 votes
  13. Comment on Used electric vehicles are a bargain right now in ~transport

    skybrian
    Link

    From the article:

    Across the U.S., people like Shepard are finding that used EVs are more attractively priced than ever — and are snapping the cars up as a result. It’s a welcome development in what has otherwise been a tough year for an industry that’s key to combatting climate change.

    With the oil shock created by the war in Iran, used EVs are likely to become even more attractive to shoppers. Nationally, gas prices have surged to over $4 per gallon on average; in California, the country’s EV capital, they’re nearing $6. Unlike new EVs, used versions have mostly reached price parity with gas-powered cars, according to new data from Cox Automotive — making the preowned versions the cheapest way for people to ditch increasingly costly-to-fuel gas cars in the near term.

    [...]

    New EV sales dropped by 28% year over year in the first quarter of 2026, per Cox. That was primarily driven by the loss of federal tax credits under the megabill passed by Republicans in Congress last year.

    By contrast, used EV sales increased by 12% over the same period. The reason? Declining prices. The average cost of a used EV is now within about $1,300 of a comparable gas vehicle, Stephanie Valdez Streaty, director of industry insights at Cox Automotive, said during a March forecast call. “That affordability shift has clearly shown up in the data,” she said, “significantly expanding access for mainstream buyers.”

    In the U.S., new EVs still outsell used ones. That’s likely to change as the market matures, since the overall used car market is roughly three times as large as the new car market. Right now, EVs make up only about 2% of the used car market, but that share is growing, according to Cox data.

    [...]

    These latest data points aren’t coming out of left field, said Scott Case, CEO of Recurrent, a data-science firm specializing in collecting information on used EVs. His company tracked a 35% increase in used EV sales from 2024 to 2025, as well as a consistent downward trend in pricing, with 56% of used EVs selling for $30,000 or less as of January.

    [...]

    In particular, a lot of those used EVs are coming off leases made popular by a “leasing loophole” that allowed automakers and dealers to offer a full $7,500 federal tax credit, without the income qualification and manufacturer restrictions that applied to claiming the credit on direct sales.

    [...]

    And the latest vintages of used EVs offer an impressive value when compared with their gas-powered equivalents, Case said. Recurrent’s latest data indicates that a used EV is a year newer and has nearly 30,000 fewer miles than a similarly priced used gas car.

    5 votes
  14. Comment on Nation's largest urban battery is being built in Daly City, California in ~enviro

    skybrian
    Link

    From the article:

    Developer Arevon has begun building a 1-gigawatt-hour storage project in Daly City, near the Cow Palace arena, where it will serve up clean energy at night.

    [...]

    Developer Arevon has begun construction of the Cormorant Energy Storage Project, which will occupy an 11-acre vacant lot just southwest of the Cow Palace in Daly City. The battery facility will be large by industry standards, with 250 megawatts of Tesla Megapack containers, capable of discharging for four hours straight, for 1 gigawatt-hour of total stored energy. Bigger batteries have been built, but when Cormorant comes online in about a year, it will be poised to be the country’s largest battery nestled within a major urban area.

    Arevon has contracted the battery for 15 years of use by MCE, one of California’s biggest community choice aggregators — entities that purchase electricity on behalf of local residents as an alternative to Wall Street–owned for-profit utilities. The state requires MCE to buy grid capacity commensurate with its members’ usage, and the Cormorant project will fulfill 10% of this annual requirement, known as resource adequacy in California bureaucratese.

    MCE has become a major force in the greater Bay Area: It now serves all of Marin and Napa counties, most of Contra Costa, and half of Solano. The aggregator can contract for power plants across California, but it looks for sites within or near its service territory when possible, said Jenna Tenney, MCE’s director of communications and community engagement.

    [...]

    The Cormorant battery provides something new: a dense source of on-demand power that can slip into the urban fabric without any local air pollution, and which absorbs the far-off solar generation at midday to discharge later at night. Arevon CEO Justin Johnson estimated that the battery, fitting on the site of a former drive-in movie theater, could cover the electricity needs of some 321,000 homes for four hours straight.

    “It couldn’t keep the whole city going, but it certainly, without a doubt, increases the reliability of the grid in that area in a substantial way,” he said.
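    As a quick sanity check on the figures quoted above, here is the arithmetic as a short sketch. The inputs (250 MW, four hours, 321,000 homes) come from the article; the per-home numbers are just what those inputs imply, not anything the article states.

```python
# Figures quoted in the article.
POWER_MW = 250      # rated discharge power
DURATION_H = 4      # hours of discharge at rated power
HOMES = 321_000     # homes the CEO says it could cover for four hours

# Stored energy = power x duration.
energy_mwh = POWER_MW * DURATION_H
energy_gwh = energy_mwh / 1000

# Implied average draw and energy per home over the four hours.
per_home_kw = POWER_MW * 1000 / HOMES
per_home_kwh = energy_mwh * 1000 / HOMES

print(f"{energy_gwh:.1f} GWh total")
print(f"{per_home_kw:.2f} kW average per home")
print(f"{per_home_kwh:.2f} kWh per home over {DURATION_H} hours")
```

    So the 1 gigawatt-hour figure follows directly from 250 MW for four hours, and the homes estimate assumes an average household draw of a bit under 1 kW.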

    Arevon didn’t jump to the highest echelon of energy storage development from nothing. The firm has invested $11 billion in projects and owns 6 gigawatts of solar and battery installations operating across 18 states.

    [...]

    The company launched in 2021 as a spinout of Capital Dynamics, a private equity fund that amassed an early portfolio of energy storage assets. Arevon is owned by the California State Teachers’ Retirement Fund, Dutch pension fund APG, and the Abu Dhabi Investment Authority. Those firms invest for steady, long-term growth, and their patience lends itself to Arevon building and owning batteries for the long haul, instead of building to flip to other buyers.

    [...]

    Arevon focused on the Daly City location because electricity price volatility tends to be highest in proximity to major consumption, Johnson said. Places like that — whether metro areas or large industrial hubs — see the greatest swings from peak to off-peak hours, and having battery facilities to arbitrage between those times should push prices down in the long run. But building within a city comes with obvious trade-offs.

    “Siting any infrastructure, whether you’re putting in a Walmart or upgrading an intersection or doing anything in a high-density area, is tough … especially so for power plants or facilities like this,” he noted.

    [...]

    Such projects “reduce your lifespan a little bit” from the stress, Johnson said, but once built, the intrinsic difficulty becomes a sort of strategic moat. If a competitor wanted to open up next door to Cow Palace, well, they probably couldn’t find a viable space.

    [...]

    Arevon’s choice of battery, Tesla’s Megapack 2 XL, addressed the safety question. The containerized storage product is filled with the lithium-ferrous phosphate cells, a battery chemistry known to be significantly less fire-prone than earlier lithium-ion varieties. The older Moss Landing facility packed a huge amount of batteries into a single legacy structure, where they became fuel for an immense conflagration. The Megapack containers, in contrast, will be spread out across the site in a design that will prevent a fire from spreading beyond a single metal box. If one unit ever did catch fire, it would damage only a fraction of 1 percent of the plant’s capacity.

    2 votes
  15. Comment on US imports more from Taiwan than China for first time in decades in ~finance