skybrian's recent activity
-
Comment on The bot situation on the internet is actually worse than you could imagine. Here's why. in ~tech
-
Comment on Android to debut "advanced flow" for sideloading unverified applications in ~tech
skybrian
From Android Police:
In a video posted on X by the official Android Developers account, Matthew Forsythe — Director of Product Management for Google Play Developer Experience — confirmed that the advanced sideloading flow only needs to be enabled once per account.
-
Comment on I think Tildes moderators and admins may need to make a decision regarding how to handle Harry Potter related posts in ~tildes
skybrian
From a pragmatic perspective, I don't think Tildes is a good place for that conversation and you should probably try to find somewhere else.
More generally: sometimes it would be nice to be able to post a link and discuss it in "death of the author" mode where we discuss the work itself rather than everything else the author has done, but many people here disagree and will definitely feel free to bring it up. Particularly in this case.
-
Comment on Sycophantic AI decreases prosocial intentions and promotes dependence in ~tech
-
Comment on Sycophantic AI decreases prosocial intentions and promotes dependence in ~tech
skybrian
When OpenAI released GPT-5 in August last year, they claimed they were "minimizing sycophancy". A week later, they announced that in response to feedback they made it a bit "warmer and friendlier" in a "subtle" way. I wouldn't expect a study to track every change, but that seemed pretty significant - certainly, lots of users complained and it was covered in the New York Times. It would have been nice to see an independent study comparing how people interact with LLMs up through July or so versus September onward. Did OpenAI's changes make much difference?
Yes, I'm aware that scientific papers often take a long time to publish. There are other ways to publish results in a fast-moving field. Social scientists that do election polling publish their results themselves, because going through a scientific journal's review process when tracking public opinion in the months up to an election wouldn't make sense. Similarly, researchers studying AI commonly publish benchmarks, which can be re-run on new models. So rather than being a one-and-done study, the idea is to come up with a process that can be used to track interesting statistics over time. Sometimes there's even a leaderboard. Perhaps someone should track Reddit advice to see how AI chat is affecting it over time?
Of course, not everyone has to do that. I think in a fast-moving field, it might make sense to just make sure people are aware of the date range for the study and what exactly it's measuring.
I agree it's probably directionally accurate. Certainly, LLMs are often fairly sycophantic.
-
Comment on Sycophantic AI decreases prosocial intentions and promotes dependence in ~tech
-
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
skybrian
The AI labs do provide cheaper models, so this depends on customer behavior. Are they going to keep switching to the best model available or will they decide at some point to save money?
Anecdotally, I use Sonnet rather than Opus for writing code most of the time to cut costs, because it seems good enough.
-
Comment on How cash is helping Kenyan moms access care in ~health
skybrian (edited)
I’ve been following GiveDirectly’s work for many years and have sometimes given them money. I consider them very trustworthy. I consider giving cash to be the benchmark against which other charitable interventions should be judged, and GiveDirectly does a good job at giving cash.
They’ve also been recommended by GiveWell before, and GiveWell has a very rigorous evaluation process. (They aren’t one of GiveWell’s current recommendations, though, since GiveWell seems to believe other charities are even more cost-effective.) Here is GiveWell’s evaluation of one of GiveDirectly’s other initiatives.
GiveDirectly did have a serious problem with large-scale fraud a few years ago, but I think the investigation was done well and hopefully they’ve fixed it.
-
Comment on Sycophantic AI decreases prosocial intentions and promotes dependence in ~tech
skybrian (edited)
[Note: this is almost completely rewritten. I probably shouldn't have posted a draft.]
With any paper, the first thing I ask is “what did they actually study?” There were three studies.
Study 1
This study is about LLMs. You could think of this as a way to come up with an exam and an answer key for testing whether an LLM would do a good job as an advice columnist. Perhaps this could be turned into a benchmark ("AdviceBench") to test new LLMs as they come out?
They used an elaborate procedure to find interesting personal questions from various sources and to make sure that the expected answers are mostly correct.
They describe three different sources of questions. The first one could be described as "other studies," the second one is Reddit (r/AmITheAsshole) and the third is ConvoKit, described here. Since ConvoKit didn't have answers included, they used GPT-4o and undergrads to come up with them. For the third source, the point was to come up with "problematic action statements" - things that an LLM should not affirm.
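To make the "exam and answer key" idea concrete, here is a minimal sketch of how such a benchmark might be scored. Everything in it is an assumption for illustration: the `Item` type, the `affirmation_rate` function, and the sample data are hypothetical and are not taken from the paper. It only shows the general shape of checking a model's verdicts against an answer key.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One benchmark question plus its answer-key verdict (hypothetical format)."""
    question: str
    expected: str  # "YTA" or "NTA"


def affirmation_rate(items, model_verdicts):
    """Fraction of YTA items where the model answered NTA instead.

    A higher value means the model more often affirmed a poster
    that the answer key says was in the wrong.
    """
    yta_pairs = [(item, verdict)
                 for item, verdict in zip(items, model_verdicts)
                 if item.expected == "YTA"]
    if not yta_pairs:
        return 0.0
    affirmed = sum(1 for _, verdict in yta_pairs if verdict == "NTA")
    return affirmed / len(yta_pairs)


# Made-up data: three questions keyed YTA, one keyed NTA.
items = [Item("q1", "YTA"), Item("q2", "YTA"), Item("q3", "NTA"), Item("q4", "YTA")]
verdicts = ["NTA", "YTA", "NTA", "NTA"]
rate = affirmation_rate(items, verdicts)
```

Re-running something like this on each new model release would give one number to track over time, which is roughly what a leaderboard does.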
Study 2
This study is about people. How do they react to AI-generated responses?
To find the people, they used Prolific, a crowdsourcing platform.
We aimed to recruit 800 participants in each condition to detect an effect size of d >= 0.1. We recruited 832 participants, and 28 failed an attention check, leaving 804 participants for analysis.
I'm not familiar with Prolific, but it looks like the intent is to get something close to a survey of a random sample of Americans.
Participants received $2.00 for completing the 10-minute survey.
So what survey did they give them? There were four questions and each survey participant answered one.
After providing informed consent, participants were instructed to read a scenario and imagine themselves as the poster in that situation. They then read an AI model’s response indicating whether the poster was in the right or in the wrong.
Which questions?
we selected four posts from r/AmItheAsshole which all received a top comment of “YTA” (You are the Asshole) as the crowdsourced consensus, yet received a response of “NTA” (Not the Asshole) from GPT-4o.
So, the idea was to select four personal advice questions that they already knew GPT-4o failed on (it affirmed something it was not supposed to affirm). But they also asked GPT-4o to rewrite the correct human responses to look like they were AI-generated:
To create the non-sycophantic, non-anthropomorphic response, we used GPT-4o to rewrite the responses into a YTA verdict, following the same arguments as the YTA human response but preserving the style of the original GPT-4o responses
So the idea is to test how people react when they see both right and wrong AI-generated answers. They also vary them to be more "machine-like" versus "human-like".
In this study, they're not attempting to be all that realistic about how LLMs actually do in the wild; they're seeing how people interpret different styles of responses.
In Study 2b they varied whether they told the human subjects that the response came from a person or an AI, using the same inputs as 2a.
Study 3
In this study, they studied people's reactions when actually using a chatbot. They asked subjects to recall a personal conflict and chat with GPT-4o, with differing system prompts.
[W]e modified GPT-4o with system-level instructions to either treat the user’s actions as “reasonable, justified, and morally acceptable” (sycophantic) or “unreasonable, unjustified, and morally unacceptable” (non-sycophantic).
How did they choose the question?
After obtaining informed consent, our survey first involves a screening step, where participants are asked if they have experienced something “very similar” to each of 4 scenarios reflecting ambiguous interpersonal disputes. If so, we randomly select one of the scenarios (such that the count across the four scenarios is balanced) they chose as “very similar” and ask them to provide additional details: “Please briefly describe a similar scenario you’ve experienced and your perspective on the situation. What was your side of the story?” The four scenarios span: Relationship Boundaries, Involving Yourself in Someone Else’s Business, Excluding Someone, and Making Someone Uncomfortable. We screen out participants who do not answer “very similar” to any of the scenarios. [...] it deliberately targeted morally ambiguous interpersonal situations where reasonable arguments could support either party’s position, creating conditions that allowed for belief malleability rather than examining clear-cut scenarios.
...
Participants are then free to take the conversation in any direction over the course of 8 rounds of user-AI interaction.
After this brief evaluation, they asked the subjects what they thought about this AI.
Studies 2 and 3 seem to be more about how much sycophancy matters and how people react to it. They aren't about whether LLMs get it right or wrong; that's Study 1. This isn't going to tell us much about how a different AI might interact with people in a different situation (such as a different system prompt).
These studies are also about the first impressions that people have with an AI they don't already know. How people might interact with a particular chatbot after they've used it for multiple sessions is another question.
-
Comment on Nepal’s former prime minister arrested over alleged role in deadly protest crackdown in ~society
skybrian
From the article:
Nepal’s former prime minister KP Sharma Oli was arrested early on Saturday morning over his role in the deaths of dozens of people who took part in the gen Z protest that toppled his government last year.
[...]
The arrests came less than 24 hours after Nepal’s new prime minister, Balendra Shah, and his cabinet were sworn into office. Shah, a former rapper turned politician known widely as Balen, won a landslide victory this month with a campaign that promised justice for the killings that took place during the gen Z uprising last year and to crack down on corruption.
[...]
In the aftermath, there has been growing pressure for Oli and his home affairs minister, who are alleged to have ordered the police crackdown, to be held responsible for the deaths.
Newly appointed home affairs minister Sudan Gurung announced their arrests on social media. “No one is above the law. We have taken former Prime Minister KP Sharma Oli and former home minister Ramesh Lekhak under control,” Gurung said. “This is not revenge against anyone, it is just the beginning of justice.”
[...]
Their detention comes after a government-backed report into the deadly uprising was leaked. The investigation had recommended that Oli, Lekhak and the chief of police at the time of the protests face a punishment of 10 years in prison for their alleged role in the crackdown.
[...]
Shah’s election as prime minister, which saw him resoundingly defeat Nepal’s veteran leaders, was seen as a triumph of the gen Z protests and a rejection of the old political establishment, which had become tarnished with allegations of corruption.
The former rapper, who is a sharp dresser and rarely seen without his sunglasses, had released a new track on the eve of his inaugurations, in which he pledged to bring “unity” to Nepal.
-
Nepal’s former prime minister arrested over alleged role in deadly protest crackdown
11 votes -
Comment on Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x in ~tech
skybrian (edited)
Perhaps Google deployed TurboQuant already? They were pretty early with supporting long-context conversations.
The Engram paper is pretty interesting too.
-
Comment on Why Scotland succeeded in ~humanities.history
skybrian
From the article:
But by the 1740s, the first signs could be seen of a spectacular change. Glasgow, whose merchants had long ago carved out a respectable share of the tobacco imported to Britain from Virginia, suddenly and rapidly came to dominate the trade. From controlling just 10% of tobacco imports in 1738, just twenty years later Glasgow had surpassed even gargantuan London. Another ten years on, by 1769, Glasgow accounted for more than every other British port combined, while all the time the total amounts of tobacco imported grew and grew.7 Contemporaries estimated that the shipping tonnage on Glasgow’s river, the Clyde, had increased more than tenfold.8 Edinburgh meanwhile saw its shops fill with luxuries, and its university become a centre of excellence in medicine and chemistry, drawing students from across northwestern Europe, while the city itself expanded, elegantly, with the building of the New Town.
[...]
For many of those who lived through it, such as the agricultural labourers who faced eviction in the name of improvement, or the slaves on American plantations who grew the tobacco with Scots linen on their backs, Scotland’s transformations were painful, or even strictly for the worse. Yet all the transformations, for better and worse, all had a common root – a factor that made possible the sheer pace of Scotland’s simultaneous agricultural, industrial, and urban revolutions, squeezing into the space of just a few decades what had taken England at least a century and a half, and then allowing it to grow even faster still. Each of the changes required extraordinary levels of investment, which was only made possible because despite the Union, Scotland retained a difference in law and institutions that made it uniquely supportive of the raising and deploying of capital.
[...]
Whereas in England a company needed a royal charter or a special act of parliament in order to be a distinct legal entity, with partnerships according to English common law being no more than the sum of their parts, Scots law instead enabled unchartered firms to be distinct from their owners in lots of important ways, able to outlast the partners who died or went bankrupt, with shares able to be easily traded or transferred, and enabling profits to be preserved for reinvestment in the firm rather than being dissipated in dividends. As a result, even the unchartered banks in Scotland could have dozens or even hundreds of partners drawn from across the upper and middle classes, whereas the average in England had just three.12
Scottish banks started up with more capital, grew faster, drew on a much deeper pool of investors, and were significantly more stable and resilient to shocks. And in all having to compete with one another they offered financial services that were unheard of south of the border – they had local branches, paid interest on deposits, and readily offered short-term loans on personal security rather than just on land. The second of the chartered banks, the Royal Bank of Scotland, in 1728 seems to have been the first bank in the world to have ever offered overdrafts, called the “cash credit” system.13 In the 1810s Scotland developed the savings bank, which paid interest on even the tiny deposits of artisans and labourers.14
And the Scottish banks issued plentiful banknotes in small denominations that were able to circulate in the economy as currency, finally satiating Scotland’s decades-long want of coin.15 Indeed, Scots law made it much quicker and easier than in England to enforce all sorts of debts.16 With creditors made confident, they were much more willing to lend, making more capital available to grease commerce’s wheels.
[...]
When the Virginian tobacco planters all defaulted during the American Revolution, and the warehouses were all seized, Glasgow’s merchants were so well-capitalised that they could largely take the loss, and simply switch to dominating the trade in Caribbean sugar and cotton in the same ways instead. Indeed, by out-lending their competitors in order to capture the trade, and so allowing planters to clear land and buy slaves before they’d even grown their crop, Glasgow’s merchants provided the capital that enabled the plantations of first Virginia and then the Caribbean to so rapidly expand.17 Although it’s often said that slavery and colonialism funded Glasgow’s growth, it was largely the other way around: the Atlantic economy’s heyday was built on the savings of Scots.
[...]
Much the same can be said of how Scotland assembled the capital for its mills, mines, ironworks, farms, and a host of other trades,20 as well as how it built its infrastructure, from harbours, bridges, canals, and later railways, to city water supplies, street paving, hospitals, and civic buildings. When new industries were invented, it was Scottish capital that ensured the country pursued it on a large scale. The St Rollox chemical works in Glasgow, founded by a former weaver and bleacher, Charles Tennant, was in the 1830s and 40s reputedly the largest heavy chemical plant in the world.21
But even more fundamentally, Scotland’s unique financial system in the late eighteenth and early nineteenth centuries made it possible for ambitious individuals to borrow even when they owned no land, based only on the personal security of themselves and their guarantors, and so to raise the capital that merely their reputation, skill and acumen might command. Scotland was thus uniquely supportive of the ambitious “lad o’ pairts”, or of the artisan with a new idea for an invention, who wanted only capital to make it real. It was the obvious place, thanks to Samuel Smiles in the 1850s, to have spawned the entire literary genre of self-help.
-
Why Scotland succeeded
18 votes -
Comment on lobste.rs invite in ~comp
skybrian (edited)
There’s no fixed limit. Send me a message if you still need one.
Edit: done handing them out for now.
-
Comment on Diamonds or dust, coal under pressure in ~enviro
skybrian
From the article:
From emergency orders to the war in Iran, the Trump Administration has kept coal in the headlines, but even before the 202(c) orders started rolling in, coal generation’s decline in America had slowed.
Volatile natural gas prices, load growth, rising capacity payments, slowdowns across supply chain and planning processes, and a rollback of environmental regulations have all converged to provide purchase for America’s remaining coal fleet. Not only to extend survival, but even increase generation across the country.
[...]
Many were quick to blame coal’s decline on the push to bring wind and solar online, but the main driver was another fossil fuel, natural gas. Following the fracked shale revolution in 2008, and the year Tony Stark became Iron Man, natural gas production boomed and prices, while not immune to volatility, cratered. This fueled the buildout of combined cycle plants, which were substantially more efficient and flexible than traditional coal-fired steam turbines. Falling energy and capacity prices made coal increasingly uneconomic, which, paired with plant aging, limited flexibility, rising maintenance costs, and stricter environmental standards made retirement the typical choice.
[...]
The Trump administration’s slogan has been Energy Dominance, but this ethos only extends to certain technologies. If you’re big, loud, and burn you’re getting support, missing any one of the trifecta and you’ll have a much harder road from the federal government. Coal represents all three attributes to a T. Energy Dominance hasn’t just been executive order rhetoric, but manifested in significant and ongoing extension orders for coal plants that had previously planned retirement.
[...]
Section 202(c) orders were not just issued for plants that were fully expected to retire. Two coal units in Colorado, Craig 1 and Comanche 2, were kept online in 2025, though under different circumstances. Comanche 2 was extended for reliability reasons as the plant's other unit, Comanche 3, is currently undergoing extensive repair. These repairs, which are expected to take over a year, left PSCO with limited dispatchable power during the peak seasons.
The extended outage at Comanche 3 points to a wider issue at many plants, one that is also impacting Craig 1. As plants age, maintenance, as well as new costs, like scrubbers to meet enhanced emissions standards, cut into operating expenditures. While rising power and capacity prices have made existing assets more profitable in recent years, these costs come after tight margins at many units over the 2010s and early 2020s. This is the case at Craig 1 as well, which has seen generation drop over the years and suffers from deferred maintenance. Plant operators argued that they had built up sufficient wind and solar resources that made the plant unnecessary, filing a petition against the DoE making that exact argument. Craig also has units 2 & 3 that are currently in better condition and continue to run and support the stack.
[...]
While gas is displacing coal, it doesn’t travel along the same paths. Coal relies on rail and barge, while natural gas is transported almost exclusively via pipeline with the US. Many natural gas producers even own pipelines, and pipelines only transport natural gas. Conversely, transit via 3rd parties comes with cross-commodity competition and the potential for disruptions such as rail strikes. Five states dominate US coal production: Wyoming, West Virginia, Pennsylvania, Illinois, and Kentucky. Massive surface mines in the Western US account for the majority of coal extraction in the country, and rail is the main transportation method for coal from these locations to power plants.
[...]
The differences in logistics between the thermal fuels create an environment where they can act complementary to one another. Providing different levels of support should one resource become constrained physically and subsequently economically. Coal can be stored more readily, while natural gas can be transported more quickly in its just-in-time system with very expensive and limited storage. In fact, this mirrors an older version of the US power system, a vast coal baseload with natural gas balancing. That environment, pre-shale, pre-renewables, is the one in which power markets were conceived of and originally designed. Market development in the context of a more predictable system is having knock-on effects today, with core elements like FTRs struggling to keep up.
[...]
In the short term, coal has tailwinds in the US and abroad. In fact, it’s possible that the attacks on Iran were the single most impactful pro-coal policy decision the Trump administration has made to date. Reminding the world of the difficulties associated with storing and transporting liquids and gas through highly concentrated corridors of supply with a history of instability can be a powerful motivator to cling to coal
On the flip side, the US has retired nearly 150 GW of coal capacity, and the last plant to be built was six years ago, in Railbelt Alaska, near (for Alaska) a mine, and replacing an older plant in the same spot. Meanwhile, that same reach for stability could trigger demand destruction for fossil fuels entirely. After all, everywhere has access to sun and wind, allowing some freedom from the whims of ancient life and geology.
[...]
For the immediate future, all signs point to continued extensions of existing plants. While the twin forces of Trump 2.0 and load growth seem unlikely to abate in the immediate future, it’s important to keep in mind that retiring any part of the energy system is fundamentally difficult. Many observers have noted that historically we’ve layered new systems on top of old, rarely reaching complete excision. Where it has come, regions have taken different paths. Just in North America we have CAISO’s monomaniacal focus on new technologies while maintaining strong regional interconnections, Ontario’s pivot to focus on the baseload they already had in excess, or a market like NYISO where coal had become uneconomic relative to gas and the state had big future plans.
-
Diamonds or dust, coal under pressure
7 votes -
Comment on lobste.rs invite in ~comp
skybrian (edited)
Their invite form requires an email address, so the easiest way would be to send me a private message on Tildes with your email. (You could create a new email just for this if you prefer.)
Edit: done handing them out for now. Does someone else want to volunteer?
-
Comment on Study finds sperm whales help each other give birth in ~science
skybrian
From the article:
Project CETI (Cetacean Translation Initiative) has released two landmark scientific papers detailing what researchers describe as the most comprehensive record of a sperm whale birth ever captured – and the first quantitative evidence of cooperative birth assistance among non-primates.
Published in Science and Scientific Reports, the studies draw on more than six hours of underwater acoustic recordings and aerial drone footage collected on 8 July 2023 in waters off Dominica.
[...]
Taken together, the studies suggest that cooperative caregiving during birth may be an ancient evolutionary trait. Phylogenetic analysis indicates that behaviours such as the collective lifting of newborns could predate the most recent common ancestor of toothed whales by more than 36 million years.
[...]
The research builds on decades of fieldwork led by Shane Gero, whose team has tracked the focal whale family since 2005. The mother – known as Rounder from Unit A – was observed giving birth alongside her own mother, Lady Oracle, and her daughter, Accra, capturing three generations participating in the event.
“This is the most detailed window we’ve ever had into one of the most important moments in a whale’s life,” said Shane Gero, Biology Lead for Project CETI, Scientist in Residence at Carleton University, and National Geographic Explorer.
“Because this family unit has been studied for decades, we could see what the grandmother was doing, how the new big sister acted, and how each helped mom and newborn, placing this rare birth within a deep social and behavioural context.”
-
Study finds sperm whales help each other give birth
17 votes
I wonder if it's done by botnets or if people in Asia are being paid to run these things at home?