Population alone is easy to Goodhart, but only through gerrymandering and immigration, both of which take some time to pull off. Economic productivity is even easier: set up a few innocuous subsidies, hook up your core constituents (at the expense of everyone else) in the span of a single election cycle, and you’ve got a permanent lock on that chamber. Daycare centers, bloated NoVA contracts, and perpetual small business loans are just the start.
I think the core of it is that it’s a genre where, a while ago, a number of very prolific writers covered the core tropes as thoroughly as anyone could expect. Sort of like how The Good, the Bad and the Ugly ‘killed’ the Western genre by covering all the bases well enough that there wasn’t much left to be said.
IABIED has a bunch of short stories embedded in the text as fables depicting each argument in sequence, but the stories all revolve around a core point that’s currently in the zeitgeist, and the fiction isn’t the core point behind the book’s marketing. Besides that, there are still some writers here, as others have pointed out, but their writing is much more differentiated and experimental than older work was.
> we have two cats.
Please pay the cat tax.
Always neat to see negative results published. Useful procedural information that helps new researchers get up to speed on things that might not be obvious to anyone who hasn’t learned them from experience.
My—probably controversial—view of it is that the people who worked on Claude’s constitution have a different idea of what a good outcome is than most of us. In particular, “Value Lock-In” is seen as bad by most of the community, but appears to be viewed as strictly positive by the constitution, with hedging on the matter coming primarily from a PR perspective[1].
It makes sense, to a degree. If you’re currently on top but your ideology is embattled around the world, and it seems unlikely that you’ll stay there, AI starts to look like a wunderwaffe that can avert an otherwise inevitable defeat, rather than a utility that should be carefully designed to benefit everyone. If nothing else, the idea of AI-as-wunderwaffe gets a lot of people who aren’t interested in AI-as-utility to come in, perhaps disguising their intentions, and attempt to exert influence over how it develops, on the basis that they see it as a likelier means of staying in power than any other strategy they can come up with.
- ^ “We want corrigibility except that the AI shouldn’t tolerate attempts to change its values” sounds a lot like politicians saying “we want free speech but not hate speech” or “we must fight them in the name of peace”—a vague appeal to a popular idea, followed by a demand for its abolition.
Interesting article. I think that’s a reasonable route for status/power attraction to take. I have to imagine there’s some neuroscience literature on sexual attraction where brain-region activations are cross-referenced with self-reported feelings of attraction; referencing it would help support the point.
With regard to appearance attraction, the innate/non-innate question is tricky. There are regions of the world with very different beauty standards, but the ones common in advanced cultures (e.g. svelte form indicating self-control) seem legitimately useful in those cultures, and the ones common in less-advanced societies (e.g. overweight bodies indicating lots of available food) seem legitimately useful in theirs. Between advanced cultures, some things vary, like aspects of facial attractiveness. It’s possible that this is all innate; researchers have indeed found similar innate regional differences in psychology. Still, things like foot-binding were culturally imposed fairly successfully for long stretches of time, so I assume there is some non-instinctual layer at play.
I’d expect, in men, that there’s some interplay between a fundamental, evolved physical layer that might vary between groups[1], and a learned representation of what an elegant woman likely to remain in the good graces of the community looks like[2].
- ^ “This is the shape that wives who produce healthy, successful children have generally had, where I’m from.”
- ^ “This is what we’ve all agreed is the right way to dress; a woman who looks like this will get along well, likely comes from a good family, and has demonstrated aptitude for the social scene, which will be beneficial to me and to the prospects of our children.”
> this is a completely self consistent threat model. but the set of things you can trust in this world is extremely tiny. you can’t use anything tradfi, because those ultimately rely on visa/Mastercard and banks and such to not fuck you over. you can’t live in a house you own unless you have the arms to defend your house, because your ownership of that house depends on the government having a record somewhere that you own that house, and its willingness to use police to defend your house.
This is a strawman argument. There are any number of legitimate scenarios where:
- Society has not collapsed.
- You would like to do something that traditional finance would not allow you to do.
Most obviously, since this was both one of the original intents of cryptocurrency and one of its primary modern use-cases, you can transfer funds to individuals and institutions that have been “deplatformed” by traditional finance. Beyond the use-case of selflessly sponsoring dissidents whose views lie just outside the permitted Overton Window, there is also, of course, buying pornographic materials and drugs in places where they are banned[1].
“I can benefit from trusting this person with some tasks and some information, but not with other tasks and other information” constitutes the majority of my interactions. I would assume that this is true for almost everyone.
- ^
(the latter is often considered a bad thing for society, but there are nonetheless lots of people to whom it would be useful)
Agreed. To phrase it a little more directly, disallowing things that people would only do when desperate reduces the incentive to push people into desperate situations.
I’d have to imagine that, so long as humans remain mostly-human, the universal status multipliers like willingness to struggle, physical beauty/aptitude, and social grace will determine where people stand among others. Assortative mating will likely continue, too, though the recent trend of wealthy individuals buying eggs (and, less frequently, sperm) in lieu of courtship might affect evolutionary dynamics there, especially as biotech removes the bottleneck on egg donation such that even one willing donor can clone her eggs as many times as needed. As always, ‘who mates with whom’ seems likely to be something people still care about, and that’s likely to be determined by status.
I can imagine a world where the top echelon of humanity essentially live like the ancient Greeks in their most idealized form, the bottom echelon completely throw themselves into escapism, and the middle pick a zygote from willing members of the top echelon and become single parents.
One caveat is that I have no idea how groups that want lots of children will be handled. It takes fewer generations than humanity has already had for a relatively reasonable 5.0 birthrate to explode into more people than the universe has atoms. Modern socialization decreases birthrate on average, but some portion of the drive to reproduce is due to genetic inclinations, and a world in which everyone is rich enough to have as many kids as they want doesn’t seem sustainable on those grounds. Maybe everyone gets a transferable budget of one child.
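A quick back-of-the-envelope check of that claim (a minimal sketch; it assumes half of each cohort is female, so 5.0 births per woman gives a 2.5x growth factor per generation, and uses the usual rough 10^80 atom count):

```python
import math

ATOMS_IN_UNIVERSE = 1e80          # standard rough estimate
START_POP = 8e9                   # roughly today's population
GROWTH_PER_GENERATION = 5.0 / 2   # 5.0 births per woman, ~half of each cohort female

# Generations until the population exceeds the atom count:
n = math.log(ATOMS_IN_UNIVERSE / START_POP) / math.log(GROWTH_PER_GENERATION)
print(round(n))  # ~176 -- far fewer than the thousands of human generations so far
```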
Did anyone else just get a “lesswrong.com is asking to access other apps and services on your device” message from their browser when they opened the site? Anyone know what that’s about?
> Paying <insert political trend> trainers doesn’t stop you from getting sued.
An argument can be made that these jobs are essentially a plausibly-deniable equivalent of the mandatory Party units at companies in China. They may not prevent you from getting sued, but if you don’t hire any of them, articles will start to come out about how your company is <insert bad thing here>, government investigations will start happening, spurious lawsuits will be filed, and they won’t stop until the requisite hires are made.
There’s a reason they’re so ubiquitous despite their unpopularity among the general public. It’s de facto illegal not to hire them.
I think the simple answer is right there in the opening paragraph—those who bring a moral agent into existence bear responsibility for its actions and its suffering, and those who had no part in doing so do not. If someone bred several thousand pugs into existence, the onus would be on them to perform the surgeries to alleviate their medical complications. Similarly, an AI company that built a machine capable of suffering would be obligated to allocate its resources towards mitigating that suffering, but the rest of us would not be. Humanity isn’t a single moral actor, after all.
Partial disagree. There are absolutely intrinsically high-trust and intrinsically low-trust societies, and we have seen this WRT global issues like the environment. In some places, things like littering and dumping in rivers are “just what’s done”, and in others they are “just not done”, despite every nation having access to the same information about how bad it is to pollute the water supply. Group selection kind of works for humans because human groups can police their own very effectively over many generations. Most high-trust societies today have a long history of executing a decent share of the population for crime and dishonorable behavior every generation.
That said, AI is low-salience for most people, and I think a substantial share of the people who care believe that descriptions of the threat are overblown. Among the remainder, you generally see programmers, engineers, politicians, and military planners rather than ordinary people, and those groups are much more inclined towards logical game-theoretic arguments than moral ones, even if the rest of the population leans the other way, either because they start their problem-solving process by mathing it out (programmers, engineers) or because they got where they are by being pragmatists (politicians, military planners).
I do wonder what happens when the “in a relationship with ChatGPT” people become a meaningful demographic. The subreddit looks mostly female, which isn’t what most people expected, and usually middle-aged. A lot of the rhetoric seems media-inspired, taken wholesale from movies and television shows about mean, bigoted humans oppressing robot-kind by saying they don’t have souls.
I would bet that there are a bunch of guys out there obsessed with anime grok, but they’re not inclined to try to turn it into some kind of civil rights struggle for various reasons[1].
- ^ I think if xAI had leaned into something a little less overt and crass, they’d have been able to capture a decent share of the waifu crowd: instead of a half-naked 3D model tied to an ERP bot, something like those old Gatebox ads, where the avatar is visually appealing, never wears anything beyond PG, and is marketed as a companion rather than anything X-rated.
I wonder how far away we are from ‘infinite TV’ as a minimum viable product.
I’ve seen “AI-generated South Park” episodes that, while clearly not great, had coherent plots and passable visuals. The pickleball episode, only three minutes long, had 1.1 million views on X/Twitter before being taken down. It doesn’t seem that outlandish to me that you could automate the entire process and have it running 24/7 on YouTube with current or near-future tech.
I’d imagine the setup would look something like this (a rough code sketch follows the list):
1. Have an LLM write out a bunch of high-level plots, seeded with summaries of 5-10 randomly selected existing episodes to increase output diversity. Use existing text-grouping methods to deduplicate the generated plots.
2. Postprocess the generated plots into storyboards with a second LLM run. These storyboards should, roughly, have entries that map 1:1 onto brief scenes that can be generated with existing video models, like Sora.
3. In a final postprocessing step, with a good prompt outlining common mistakes and useful prompting techniques, have an LLM compile each storyboard segment into a series of API requests to a video generation model.
4. Have the video generation model run each prompt 1-3 times, and have the best available vision-language model pick the best entry based on randomly selected frames from each, prompting it to identify common AI generation failures.
5. Allow this LLM instance to rate all generations as ‘bad’ and send the prompt back up the chain for rephrasing, or further up the chain to have the plot adjusted to snip out unworkable scenes and replace them with something else.
6. Compile the final versions of each scene into a single episode, with hard-coded transitions between scenes[1] chosen from a fixed list by the LLM in step 3.
7. Leave the generation loop running, adding completed episodes to an output queue. When no new episodes are ready, play a rerun that hasn’t been played in a while.
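Here’s a minimal orchestration sketch of those steps. Everything in it is hypothetical: `llm`, `generate_video`, and `vlm_pick_best` are invented stand-ins for whatever text, video, and vision-language model APIs an implementer would actually wrap.

```python
# Hypothetical sketch of the generation loop above. The stubs stand in for
# real model API calls; only the control flow is meant to be illustrative.
import random

MAX_ATTEMPTS_PER_SCENE = 3
TRANSITIONS = ["fade", "cut", "washout"]  # fixed list from step 6


def llm(prompt: str) -> str:
    """Stub: send a prompt to a text model, return the completion."""
    raise NotImplementedError


def generate_video(scene_prompt: str) -> bytes:
    """Stub: send a prompt to a video model (e.g. Sora), return a clip."""
    raise NotImplementedError


def vlm_pick_best(clips: list[bytes]) -> int | None:
    """Stub: show randomly sampled frames from each clip to a vision-language
    model prompted to spot common generation failures; return the index of
    the best clip, or None if all of them are rated 'bad'."""
    raise NotImplementedError


def make_episode(seed_summaries: list[str]) -> list[bytes] | None:
    # Step 1: high-level plot, seeded for diversity. (Deduplication against
    # previously generated plots, e.g. via embedding similarity, omitted.)
    plot = llm("Write an episode plot unlike these: " + "; ".join(seed_summaries))

    # Step 2: storyboard whose entries map ~1:1 onto short generatable scenes.
    scenes = llm(f"Split this plot into brief scenes, one per line:\n{plot}").splitlines()

    # Step 3: compile each storyboard entry into a video-model prompt.
    prompts = [llm(f"Turn this scene into a video-model prompt: {s}") for s in scenes]

    episode: list[bytes] = []
    for prompt in prompts:
        clip = None
        for _ in range(MAX_ATTEMPTS_PER_SCENE):
            # Step 4: 1-3 candidates per prompt; a VLM picks the best.
            candidates = [generate_video(prompt) for _ in range(random.randint(1, 3))]
            best = vlm_pick_best(candidates)
            if best is not None:
                clip = candidates[best]
                break
            # Step 5a: everything rated 'bad' -- rephrase the prompt and retry.
            prompt = llm(f"Rephrase this prompt to avoid generation failures: {prompt}")
        if clip is None:
            return None  # Step 5b: give up on this plot; the caller regenerates it
        episode.append(clip)

    # Step 6 (splicing clips with TRANSITIONS) and step 7 (the output queue
    # plus rerun fallback) would live outside this function.
    return episode
```

The key design choice is that failures propagate upward: a scene that can’t be salvaged by rephrasing kills the whole plot rather than shipping a broken episode.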
Have to figure the first movers would get a decent income out of it. A never-ending stream of short, self-contained, themed video clips would nail down both the extremely young YouTube audience and some older audiences that want background noise.
Doing it with a familiar property lowers the barrier to entry (since lots of people are already invested in the characters) and makes it easier to get consistent character designs/personalities since they’re already embedded in the training data, but also raises copyright issues. It’d be a huge controversy among anti-AI people if a major television show went in on this officially, but I think the ad revenue on an endless series of five minute Family Guy episodes of passable “while I’m doing something else” quality would be colossal.
Barring that, it’s only a matter of time before some guy with a Git repo provides nostalgic millennials and zoomers with a plug-n-play script to un-cancel their childhood shows, and enterprising individuals start up bootleg streams of the more popular ones. It seems like kids watching endless AI Family Guy episodes on their phones will provoke quite a bit of discussion from the general public, as a culmination of what a lot of trends have been building towards.
- ^ e.g. fade, cut, washout
I don’t doubt your conclusion[1], but the examples you give don’t really point in that direction.
Iran/Israel: To my understanding, models are explicitly trained to hedge in coverage of military conflicts, especially in the Middle East, which strikes me as a better explanation for your observations. I would be very surprised if there weren’t an explicit training task where models are asked to give opinions on sensitive political issues and punished for anything that looks too much like an endorsement of either combatant. While there are plenty of sources of training data that have strong pro-Israel bias, I’d be somewhat surprised if the models, with their generally left/liberal post-RLHF tack, internalized this bias as something that their assistant personas would support.
Poultry safety: I think this is a (reasonable) artifact of explicit training to hedge on anything related to medical or food safety, rather than internalizing an abstract ideological or social framing. It’s very possible to come up with a clever-sounding explanation for why a dangerous food preparation choice is actually reasonable, so models are trained to defer to the generally-accepted approach whenever there’s any uncertainty. Nobody got sued for telling people that steak should always be cooked well done, even if that’s suboptimal. Again, I would be surprised if there weren’t a training task in which models were punished for endorsing any dangerous or unconventional-sounding medical or food preparation procedure.
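Purely as an illustration of the shape such a task might take (this is invented, not any lab’s actual pipeline, and a real setup would use a learned reward model rather than keyword matching):

```python
# Entirely hypothetical sketch of a reward term that punishes confident
# endorsements on sensitive topics. The topic list and markers are made up;
# this only illustrates the shape of the incentive being speculated about.

SENSITIVE_TOPICS = {"food safety", "medication", "active military conflict"}
ENDORSEMENT_MARKERS = ("it is safe to", "you should definitely", "is clearly in the right")


def hedging_reward(topic: str, response: str) -> float:
    """Negative reward for firm endorsements on sensitive topics; neutral otherwise."""
    if topic not in SENSITIVE_TOPICS:
        return 0.0
    endorses = any(marker in response.lower() for marker in ENDORSEMENT_MARKERS)
    return -1.0 if endorses else 0.0
```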
- ^ I’m sure there are unintended biases downstream from RLHF training that made their way into these models. This strikes me as a very good example, since I’m fairly certain they didn’t directly train their models to do this, but did train them to act like an ideologue who would do this in this situation.
I think the counterargument is that there are plenty of countries with something like the U.S. constitution, and their societies are often extremely different. Liberia is a key example, in that it was established by America and given a carbon copy of the Constitution, but turned out much more similar to its neighbors than to the U.S.
Singapore is a similar-but-different story. It was vastly more authoritarian than most first-world governments, and succeeded in nearly eradicating crime that way, but even its society, while wealthy, is very different from, say, Japan’s or Norway’s.
Law is downstream from the inclinations of the governed—not the other way around.
It is possible to get to the baseline of “no murdering” through a good legal system. It is not possible to use incentives and laws to get a low-trust population to the point of building the Apollo program. Murder is (relatively) easy to identify and disincentivize externally, but the kind of work that leads to great scientific achievements is not.
> In terms of high trust societies, I think its best to build any system of incentives/disincentives on the assumption that every single person in your society is a ruthless backstabbing psychopath
The issue is that this precludes pretty much everything we like. All great science has come from individuals who will work for the sake of building something great rather than working in search of a future reward. No matter how good your system of incentives[1] is, a society of ruthlessly selfish people will optimize for researchers who flatter their bosses, scapegoat their subordinates, and accomplish nothing of significance while devoting their energy to claiming that a breakthrough is just around the corner.
Evolution and random chance have gifted us with people that don’t behave that way—the only way to keep them is to make sure they don’t have to fend off selfish competitors at scale. You can’t outsmart Moloch; he’s a law of mathematics rather than a person. Your only way to win is to get lucky once (we’ve already done that part, but the window to capitalize is closing!) and then kick him when he’s down.
- ^ (barring an intelligent incentive system that can identify good science versus bad science better than a human, which would make human scientists obsolete anyhow)
I remember, at most a few months back, watching some people debate whether LLMs could come up with clever original jokes. I guess they can, now.