I think it’s useful to have arguments that appeal to folks all across the political landscape, and I like this framing. I often use something like “think about how much variation there is in ‘human nature’, and just how good or bad it can be; an artificial intelligence will have an artificial nature, and could have behaviors much weirder than we imagine”.
Interestingly, this seems to bite harder amongst conservatives and those with a religious worldview; they often have a dim view of human nature in the first place, and I think it gets them thinking about “something even worse than ‘made in the image of God’”. I hope this continues to be helpful, since AI is now becoming polarized.
dan.parshall
As I said in the original comment, I can certainly imagine minds that have other goals; possibly I have just overinterpreted statements like:
> If we consider a space of minds a million bits wide, then any argument of the form “Some mind has property $P$” has $2^{1{,}000{,}000}$ chances to be true and any argument of the form “No mind has property $P$” has $2^{1{,}000{,}000}$ chances to be false.
To imply that the distribution over such minds is likely to be uniform. Whereas it seems our current methods, using imitation learning, are definitely not sampling from that space uniformly. Overall this makes me more optimistic that alignment may be tractable.
Nice review. One thing you didn’t directly address, but which has struck me while learning more about AI training, is that the Orthogonality Thesis… doesn’t actually seem to be true? I mean, yes, I could imagine intelligences that loved other things for no reason, but the intelligences we actually seem to be making are not insanely orthogonal! (They are still far from perfectly aligned, but I’m hopeful nonetheless.)
I appreciate you putting this here! I realized no one had ever archived the original, so I’ve done so. The permanent link is at
https://web.archive.org/web/20260419231740/https://mylordshesacactus.tumblr.com/post/813939696352772096/please-make-a-post-about-the-story-of-the-rms
Name one other place or time in history where it was illegal to teach literacy!
Here are 3:
- In 1700s Ireland, it was illegal for Catholics to operate schools, teach, or send children abroad for education
- In Khmer Rouge Cambodia, all of the intelligentsia were executed and schools closed
- In Taliban Afghanistan, women have no ability to learn (beyond ~6th grade, IIRC)
None of which puts the Antebellum South in good company, but I do want to push back on the commonly-held perception that it was uniquely bad; truly there is no new thing under the sun!
Their brains are highly specialised to recognise other fish’s bodies, and locate and remove parasites from them.
Haven’t read the original wrasse paper, but my understanding is that what made it a “pass” was that when presented with the mirror, the wrasses would clean their own bodies, rather than attempting to clean the wrasse in the mirror.
Overall, that finding pushed me in the direction this post argues; i.e. that it’s decent Bayes evidence in favor of self-awareness, but far from a slam-dunk.
If you’re aware of other preprints or publications taking TAI seriously, I would genuinely love to have citations! Makes it much easier to say these kinds of things to a policymaker when there’s a stack of supporting arguments.
Alice decides Principle X is important enough to make a big deal about.
People don’t seem to understand the issue. Alice explains it more. Some people maybe get it but then next week they seem to have forgotten. Other people still don’t get it.
This reminds me of a line from Shaw:
“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.”
I think one way to navigate the challenge this post points at is to recognize, both privately and publicly, that it’s hard to be perfect all the time, we all falter, and sometimes we must choose our battles.
I think it’s important both to enable the sort of self-forgiveness that’s most needed by those prone to self-flagellation, and also to lower the overall temperature.
So I strive (sometimes even successfully!) to recognize that the things of great import to another person may, in fact, be truly important… while also remembering that the things of great import to me may not, in fact, be truly important.
I think this aligns with your willingness to own “yes, I’m not applying that because it’s too much tradeoff for me”. Because yes, sometimes that person’s unreasonableness is correct… and sometimes I can’t stomach the battle for it, even though I know that. But acknowledging “well, I need a mulligan on this, and I’ll try to give one elsewhere” makes the whole world slightly better off.
(Quite confident) The most common illnesses (colds and flu) don’t build immunity in general (in kids or adults) because they mutate every year
Not my area, but it seems like the difference between “this year’s variant and last year’s” is going to be much, much smaller than the difference between either of those and “never exposed to any cold/flu before”.
So naively it seems possible that the first several colds do train the immune system quite a bit to handle colds in general, even if subsequent ones are then moving around to different points on the fitness landscape.
What economists get wrong (and sometimes right!) about AI
It absolutely happens in things like “measuring corruption of public officials in $COUNTRY”; such things are silently dropped, and have to be inferred by reading between the lines. Apparently this is a long and proud tradition, at least in philosophy:
https://www.thepsmiths.com/p/joint-review-philosophy-between-the
I’ve known a few, and that’s my impression as well, but I’m partly interested in the direction of causality. Is your impression that conditional on the songs being equally difficult, violinists would still be more neurotic?
If so, then I’m wondering if it has to do with the orchestral setting of playing in unison, where even a few cents difference is painfully obvious (at least to the other violinist)?
Or is it more the other way around, that violin calls to folks with a higher neurosis level?
So I did make a math mistake, but I think we’re in broad agreement. Let me be explicit for a race with total expected votes N=400 (e.g. a seat on a city council for one district of a small town).
With N=400
sigma = sqrt(400 * 0.5 * 0.5) = 10
a 6-point lead means expected votes would be:
A : 212
B : 188
This corresponds to a win probability for A of cdf(12/10) ~88%
Changing one’s vote from A to B changes the expected counts to:
A : 211
B : 189
This corresponds to a win probability for A of cdf(11/10) ~86%
So yes, it’s only a ~2-point change vs my earlier assertion of 8%, my mistake.
But I think we agree that sigma matters! And my point is that in small local elections, sigma is small, and your vote counts for a lot!
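For anyone who wants to check the arithmetic above, here is a minimal sketch using the same normal approximation (50/50 per-voter variance, A wins if their count exceeds N/2). The function names are my own, purely for illustration.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def win_probability(expected_votes_a: float, total_votes: int) -> float:
    """Approximate P(A wins), assuming A's count is normal with
    sigma = sqrt(N * 0.5 * 0.5) and A needs more than N/2 votes."""
    sigma = sqrt(total_votes * 0.5 * 0.5)
    return normal_cdf((expected_votes_a - total_votes / 2) / sigma)


N = 400
p_before = win_probability(212, N)  # 6-point lead: A 212, B 188
p_after = win_probability(211, N)   # one voter switches from A to B
swing = p_before - p_after

print(f"before: {p_before:.3f}, after: {p_after:.3f}, swing: {swing:.3f}")
```

This reproduces the ~88% and ~86% figures, with a swing of about 2 points.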
I agree that if you only care about federal policy, this doesn’t apply (I’d missed that in the initial post). But if you care about libraries, or how aggressive the police are, those are local issues where someone can have a strong influence in policy.
Let me elaborate: I broadly agree with the framing here, in that the probability of flipping a vote is going to be related to the margin of the race; in a race decided by a couple hundred votes, a single vote-flip counts for 0.5%, far more than it does in a national election… if you’re voting in an election where one candidate has a 6% edge, your vote has roughly a 1 in 12 chance of changing the outcome! That’s massive leverage that you can’t hope to replicate in larger elections.
The value (which I believe maps to “goodness”) of that vote flip is going to be related to:
- the budget over which the politician has leverage
- what fraction of that budget spend affects you
- their probability of listening to what you have to say
While the budget is smaller in absolute terms, in terms of how it affects you it basically remains constant with election scale. I.e. the national budget is larger, but spread over 300M people; a local election has a smaller budget spread over a smaller population, so the per-person impact is about the same.
Moreover, precisely because local politicians know that every vote counts, they’re much more responsive to constituents than state or national politicians.
Given that A & B are much larger in local elections, I think there’s a lot of value there. The notable exception is if the policy is made at a higher level of jurisdiction.
Did the labs start getting much better about data cleanup around this time? I know the “Textbooks are all you need” paper was in mid-2023; depending on training cycle etc., I can also imagine that cleaner input improved agentic-specific skills. E.g., they started focusing on using TDD to make sure that the tests passed; this ties into the RLVR point, obviously.
dan.parshall’s Shortform
Do fiddle players suffer from this as well? Or is the repertoire just so much easier?
I think this is a great point:
(This last comes down to a property of high-dimensional geometry. Imagine that the “correct” specification of morality is 100 bits long, and that for every bit, any individual human has a probability of 0.1 of being a “moral mutant” along that dimension. The average human only has 90 bits “correct”, but everyone’s mutations are idiosyncratic: someone with their 3rd, 26th, and 78th bits flipped doesn’t see eye-to-eye with someone with their 19th, 71st, and 84th bits flipped, even if they both depart from the consensus. Very few humans have all the bits “correct”—the probability of that is $0.9^{100} \approx 2.7 \times 10^{-5}$—but Claude does, because everyone’s “errors” cancel out of the pretraining prior.)
I actually wrote a proposal specifically about how we could elicit exactly this information. Briefly, instead of using a pair of ‘proposed responses’ and then choosing between the two of them (which as a side effect probably encourages hallucination), you could take a single proposed response and show it to two reviewers (whether human or their designated agent). If you get two thumbs-up, use positive reinforcement; with two thumbs-down, use negative reinforcement (which helps punish truly horrible proposals); and a mixed signal could go to a reconciliation round, to “navigate” between the two perspectives.
The key is that if this is framed as an ongoing process, then one can make “navigate differing values” the anchor of identity, and then corrigibility isn’t “resistance to my values”… reconciling is the core AI value… (fingers crossed)
I think combined with a shift in how we imagine corrigibility, we might buy ourselves several more years. Happy to discuss further if you’re interested.
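To make the two-reviewer routing concrete, here is a minimal sketch. All names (`Signal`, `route_feedback`) are hypothetical and only illustrate the three-way split described above; what a reconciliation round actually does is left out entirely.

```python
from enum import Enum


class Signal(Enum):
    """Hypothetical labels for the three outcomes of a two-reviewer check."""
    POSITIVE = "positive_reinforcement"
    NEGATIVE = "negative_reinforcement"
    RECONCILE = "reconciliation_round"


def route_feedback(reviewer_a_approves: bool, reviewer_b_approves: bool) -> Signal:
    """Map two independent thumbs-up/down verdicts on ONE proposed response
    to a training signal, instead of picking a winner between two responses."""
    if reviewer_a_approves and reviewer_b_approves:
        return Signal.POSITIVE
    if not reviewer_a_approves and not reviewer_b_approves:
        return Signal.NEGATIVE
    # Mixed verdicts: neither reward nor punish; hand off to reconciliation.
    return Signal.RECONCILE


for verdicts in [(True, True), (False, False), (True, False)]:
    print(verdicts, "->", route_feedback(*verdicts).value)
```

The only design point is that disagreement is treated as its own outcome, not averaged away.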
I’m of the opinion that “automate half the jobs in the economy” is certainly on the table:
https://canaryinstitute.ai/research/task-exposure/
This is part of what I’m using to talk to policymakers about why they need to act NOW.
How much is that talked about by the cryonics companies? Normally “social proof” is a big deal, and having that as part of a FAQ would be very persuasive to the normies!