Zack_M_Davis

Karma: 17,630

Zack_M_Davis 12 Nov 2025 16:39 UTC
3 points
−1
in reply to: TsviBT’s comment on: The problem of graceful deference
The grandparent explains why Dai was confused about your authorial intent, and his comment at the top of the thread is sitting at 31 karma in 15 votes, suggesting that other readers found Dai’s engagement valuable. If that’s grossly negligent reading comprehension, then would you prefer to just not have readers? That is, it seems strange to be counting down from “smart commenters interpret my words in the way I want them to be interpreted” rather than up from “no one reads or comments on my work.”

On the Normativity of Debate: A Discussion With Said Achmiz

Zack_M_Davis 11 Nov 2025 5:49 UTC
21 points
1 comment · 22 min read · LW link

Zack_M_Davis 4 Nov 2025 7:21 UTC
26 points
14
on: The Tale of the Top-Tier Intellect
This story would have benefited from being edited by a chess player. I think one of the better players in even a “medium-small town” with “a thriving chess club as one of its central civic institutions” would know more about the game than the author seems to. (The chess writing seemed off to me, and I am significantly worse than a serious club player.)

“I thought at first it was a mistake, for you to castle so early” is a weird thing for Humman to say. Castling early is standard default beginner advice. Even if there was some unusual feature of the opening that made it a bad choice in this game, you wouldn’t use the word “so” in that sentence.

It’s weird for Assi to describe Humman’s play as using “particular tactics”, and then to (insincerely) compliment him for “doing well at one-move lookahead” and not “unforcedly throwing away material right on your next move”. Tactics are short sequences of moves that work together to achieve a goal. (An example I keep falling for in bullet games with the Englund gambit accepted (1. d4 e5?! 2. dxe5) opening: Black’s dark-square bishop is on d6, White’s queen is still on d1, the d-file is open due to accepting the Englund gambit, and Black castles queenside to put a rook on d8. Black sacrifices the bishop with Bh2+, revealing a discovered attack of the Black rook on the White queen, which White can’t do anything about because they have to use their move to deal with the check.) If a player is at the level of using “particular tactics”, an IM who wants to compliment them for social reasons shouldn’t find it difficult (to the point of giving up after “a dozen seconds of” “twist[ing] his brain around”) to find something concrete and nice to say that’s less patronizing than “at least you’re not hanging pieces.”

(Also, the Ethiopian isn’t a real opening; a cutesy fake detail like that feels out of place mixed in with real details like IMs needing an Elo of 2400, and I’d expect a club player to have heard of simuls.)

Do these flaws matter, given that the story isn’t really about chess? I argue that they do matter, because a story that is about the folly of misperceiving how high skill ladders go should take basic care to get the details right concerning the skill ladder of its notional real-world example. (An earlier draft of this comment continued, “particularly in 2025 when basic care is so cheap. In the story, Tessa has no qualms about using LLMs to fill in domain knowledge gaps; why doesn’t Yudkowsky?”, but when I checked, Claude Sonnet 4.5 didn’t anticipate my criticism.)

Zack_M_Davis 4 Nov 2025 3:30 UTC
9 points
−4
on: The Tale of the Top-Tier Intellect

Suppose we compare that whole function with Mr. Neumann’s function, and compare how good are the probable moves you’d make versus him making. On most chess positions, Mr. Neumann’s move would probably be better. [...] That’s the detailed complicated actually-true underlying reality that explains why the Elo system works to make excellent predictions about who beats who at chess.

This explanation is bogus. (Obviously, the conclusion that Elo scores are practically meaningful is correct, but that’s not an excuse.)

Mr. Humman could locally-validly reply that Tessa is begging the question by assuming that there’s a fact of the matter as to one move being “better” than another in a position. Whether a move is “good” depends on what the opponent does. Why can’t there be a rock-paper-scissors–like structure, where in some position, 12. …Ne4 is good against positional players and bad against tactical players?

Earlier, Tessa does appeal to player comparisons being “mostly transitive most of the time”—but only as something that “didn’t have to be true in real life”, which seems to contradict the claim that some moves in a position are better on the objective merits of the position, rather than merely with respect to the tendencies of some given population of players.
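For concreteness, here is a minimal sketch of the standard Elo expected-score formula, which is what makes Elo predictions transitive by construction: each player is summarized by a single number, so a rock-paper-scissors cycle cannot be represented at all. The ratings below are hypothetical and chosen only for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings, not taken from the story or from any real players.
ratings = {"A": 2400, "B": 2000, "C": 1600}

# If A is favored over B and B over C, the formula necessarily favors A over C.
for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(f"{x} vs {y}: expected score {expected_score(ratings[x], ratings[y]):.3f}")
```

That the model cannot express cycles is a property of the model, not evidence about chess; whether real results fit it is the empirical question at issue.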

The actual detailed complicated actually-true underlying reality is that by virtue of being a finite zero-sum game, chess fulfills the conditions of the minimax theorem, which implies that there exists an inexploitable strategy. You can have rock-paper-scissors–like cycles among particular strategies, but the minimax strategy does no worse than any of them.
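A toy illustration of that last sentence, using literal rock-paper-scissors rather than chess (a sketch under that simplifying assumption; the function names are made up for this example): every pure strategy is beaten by some other pure strategy, but the uniform mixture has a guaranteed worst-case payoff of zero, which is the value of the game.

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]

# PAYOFF[i][j] is the row player's payoff when playing move i against move j:
# +1 for a win, -1 for a loss, 0 for a tie.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_payoff(mixed, opponent_move):
    """Expected payoff of a mixed strategy against one fixed opponent move."""
    return sum(p * PAYOFF[i][opponent_move] for i, p in enumerate(mixed))


def worst_case(mixed):
    """Payoff guaranteed by a mixed strategy against a best-responding opponent."""
    return min(expected_payoff(mixed, j) for j in range(len(MOVES)))


uniform = [Fraction(1, 3)] * 3
pure_strategies = [
    [Fraction(int(i == k)) for i in range(len(MOVES))] for k in range(len(MOVES))
]

print("uniform mixture guarantees:", worst_case(uniform))
for name, pure in zip(MOVES, pure_strategies):
    print(f"always {name} guarantees:", worst_case(pure))
```

In rock-paper-scissors the minimax strategy has to be mixed because moves are simultaneous; chess is a perfect-information game, so there the inexploitable strategy can be taken to be deterministic.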

The implications for real-world non-perfect play are subtler. As a start, Czarnecki et al. 2020 (of DeepMind) suggest that “Real World Games Look Like Spinning Tops”: there’s a transitive “skill” dimension along which higher-skilled strategies beat lower-skilled ones, but at any given skill level, there’s a non-transitive rock-paper-scissors–like plethora of strategies, which explains how players of equal skill can nevertheless have distinctive styles. The size of the non-transitive dimension thins out as skill increases (away from the “base” of the top; see the figures in the paper).

This picture seems to suggest that rather than being total nonsense, the problem with Humman’s worldview is in his attribution of it to the “top tier”. Non-transitivity is real and significant in human life—but gradually less so as we approach the limit of optimality.

Zack_M_Davis 27 Oct 2025 19:18 UTC
40 points
6
on: On Fleshling Safety: A Debate by Klurl and Trapaucius.

“Bah!” cried Trapaucius. “By the same logic, we could say that planets could be obeying a million algorithms other than gravity, and therefore, ought to fly off into space!”

Klurl snorted air through his cooling fans. “Planets very precisely obey an exact algorithm! There are not, in fact, a million equally simple alternative algorithms which would yield a similar degree of observational conformity to the past, but make different predictions about the future! These epistemic situations are not the same!”

“I agree that the fleshlings’ adherence to korrigibility is not exact and down to the fifth digit of precision,” Trapaucius said. “But your lack of firsthand experience with fleshlings again betrays you; that degree of precision is simply not something you could expect of fleshlings.”

I think Trapaucius missed a great opportunity here to keep riffing off the gravity analogy. Actually, there are different algorithms the planets could be obeying: special and then general relativity turned out to be better approximations than Newtonian gravity, and GR is presumably not the end of the story—and yet, as Trapaucius says, the planets do not “fly off into space.” Newton is good enough not just for predicting the night sky (modulo the occasional weird perihelion precession), but even landing on the moon, for which relativistic deviations from Newtonian predictions were swamped by other sources of error.

Obviously, that’s just a facile analogy: if Trapaucius had found that branch of the argument tree, Klurl could easily go into more details about further disanalogies between gravity and the fleshlings.

But I think that the analogy is getting at something important. When relatively smarter real-world fleshlings delude themselves into thinking that Claude Sonnet 4.5 is pretty corrigible because they see it obeying their instructions, they’re not arguing, as Trapaucius does, that “Korrigibility is the easiest, simplest, and natural way to think” for a generic mind. They’re arguing that Anthropic’s post-training procedure successfully pointed to the behavior of natural language instruction-following, which they think is a natural abstraction represented in the pretraining data which generalizes in a way that’s decision-relevantly good enough for their purposes, such that Claude won’t “fly off into space” even if they can’t precisely predict how Claude will react to every little quirk of phrasing. They furthermore have some hope that this alleged benign property is robust and useful enough to help humanity navigate the intelligence explosion, even though contemporary language models aren’t superintelligences and future AI capabilities will no doubt work differently.

Maybe that’s totally delusional, but why is it delusional? I don’t think “On Fleshling Safety” (or past work in a similar vein) is doing a good job of making the case. A previous analogy about an alien actress came the closest, but trying to unpack the analogy into a more rigorous argument involves a lot of subtleties that fleshlings are likely to get confused about.

Zack_M_Davis 25 Oct 2025 18:28 UTC
9 points
0
on: Comment on “Death and the Gorgon”
(Asimov’s has now put the story up for free)

Zack_M_Davis 24 Oct 2025 16:52 UTC
2 points
0
on: White House OSTP AI Deregulation Public Comment Period Ends Oct. 27
workshop in San Francisco tomorrow at 1 p.m.

Zack_M_Davis 23 Oct 2025 6:03 UTC
2 points
0
in reply to: faul_sname’s comment on: faul_sname’s Shortform

is hard to keep secret

Is it actually hard to keep secret, or is it that people aren’t trying (because the prestige of publishing an advance is worth more than hoarding the incremental performance improvement for yourself)?

Zack_M_Davis 23 Oct 2025 6:00 UTC
2 points
0
in reply to: faul_sname’s comment on: faul_sname’s Shortform
The Sonnet 4.5 system card reiterates the “most thought processes are short enough to display in full” claim that you quote:

As with Claude Sonnet 4 and Claude Opus 4, thought processes from Claude Sonnet 4.5 are summarized by an additional, smaller model if they extend beyond a certain point (that is, after this point the “raw” thought process is no longer shown to the user). However, this happens in only a very small minority of cases: the vast majority of thought processes are shown in full.

But it is intriguing that the displayed Claude CoTs are so legible and “non-weird” compared to what we see from DeepSeek and ChatGPT. Is Anthropic using a significantly different (perhaps less RL-heavy) post-training setup?

White House OSTP AI Deregulation Public Comment Period Ends Oct. 27

Zack_M_Davis 22 Oct 2025 6:18 UTC
42 points
1 comment · 1 min read · LW link

Zack_M_Davis 21 Oct 2025 20:25 UTC
3 points
0
on: 21st Century Civilization curriculum
Linkpost URL should presumably include “http://” (click currently goes to https://www.lesswrong.com/posts/2CGXGwWysiBnryA6M/www.21civ.com).

Zack_M_Davis 18 Oct 2025 23:55 UTC
12 points
8
on: The IABIED statement is not literally true
1. It will probably be possible, with techniques similar to current ones, to create AIs who are similarly smart and similarly good at working in large teams to my friends, and who are similarly reasonable and benevolent to my friends in the time scale of years under normal conditions.
[...]

This is maybe the most contentious point in my argument, and I agree this is not at all guaranteed to be true, but I have not seen MIRI arguing that it’s overwhelmingly likely to be false.
Did you read the book? Chapter 4, “You Don’t Get What You Train For”, is all about this. I also see reasons to be skeptical, but have you really “not seen MIRI arguing that it’s overwhelmingly likely to be false”?

Zack_M_Davis 18 Oct 2025 4:53 UTC
4 points
−1
in reply to: PeteG’s comment on: The Relationship Between Social Punishment and Shared Maps
Isn’t it, though?

Zack_M_Davis 12 Oct 2025 4:06 UTC
2 points
0
in reply to: habryka’s comment on: The Relationship Between Social Punishment and Shared Maps

Indeed, I notice in your list above you suspiciously do not list the most common kind of attribute that is attributed to someone facing social punishment. “X is bad” or “X sucks” or “X is evil”.

I’m inclined to still count this under “judgments supervene on facts and values.” Why is X bad, sucky, evil? These things can’t be ontologically basic. Perhaps less articulate members of a mass punishment coalition might not have an answer (“He just is; what do you mean ‘why’? You’re not an X supporter, are you?”), but somewhere along the chain of command, I expect their masters to offer some sort of justification with some sort of relationship to checkable facts in the real world: “stupid, dishonest, cruel, ugly, &c.” being the examples I used in the post; we could keep adding to the list with “fascist, crazy, cowardly, disloyal, &c.” but I think you get the idea.

The justification might not be true; as I said in the post, people have an incentive to lie. But the idea that “bad, sucks, evil” are just threats within a social capital system without any even pretextual meaning outside the system flies in the face of experience that people demand pretexts.

“Yes, and—” Requires the Possibility of “No, Because—”

Zack_M_Davis 9 Oct 2025 17:39 UTC
32 points
4 comments · 3 min read · LW link

(zackmdavis.net)

The Relationship Between Social Punishment and Shared Maps

Zack_M_Davis 8 Oct 2025 19:38 UTC
64 points
14 comments · 4 min read · LW link

(zackmdavis.net)

[Question] Generalization and the Multiple Stage Fallacy?

Zack_M_Davis 7 Oct 2025 6:20 UTC
41 points
9 comments · 3 min read · LW link

Zack_M_Davis 7 Oct 2025 0:00 UTC
12 points
6
in reply to: Daniel Kokotajlo’s comment on: The Company Man
Can’t you just say that yourself (not all, caricature, parody, uncharitable, exaggerates, &c.) when sharing it? Death of the author, right?

Zack_M_Davis 4 Oct 2025 20:10 UTC
4 points
0
in reply to: niplav’s comment on: niplav’s Shortform

or that they will be robust to strong optimization at the time when AIs are capable of taking over. I think that’s probably wrong, because (1) LLMs have many more degrees of freedom in their internal representations than e.g. Inception, so the resulting optimized outputs are going to look even stranger

There has been some progress in robust ML since the days of DeepDream (2015).

Zack_M_Davis 1 Oct 2025 16:37 UTC
1 point
4
in reply to: Ben Pace’s comment on: Raemon’s Shortform Feed
I feel like Thomas was trying to contribute to this conversation by making an intellectually substantive on-topic remark and then you kind of trampled over that with vacuous content-free tone-policing.
