C is fake, it’s part of A, and A is fake, it’s washing something which we don’t yet understand and should not pretend to understand.
TsviBT
(For reference, 135 is 2.33 SDs, which works out to about 1 in 100, i.e. you’re the WAISest person in the room with 100 randomly chosen adults. Cf. https://tsvibt.blogspot.com/2022/08/the-power-of-selection.html#samples-to-standard-deviations )
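(A quick check of that arithmetic, as a sketch; it assumes the usual WAIS-style scaling of mean 100, SD 15, which isn’t stated above:)

```python
from statistics import NormalDist

# Assumed scaling: IQ ~ Normal(mean=100, sd=15).
iq = NormalDist(mu=100, sigma=15)

z = (135 - 100) / 15          # ≈ 2.33 standard deviations above the mean
tail = 1 - iq.cdf(135)        # fraction of adults scoring above 135
print(f"z = {z:.2f}, tail ≈ {tail:.4f}, i.e. roughly 1 in {round(1 / tail)}")
# Prints something like: z = 2.33, tail ≈ 0.0098, i.e. roughly 1 in 102
```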
AI can do longer and longer coding tasks.
But this is not a good category; it contains both [the type of long coding task that involves having to creatively figure out several points] and also other long coding tasks. So the category does not support the inference. It makes it easier for AI builders to run… some funny subset of “long coding tasks”.
The “Boggle” section of this could almost be a transcript from the CFAR class. Should be acknowledged.
I don’t necessarily disagree with what you literally wrote. But also, at a more pre-theoretic level, IMO the sequence of events here should be really disturbing (if you haven’t already been disturbed by other similar sequences of events). And I don’t know what to do with that disturbedness, but “just feel disturbed and don’t do anything” also doesn’t seem right. (Not that you said that.)
(I think I’ll go with “alloprosphanize” for now… not catchy but ok. https://tsvibt.github.io/theory/pages/bl_25_10_08_12_27_48_493434.html )
Exoclarification? Alloclarification? Democlarification (dēmos—“people”)?
These are good ideas, thanks!
Ostentiation:
So there’s steelmanning, where you construct a view that isn’t your interlocutor’s but is, according to you, more true / coherent / believable than your interlocutor’s. Then there’s the Ideological Turing Test, where you restate your interlocutor’s view in such a way that ze fully endorses your restatement.
Another dimension is how clear things are to the audience. A further criterion for restating your interlocutor’s view is the extent to which your restatement makes it feasible / easy for your audience to (accurately) judge that view. You could pass an ITT without especially hitting this bar. Your interlocutor’s view may have an upstream crux that ze doesn’t especially feel has to be brought out (so, not necessary for the ITT), but which is the most cruxy element for most of the audience. You can pass the ITT while emphasizing that crux or while not emphasizing that crux; from your interlocutor’s point of view, the crux is not necessarily that central, but is agreeable if stated.
A proposed term for this bar of exposition / restatement: ostentiate / ostentiation. Other terms:
ekthesize / exthesize; phosphanize, epiphotize; elogize; anaply / anaplain; anaptychize / exptychize / eptychize;
adluminate, superluminate; ostentiate, superostentiate;
belight, beshine, enbeacon.
is a fine example of thinking you get when smart people do evil things and their minds come up with smart justifications why they are the heroes
I want to just agree because fuck those guys, but actually I think they’re also shit justifications. A good justification might come from exploring the various cases where we have decided to not make something, analyzing them, and then finding some set of reasons why those cases don’t transfer as possibilities for AI stuff.
This seems fine and good—for laying some foundations, which you can use for your own further theorizing, which will make you ready to learn from more reliable + rich expert sources over time. Then you can report that stuff. If instead you’re directly reporting your immediately-post-LLM models, I currently don’t think I want to read that stuff, or I’d at least want a warning. (I’m not necessarily pushing for some big policy, that seems hard. I would push for personal standards though.)
(I don’t know to what extent I’m agreeing with Duncan’s statements here. I separately wrote comments about something like this new CFAR, though I’m not sure how much overlap there is between this vs. what I was commenting about. One thing my comments brought up was that I felt earlier CFAR things were, besides being often very helpful to many people including me, also not infrequently damaging to epistemics for group-related reasons, which seems possibly related to the “status differential” thing Duncan alludes to downthread. My comments touched on how CFAR in general, and maybe Anna in particular, did not come close to sufficiently treating incoming people as authors of the group narrative, and how this creates lots of distortionary effects (see for example https://www.lesswrong.com/posts/ksBcnbfepc4HopHau/dangers-of-deference ). And, similar to what Duncan says, this felt to me like a hydra / whack-a-mole type problem, where multiple patches didn’t address the underlying thing. Though, with that type of problem, it tends to be hard to accurately characterize (and judge and fix-or-endorse) the underlying thing.)
There’s no such thing as “a domain where LLMs are particularly likely to hallucinate”. In every domain there’s some obscure jagged boundary, not very far from normal standard questions to ask, where LLMs will hallucinate, usually plausibly to a non-expert.
I am treating LLM output as somewhat less trustworthy than I would trust what a colleague of mine says, but not fundamentally different.
If you’re asking a human about some even mildly specialized topic, like the history of Spain in the 17th century or different crop rotation methods or ordinary differential equations, and there’s no special reason that they really want to appear like they know what they’re talking about, they’ll generally just say “IDK”. LLMs are much less like that IME. I think this is actually a big difference in practice, at least in the domains I’ve tried (e.g. reproductive biology). LLMs routinely give misleading / false / out-of-date / vague-but-deceptively-satiating summaries.
In general, I think that how the IE happens and is governed is a much bigger deal than when it happens.
(I don’t have much hope in trying to actually litigate any of this, but:)
Bro. It’s not governed, and if it happens any time soon it won’t be aligned. That’s the whole point.
The right response is an “everything and the kitchen sink” approach — there are loads of things we can do that all help a bit in expectation (both technical and governance, including mechanisms to slow the intelligence explosion), many of which are easy wins, and right now we should be pushing on most of them.
How do these small kitchen sinks add up to pushing back AGI by, say, several decades? Or add up to making an AGI that doesn’t kill everyone? My super-gloss of the convo is:
IABIED: We’re plummeting toward AGI at an unknown rate and distance; we should stop that; to stop that we’d have to do this really big hard thing; so we should do that.
You: Instead, we should do smaller things. And you’re distracting people from doing smaller things.
Is that right? Why isn’t “propose to the public a plan that would actually work” one of your small things?
(I will abstractly state that I feel negatively towards the group dynamics around some AI debates in the broader EA/LW/AI/X-derisking sphere, e.g. about timelines; so, affirming that I feel “knee-deep” in something, or I would if my primary activity were about that; and affirming that addressing this in a gradual-unraveling way could be helpful.)
I don’t know if any/many Jews/Israelis actually celebrated, or, if they did, why that would be; but the conspiratorial accusation towards Jews/Israelis in particular is that they caused or at least celebrated the attack because it would drag the US into fighting Israel’s enemies in the ME.
Separately, I could imagine [anyone who’s generally been a continual target of Islamic extremism] not exactly celebrating but (unwisely) seeing the upside “well at least now you get it”.
Hm. I thought I saw somewhere else in this comment thread that mentions this, but now I can’t find it, so I’ll put this here.
Sometimes mind is like oobleck ( https://www.lesswrong.com/posts/7RFC74otGcZifXpec/the-possible-shared-craft-of-deliberate-lexicogenesis?commentId=BHkcKpdmX5qzoZ76q ).
In other words, you push on it, and you feel something solid. And you’re like “ah, there is a thingy there”. But sometimes what actually happened is that by pushing on it, you made it solid. (...Ah I was probably thinking of plex’s comment.)
This is also related to perception and predictive processing. You can go looking for something X in yourself, and everything you encounter in yourself you’re like ”… so, you’re X, right?”; and this expectation is also sort of a command. (Or there could be other things with a similar coarse phenomenology to that story. For example: I expect there’s X in me; so I do Y, which is appropriate to do if X is in me; now I’m doing Y, which would synergize with X; so now X is incentivized; so now I’ve made it more likely that my brain will start doing X as a suitable solution.) (Cf. “Are you triggered yet??” https://x.com/tsvibt/status/1953650163962241079 )
If you have too much of an attitude of “just looking is always fine / good”, you might not distinguish between actually just looking (insofar as that’s coherent) vs. going in and randomly reprogramming yourself.
What? Surely “it’s fake” is a fine way to say “most people who would say they are in C are not actually working that way and are deceptively presenting as C”? It’s fake.