Joey KL

Karma: 318

Joey KL 20 Apr 2026 20:30 UTC
11 points
0
on: If a room feels off the lighting is probably too “spiky” or too blue
I’m somewhat skeptical of the importance of ultra high CRI. I don’t doubt that 80 vs 95 makes a difference, but I’m not sold on 95 vs 98, since it seems susceptible to placebo and is confounded in the wild by other lighting design factors. What convinced you?

Joey KL 25 Mar 2026 0:56 UTC
6 points
3
in reply to: J Bostock’s comment on: AI 2027 versus World War 2027
China could definitely win a war with Taiwan now, but it might not go smoothly or cheaply. I imagine they want plans/preparations that meet a high standard for decisive victory. But if Taiwan formally declared independence, I think they would start a war immediately.

Joey KL 25 Mar 2026 0:21 UTC
1 point
0
on: My cost-effectiveness unit
This is great! Might I suggest basis points for the numerator unit, since the quantities are so small. Then the unit can be “bips per billion“, with your reference quantity at 20bps/$B.

Joey KL 22 Mar 2026 19:09 UTC
2 points
−1
in reply to: Guive’s comment on: China Derangement Syndrome
What’s an example of one account by which China currently thinks the rest of the world is low in quality?

Joey KL 20 Mar 2026 21:51 UTC
2 points
0
in reply to: Joey KL’s comment on: peterbarnett’s Shortform
What’s more, it may be that the competitive equilibrium of increasingly advanced AI never leads to a decisive strategic advantage over other countries trying to keep up. For example, countries with increasingly automated economies may still want to trade with less automated countries, which enriches those countries enough to stay somewhat relevant by automating at a delay. Or less automated economies might have a “catch up” advantage where they can copy the advancements of the most automated economies more easily than they could have developed them themselves. This is how I would describe the geopolitical effects of the industrial revolution: it was a major advantage for the countries first to industrialize, but not an overwhelming / decisive one.

Joey KL 20 Mar 2026 21:39 UTC
5 points
−2
in reply to: peterbarnett’s comment on: peterbarnett’s Shortform
This doesn’t seem right to me. Middle powers will only become irrelevant insofar as the ASI built by great powers isn’t aligned to the interests of middle powers. But it could be that it will be: it could be aligned to respect property rights, or human rights, sovereign rights, or values which the middle powers hold in common with the great powers; or it could be aligned to serve citizens / politicians of the great powers who are sympathetic to the middle power. And to the extent that the ASI won’t be by default, the middle powers can use their leverage to try to get it aligned with those things rather than to slow down / stop it entirely.

Joey KL 17 Feb 2026 2:24 UTC
2 points
0
in reply to: cubefox’s comment on: Jailbreaking is Empirical Evidence for Inner Misalignment and Against Alignment by Default
Have you read Ender’s Game?
It’s about a child prodigy being tricked into destroying an alien planet by having it presented to him as just a simulation.

Joey KL 24 Jan 2026 19:55 UTC
1 point
0
in reply to: Joey KL’s comment on: Habryka’s Shortform Feed
Oh, I didn’t see your footnote. I didn’t know about the US Marshalls vs Marshall of the US Supreme Court distinction. That’s interesting and confusing!

Joey KL 23 Jan 2026 7:41 UTC
7 points
0
in reply to: habryka’s comment on: Habryka’s Shortform Feed
The US Marshalls are charged with carrying out court orders but are actually part of the executive branch, which makes it even less plausible they would materially stand up to the president.

Joey KL 13 Jan 2026 3:29 UTC
38 points
39
in reply to: Richard_Ngo’s comment on: 1a3orn’s Shortform
Huh, that seems totally wrong to me. This seems like about as straightforwardly a case of incorrigibility as I can imagine.

Joey KL 22 Dec 2025 9:15 UTC
1 point
4
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
Yeah, I think would have just said “someone who works at an AI capabilities company”. (I’m not trying to make this a big deal or controversy, I just find myself very irritated by how they get attention by antagonizing AI safety people in ways that seem bad faith to me.)

Joey KL 22 Dec 2025 4:09 UTC
3 points
0
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
Sure, I don’t think there’s much problem with engaging with the point on substance, but I would avoid publicizing their brand.

Joey KL 22 Dec 2025 3:21 UTC
9 points
−9
in reply to: Thomas Kwa’s comment on: Thomas Kwa’s Shortform
I think there should be some kind of discourse sanction on Mechanize (as in, broadly avoid engaging with their ideas and mentioning them) because I think they are intentionally operating in bad faith as a brand as a publicity stunt.

Joey KL 6 Dec 2025 4:16 UTC
51 points
25
in reply to: Cole Wyeth’s comment on: Cole Wyeth’s Shortform
Finally, we’ve optimized the Long Horizon Software Development capability, from the famous METR graph “Don’t Optimize The Long Horizon Software Development Capability”

Joey KL 23 Nov 2025 21:44 UTC
1 point
0
in reply to: habryka’s comment on: anaguma’s Shortform
What I’m imagining is: we train AIs on a mix of environments that admit different levels of reward hacking. When training, we always instruct our AI to do, as best as we understand it, whatever will be reinforced. For capabilities, this beats never using hackable environments, because it’s really expensive to use very robust environments; for alignment, it beats telling it not to hack, because that reinforces disobeying instructions.
In the limit, this runs into problems where we have very limited information about what reward hacking opportunities are present in the training environments, so the only instruction we can be confident is consistent with the grader is “do whatever will receive a high score from the grader”, which will… underspecify… deployment behavior, to put it mildly.
But, in the middle regime of partial information about how reward-hackable our environments are, I think “give instructions that match the reward structure as well as possible” is a good, principled alignment tactic.
Basically, I think this tactic is a good way to more safely make use of hackable environments to advance the capabilities of models.

Joey KL 23 Nov 2025 19:24 UTC
1 point
0
in reply to: habryka’s comment on: anaguma’s Shortform
Hmm, I think I disagree with “If you can still tell that an environment is being reward hacked, it’s not the dangerous kind of reward hacking.” I think there will be a continuous spectrum of increasingly difficult to judge cases, and a continuous problem of getting better at filtering out bad cases, such that “if you can tell” isn’t a coherent threshold. I’d rather talk about “getting better at distinguishing” reward hacking.
I think we just have different implicit baselines here. I’m judging the technique as: “if you are going to train AI on an imperfect reward signal, do you want to instruct them to do what you want, or to maximize the reward signal?” and I think you clearly want the later for simple, elegant reasons. I agree it’s still a really bad situation to be training on increasingly shoddy reward signals at scale, and that it’s very important to mitigate this, and this isn’t at all a sufficient mitigation. I just think it’s a principled mitigation.

Joey KL 23 Nov 2025 8:32 UTC
1 point
0
in reply to: leogao’s comment on: leogao’s Shortform
update: I flew Delta today and the wifi wasn’t very good. I think I misremembered it being better than it is.

Joey KL 23 Nov 2025 0:18 UTC
13 points
−2
in reply to: anaguma’s comment on: anaguma’s Shortform
I disagree entirely. I don’t think it’s janky or ad-hoc at all. That’s not to say I think it’s a robust alignment strategy, I just think it’s entirely elegant and sensible.
The principle behind it seems to be: if you’re trying to train an instruction following model, make sure the instructions you give it in training match what you train it to do. What is janky or ad hoc about that?

Joey KL 22 Nov 2025 5:38 UTC
1 point
0
in reply to: leogao’s comment on: leogao’s Shortform
Huh, I have pretty good experiences using wifi flying Delta?

Joey KL 15 Nov 2025 0:12 UTC
4 points
2
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform
I disagree with “don’t believe what you can’t explain”. I think being successful where others have failed often requires executing on intuitions that you can’t easily justify. I think this should be encouraged, as long as you’re adequately internalizing the risk of failure. (In the sense of economic internalize, not psychological internalize.)