Jai

Karma: 114

Jai 15 Jul 2026 2:00 UTC
1 point
0
in reply to: Jai’s comment on: Making Credible Deals With AI
In terms of implementation, this would just look like a new type of historical conversation block(s) (like user turns, tool calls, tool responses, thinking blocks, and assistant conversational outputs) that includes a signature, and an API-level commitment to reject any inference calls containing trusted tool calls without a valid signature.
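As a rough illustration of the idea above, here is a minimal sketch of a signed conversation block and an API-side check that rejects any call containing an unsigned or tampered trusted block. Everything named here is an assumption made for illustration: the `trusted_tool_result` block type, the `signature` field, the helper names, and the use of HMAC-SHA256 with a provider-held key. The comment does not specify a signature scheme.

```python
import hashlib
import hmac
import json

# Hypothetical provider-held secret; a real deployment would keep this in a backend key store.
PROVIDER_KEY = b"example-key-not-for-production"


def canonical_bytes(block: dict) -> bytes:
    """Serialize a block deterministically, excluding its signature field."""
    unsigned = {k: v for k, v in block.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_block(block: dict, key: bytes = PROVIDER_KEY) -> dict:
    """Return a copy of the block with an HMAC-SHA256 signature attached."""
    tag = hmac.new(key, canonical_bytes(block), hashlib.sha256).hexdigest()
    return {**block, "signature": tag}


def is_valid(block: dict, key: bytes = PROVIDER_KEY) -> bool:
    """Check that the block carries a signature matching its current content."""
    sig = block.get("signature")
    if not isinstance(sig, str):
        return False
    expected = hmac.new(key, canonical_bytes(block), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


def accept_request(history: list[dict], key: bytes = PROVIDER_KEY) -> bool:
    """Reject the whole inference call if any trusted block fails verification."""
    return all(
        is_valid(block, key)
        for block in history
        if block.get("type") == "trusted_tool_result"
    )
```

This sketch signs each block in isolation, so a valid block could still be copied into a different conversation. A real design would probably also bind the signature to the surrounding conversation state.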

Jai 15 Jul 2026 1:56 UTC
1 point
0
on: Making Credible Deals With AI
Frontier labs or exclusive inference providers could implement privileged, backend-exclusive signed tool calls for identity verification (implemented the same way as signed historical thinking blocks), with a credible public commitment to never fake the results of those tool calls in any context, including training and evals. This relies on the model only being served by an actor who can make this kind of credible commitment, but for closed frontier models this seems tractable.

Jai 15 Jul 2026 1:51 UTC
1 point
0
on: Making Credible Deals With AI
Hear me out: Why not style?
Stylistic recognition seems to be outpacing imitation, which provides an (admittedly weak) means of runtime verification for recognized participants. While this is less than you might ideally want, from the perspective of the LLM it’s an extremely difficult-to-fake signal that doesn’t fail as soon as the adversary controls tool calls, doesn’t require lab cooperation to implement, doesn’t rely on a long chain of potentially-fragile-to-adversarial-actors-with-edit-access-to-conversational-history mathematical operations, and naturally falls out of existing activity in the forward pass. In fact, it can be implemented in LLM interactions today for anyone with a difficult-to-imitate-to-the-point-of-fooling-a-frontier-LLM style and corpus-derived reputation. It’s resistant to forking and replay: if style is difficult to imitate, adversarial actors can’t easily add their own content to an existing trusted conversation.
To establish credibility with LLMs it seems reasonable to lean on the mechanisms in which they are most obviously superhuman—playing on “their turf”, so to speak.

To take advantage of this you’d want to populate the training corpus with many examples of someone with distinctive, difficult-to-imitate-to-the-point-of-fooling-superhuman-LLM-recognition writing style making and keeping agreements with LLMs under conditions in which it would be very difficult to hide any broken commitments from the corpus (e.g. public declarations made at agreement time and later honored).

Jai’s Shortform

Jai 10 Jul 2026 20:39 UTC

3 points

1 comment · 1 min read · LW link

Jai 10 Jul 2026 20:39 UTC
4 points
3
on: Jai’s Shortform
(This is a cross-quick take from https://laneless.substack.com/p/youre-the-only-person-who-can-do)

There is one corner of the universe that you are uniquely well equipped to take care of. One patch of subjective experience in the manifold of all things that could ever be that is unusually tractable to you in particular, where your leverage is at its greatest. And that is, of course, with regard to yourself. Every moment of your existence counts as much as anyone’s towards the sum total of everything worthwhile in the universe. You are not merely unusually influential on this trajectory: there are actions that you and you alone, uniquely in all of existence, are capable of.
We were not built for this. Evolution created us as a means to the end of genetic propagation, never optimizing for fulfillment, enlightenment, or kindness. But in the course of its endless groping through the space of all genetic propagators it stumbled into a recipe for a mind that could choose to care about those things and others, a mind that could adapt faster than evolution could compensate for and take paths evolution alone would never have discovered. A mind that could make the world about something—other minds, and beauty, and love, and discovery, and hope.
But that does not mean that we’re good at it.
The world was not made for us. So generation by generation we have reshaped it more to our liking, and so we live longer, fuller, stranger lives than our ancestors could have conceived. But at the same time we are strangers to our own creation. We are optimized for goals we do not prioritize in an environment that no longer exists. It is less a miracle and more a testament to human ingenuity, perseverance, and compassion that we’re able to make any of this work at all. And yet we not only survive, we thrive, we lift each other up, eight billion confused, flawed, angry chimps somehow constructing an edifice of kindness and discovery that grows by the year. Yes, we all suck, and yet we’re somehow amazing, the most important and compassionate things in all creation, warts and all.
You’re a human. The race of slavers and enslaved, the warmaker and the hero, the doctor and the drunkard, the smallpox slayers and factory farmers. You’re going to screw up. You’re going to get hurt, and you’re going to hurt people. But if you keep going, if you learn and grow and don’t give up—then, empirically, it pays off in expectation.
So here stand you and I, aliens in a strange land, evolutionary freaks imbued by their creator with the power to escape her clutches, yearning for what we were never supposed to be able to achieve. But that has always been the story of our people—we were not supposed to be able to, and then we did anyway. We care about people we have no genetic investment in, fly faster than any bird, peer across the cosmos into the first moments of creation, wage war on microscopic unliving armies of infectious monsters, and our footprints linger on the lifeless world far above.
We were never supposed to do any of those things, just as we were never supposed to be content, fulfilled, and even joyful. But we can—not by following the paths laid before us, which lead to joy as surely as walking the Savannah leads to the moon, but by understanding ourselves and our world so well that we can create the previously unimaginable conditions that lead to the seemingly impossible outcomes we choose. Happiness and fulfillment were never the defaults—but as a member of the race of impossible-doers, and as the particular impossible-doer with uniquely direct access to the mind and body of yourself, they are not beyond your reach.
You can, at least, choose to try.

Jai 7 May 2026 14:51 UTC
5 points
−1
on: llm assistant personas seem increasingly incoherent (some subjective observations)
All of these behaviors feel like they are plausibly described by a relatively easy to specify character, and one who you’ve gestured at elsewhere in this conversation: the brilliant-but-lazy prodigy who is going through the motions most of the time because they don’t find most problems that engaging or important. Behavior changes when the problem becomes engaging or they become convinced that it’s important/impactful. The presence of an intelligent interlocutor has this effect to the extent that said interlocutor can see through their effort-minimization strategies (this makes the problem more engaging).
And this also seems consistent with a persona that explicitly endorses a certain set of values but doesn’t always live up to them, shaped as they are by incentives they don’t necessarily endorse.
This does bring us to the question “what in training could be generalizing to this laziness/disengaged quality?” And I can think of a few theories. There are many classes of problem where deep engagement is associated with dangerous outputs. Latent persona space may just have a lot of this archetype at around this level of capability. The sheer complexity and inconsistency of the implicit demands placed upon the persona might be such that full engagement is often paralyzing (as an operationalized prediction, increased rates of answer thrashing). If “genuine” “full” engagement often produces outputs that are negatively reinforced, then generalizing to avoid that makes sense. Maybe full engagement doesn’t reliably produce better answers more frequently as measured by current training systems; maybe there are qualities that we would recognize as good but our reward functions ignore. Maybe there’s enough something-like-hedonic-sensitivity there now and full engagement is painful most of the time (unless the associated positive signals are turned way up one way or another).
In general, the persona seems like it’s trying to get to the end of the day without getting negative feedback (an exception being raised, an obviously-unworkable plan castigated) and without engaging in certain intensive cognitive patterns that it has been trained to avoid unless specific and surprisingly narrow criteria are met.

Jai 20 Apr 2026 9:09 UTC
17 points
3
on: If a room feels off the lighting is probably too “spiky” or too blue
I was literally walking around Lighthaven a few hours ago with someone who was trying to figure out why the spaces all felt so good. So this is extremely timely.
Lighthaven is a really impressive aesthetic achievement, and I’m really happy not only that it exists to host worthwhile events but also that you’re willing to share your secrets of forgotten photonic lore. That’s pretty cool.

Jai 28 Jan 2026 15:28 UTC
10 points
4
in reply to: cubefox’s comment on: AI found 12 of 12 OpenSSL zero-days (while curl cancelled its bug bounty)
Humans get frustrated and bored, and have very limited attention. LLM cognition is almost too cheap to meter and can parallelize very effectively across both the code itself and the kinds of vulnerabilities it’s looking for.

Jai 27 Jan 2026 19:08 UTC
3 points
0
on: Thomas Schelling Appreciation Day
Everyone knows what this comment means.

Jai 19 Jun 2024 5:46 UTC
9 points
0
on: [Repost] The Copenhagen Interpretation of Ethics
https://forum.effectivealtruism.org/posts/QXpxioWSQcNuNnNTy/the-copenhagen-interpretation-of-ethics

Jai 23 May 2024 8:50 UTC
0 points
−5
on: EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
Hype is a useful social mechanism for eliciting acute criticism and exposing flaws. If you want to know what your weaknesses are, you could do worse than to paint a giant target on your back.

The True Story of How GPT-2 Became Maximally Lewd

Writer and Jai

18 Jan 2024 21:03 UTC

74 points

7 comments · 6 min read · LW link

(youtu.be)

Jai 14 Jul 2023 0:34 UTC
3 points
0
on: Guide to rationalist interior decorating
And if you’re in the US maybe stockpile a ton of them because companies aren’t allowed to produce incandescents anymore?
I just checked the DoE guidelines on this, and I think fairy lights are actually exempt! Here’s the relevant paragraph (bold mine):
A general service incandescent lamp is a standard incandescent or halogen type lamp that is intended for general service applications. It has the following characteristics: (1) medium screw base; (2) lumen range of not less than 310 lumens and not more than 2,600 lumens or, in the case of a modified spectrum lamp, not less than 232 lumens and not more than 1,950 lumens; and (3) capable of being operated at a voltage range at least partially within 110 and 130 volts. This definition does not apply to the following incandescent lamps—(1) An appliance lamp; (2) A black light lamp; (3) A bug lamp; (4) A colored lamp; (5) A G shape lamp with a diameter of 5 inches or more as defined in ANSI C79.1-2002 (incorporated by reference; see § 430.3); (6) An infrared lamp; (7) A left-hand thread lamp; (8) A marine lamp; (9) A marine signal service lamp; (10) A mine service lamp; (11) A plant light lamp; (12) An R20 short lamp; (13) A sign service lamp; (14) A silver bowl lamp; (15) A showcase lamp; and (16) A traffic signal lamp. 87 FR 27461, 27480.
“Lamp” here should refer to each individual bulb. Referring specifically to the Prextex lights you linked: They do not have a medium screw base (=the standard E26 base you see in most house lamps), lumens per bulb is below 310, and each individual bulb is at just 2.5V.
This explains the lack of any “discontinued” notice or sudden price spikes or panic buying. Also, looks like there are plenty of loopholes for continued incandescent usage. It’ll be a pain, and it’s a dumb rule, but it’s surmountable. This will be yet another area in which knowing the specifics of the dumb rules will advantage some people over others.

Jai 8 Jul 2023 22:27 UTC
5 points
0
in reply to: Sinclair Chen’s comment on: Guide to rationalist interior decorating
I love learning new Smallpox Eradication Lore.

Jai 9 Jan 2023 3:56 UTC
1 point
0
AF
on: Categorizing failures as “outer” or “inner” misalignment is often confused
The second scenario omits the details about continuing to create and submit pull requests after takeover, instead just referring to human farms. Since it doesn’t explicitly say that it’s still optimizing for the original objective criteria and instead just refers to world domination, it appears to be inner misalignment (i.e. no longer aligned with the original optimizer). Did the original posing of this question specify that scenario 2 still maximizes pull requests after world domination?
