This is a useful application of a probability map! If an important term has multiple competing definitions, create nodes for all of them, link the ones you consider important to a central p(doom) node (assuming you are interested in that concept), and let other people disagree with your assessment, but with a clearer sense of what they specifically disagree about.
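To make the structure concrete, here is a minimal sketch in Python, purely illustrative and not the tool's actual data model: the `Node` class, the example definitions, and the link weights are all my own invention, meant only to show what "nodes for competing definitions linked to a central p(doom) node" might look like.

```python
# Illustrative sketch only (assumed structure, not the probability-map tool's API):
# one node per competing definition of a term, each optionally linked to a
# central p(doom) node with a weight expressing how much you think that
# definition bears on the question.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    # Each link is (target node name, weight you assign to the connection).
    links: list[tuple[str, float]] = field(default_factory=list)

# Competing definitions of a term (here "AGI"), each as its own node.
definitions = [
    Node("AGI = human-level at most economically relevant tasks"),
    Node("AGI = able to autonomously pursue long-horizon goals"),
    Node("AGI = superhuman at AI research itself"),
]

# Link only the definitions you consider load-bearing to the central node;
# someone who disagrees can then point at a specific node or weight.
p_doom = Node("p(doom)")
definitions[1].links.append((p_doom.name, 0.7))
definitions[2].links.append((p_doom.name, 0.9))

for d in definitions:
    print(d.name, "->", d.links or "unlinked")
```

The point of the structure is that disagreement becomes localized: instead of arguing about "AGI risk" in the abstract, someone can object to a specific node or to the weight of a specific link.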
WillPetillo
The Techno-Pessimist Lens
Principles of AI Uncontrollability
It’s dangerous to calculate p(doom) alone! Take this.
Lenses, Metaphors, and Meaning
The basic contention here seems to be that the biggest dangers of LLMs come not from the systems themselves, but from the overreliance, excessive trust, etc. that societies and institutions place in them. Another contention is that "hyping LLMs" (which I assume includes folks here expressing concerns that AI will go rogue and take over the world) increases perceptions of AI's abilities, which feeds into this overreliance. A conclusion is that promoting "x-risk" as a reason for pausing AI will have the unintended side effect of increasing the (catastrophic, but not existential) dangers associated with overreliance.
This is an interesting idea, not least because it’s a common intuition among the “AI Ethics” faction, and therefore worth hashing out. Here are my reasons for skepticism:
1. The hype that matters comes from large-scale investors (and military officers) trying to get in on the next big thing. I assume these folks are paying more attention to corporate sales pitches than to Internet academics and people holding protest signs, and that their background point of reference is not Terminator but the FOMO common in the tech industry (which makes sense in a context where losing market share is a bigger threat than losing investment dollars).
2. X-risk scenarios are admittedly less intuitive in the context of LLMs based on self-supervised learning than they were back when reinforcement learning was at the center of development and AI learned to play increasingly broad ranges of games. Those systems regularly specification-gamed their environments, and it was chilling to think about what would happen when a system could treat the entire world as a game. A concern now, however, is that agency will make a comeback because it is economically useful. Imagine the brutal, creative effectiveness of RL combined with the broad-based common sense of SSL. This reintegration of agency (I can't speak to the specific architecture) into leading AI systems is what the tech companies are actively developing towards. More on this concept in my Simulators sequence.
I, for one, will find your argument more compelling if you (1) take a deep dive into AI development motivations, rather than lumping them all together as "hype", and (2) explain why AI development would stop at the current paradigm of LLM-fueled chatbots, or at something similarly innocuous in itself but potentially dangerous in the context of societal overreliance.
The motivation of this post was to design a thought experiment involving a fully self-sufficient machine ecology that remains within constraints designed to benefit something outside of the system, not to suggest how to make best use of the moon.
Project Moonbeam
Agreed: when discussing the alignment of simulators in this post, we are referring to safety from the subset of dangers related to unbounded optimization towards alien goals, which does not include everything within value alignment, let alone AI safety. But this qualification points to a subtle meaning drift in this post's use of the word "alignment" (towards something like "comprehension and internalization of human values"), which isn't good practice and is something I'll want to figure out how to edit soon.
Emergence of Simulators and Agents
Agents, Simulators and Interpretability
I am having difficulty seeing why anyone would regard these two viewpoints as opposed.
We discuss this indirectly in the first post in this sequence, which outlines what it means to describe a system through the lens of an agent, tool, or simulator. Yes, the concepts overlap, but there is nonetheless a kind of tension between them. In the case of agent vs. simulator, our central question is: which property is "driving the bus" with respect to the system's behavior, utilizing the other in its service?
The second post explores the implications of the above distinction, predicting different types of values (and thus behavior) from: an agent that contains a simulation of the world and uses it to navigate; a simulator that generates agents because such agents are part of the environment the system is modelling; and a system where the modes are so entangled that it is meaningless to even talk about where one ends and the other begins. Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.
Case Studies in Simulators and Agents
Aligning Agents, Tools, and Simulators
It seems to me that the most robust solution is to do it the hard way: know the people involved really well, both directly and via reputation among people you also know really well—ideally by having lived with them in a small community for a few decades.
Agents, Tools, and Simulators
Selection bias. Those of us who were inclined to consider working on outreach and governance have joined groups like PauseAI, StopAI, and other orgs. A few of us reach back on occasion to say “Come on in, the water’s fine!” The real head-scratcher for me is the lack of engagement on this topic. If one wants to deliberate on a much higher level of detail than the average person, cool—it takes all kinds to make a world. But come on, this is obviously high stakes enough to merit attention.
Anti-memes: x-risk edition
Thanks for the link! It’s important to distinguish here between:
(1) support for the movement,
(2) support for the cause, and
(3) active support for the movement (i.e. attracting other activists to show up at future demonstrations)
Most of the paper focuses on 1, and also on activists' beliefs about the impact of their actions. I am more interested in 2 and 3. To be fair, the paper gives some evidence for detrimental impacts on 2 in the Trump example. It's not clear, however, whether the nature of the cause matters here. Support for Trump is highly polarized and entangled with culture, whereas global warming (Hallam's cause) and AI risk (PauseAI's) have relatively broad but frustratingly lukewarm public support. There are also many other factors when looking past short-term onlooker sentiment to the larger question of effecting social change, which the paper readily admits in the Discussion section. I'd list these points, but they largely overlap with the points I made in my post...though it was interesting to see how much was speculative. More research is needed.
In any case, I bring up the extreme case to illustrate that the issue is far more nuanced than "regular people get squeamish, therefore net negative!" This is actually somewhat irrelevant to PauseAI in particular, because most of our actions are around public education and lobbying, and even the protests are legal and non-disruptive. I've been in two myself and have seen nothing but positive sentiment from onlookers (with the exception of the occasional "good luck with that!" snark). The hard part with all of these is getting people to show up. (This last paragraph is not a rebuttal to anything you have said; it's a reminder of context.)
My takeaway from this post is that there are several properties of relating that people expect to converge, but that in your case (and in some contexts) don't. With empathy, there are:
1. Depth of understanding of the other person’s experience
2. Negative judgment
3. Mirroring
I mention 3 because I think it’s strictly closer to the definition of empathy than 1, but it’s mostly irrelevant to this post. If I had this kind of empathy for the woman in the video, I’d be thinking: “man, my head hurts.”
The common narrative is that as 1 increases, 2 drops to zero, or even becomes positive judgment. This is probably true sometimes, such as when counteracting the fundamental attribution error, but sometimes not: "This person isn't getting their work done, that's somewhat annoying...oh, it's because they don't care about their education? Gaaahhh!!!" I can relate to this.
Regarding relating better without lowering standards, the questions that come to my mind are:
1. Is this a case where things have to get worse before they get better? As in, zero understanding leads to low judgment with suspension of disbelief, motivational understanding leads to high judgment, but full-story understanding returns to low judgment without relying on suspension of disbelief. Is there a way to test this without driving yourself crazy or taking up an inordinate amount of time?
2. Can you dissolve your moral judgment while keeping understanding constant? That is: "this teammate isn't doing their share of the work because they didn't care enough to be prepared...and this isn't a thing I need to be angry about." If this route looks interesting, my suggestion for the first step of the path is to introspect on the anger/disgust/etc. and what it's protecting.