AlexMennen

Karma: 4,604

AlexMennen 28 Apr 2026 4:03 UTC
11 points
6
in reply to: Vivek S’s comment on: Update on the Alex Bores campaign
if CG’s c4 arm donated $1M to Bores
This is not legal. They could donate to a super PAC that supports candidates that support AI regulation, though (e.g. Public First, or they could start their own super PAC).

AlexMennen 20 Jan 2026 5:16 UTC
3 points
1
on: What Washington Says About AGI
the graph kind of looks like a U if you squint at it
I think this visual effect could plausibly be explained by polarization, without there being any real correlation between extremeness and concern about AI x-risk. Most politicians aren’t moderate, and most politicians aren’t concerned about AI x-risk. So the distribution of ideology scores of politicians at the bottom (not concerned about AI x-risk) is bimodal, and the distribution of ideology scores of politicians near the top (very concerned about AI x-risk) is bimodal, but the whole distribution is thicker at the bottom than near the top. The density of non-x-risk-concerned moderates could be high enough to be close to saturating our ability to perceive density of dots in this graphic, so that the actually much denser regions leftward and rightward don’t readily appear much denser. But higher up, the dots aren’t dense enough to saturate our ability to perceive their density, so it is visually obvious that there are more at the extremes than in the middle.
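To make the saturation argument concrete, here is a rough simulation sketch. Everything in it is an illustrative assumption rather than a fit to the actual graph: the mixture-of-Gaussians ideology scores, the 3% concern rate, the bin edges, and the `perceived` function, which models perceived density as the fraction of a region covered by ink when dots land in it at random.

```python
import math
import random


def perceived(count, scale):
    """Assumed perception model: ink coverage saturates as dots pile up."""
    return 1.0 - math.exp(-count / scale)


def simulate(n=50000, p_concerned=0.03, scale=1250.0, seed=0):
    rng = random.Random(seed)
    # Three equal-width ideology bins: two around the modes, one in the middle.
    bins = {"left": (-0.7, -0.3), "middle": (-0.2, 0.2), "right": (0.3, 0.7)}
    counts = {c: {b: 0 for b in bins} for c in (False, True)}
    for _ in range(n):
        # Bimodal ideology: most simulated politicians are not moderate.
        ideology = rng.gauss(rng.choice([-0.5, 0.5]), 0.25)
        # Concern is drawn independently of ideology (no real correlation).
        concerned = rng.random() < p_concerned
        for name, (lo, hi) in bins.items():
            if lo <= ideology < hi:
                counts[concerned][name] += 1
    result = {}
    for concerned, row in counts.items():
        extreme = (row["left"] + row["right"]) / 2
        result[concerned] = {
            "true_ratio": extreme / row["middle"],
            "perceived_ratio": perceived(extreme, scale)
            / perceived(row["middle"], scale),
        }
    return result


if __name__ == "__main__":
    for concerned, stats in simulate().items():
        print("concerned" if concerned else "not concerned", stats)
```

Under these assumptions, the true extreme-to-middle ratio is about the same in both rows, but the perceived ratio is close to 1 in the crowded bottom row and well above 1 in the sparse top row, which is the U shape.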

AlexMennen 11 Jan 2026 19:43 UTC
LW: 6 AF: 2
4
AF
on: How AI Is Learning to Think in Secret
It does not seem obviously hopeless to monitor Thinkish or even Neuralese. If a model uses Thinkish in its chain of thought, then that dialect of Thinkish means something to it. Perhaps the model can be prompted to translate Thinkish appearing in another instance’s chain of thought. Or perhaps a model could be fine-tuned on understanding how a given model’s Thinkish chain of thought affects its output and (though I’m not sure how to train for this last step) explaining how it does so in a way that humans can follow. These are things that could also be tried for apparently natural-language chains of thought that have hidden meaning that the model uses but that isn’t immediately apparent to humans. And since Neuralese differs from Thinkish only in that it doesn’t re-use natural language’s token space, perhaps similar techniques could be used to translate a model’s Neuralese.

AlexMennen 2 Jan 2026 4:11 UTC
4 points
0
on: Help keep AI under human control: Palisade Research 2026 fundraiser
Does this fundraiser have a deadline?
I see that the info hovertext over the amount raised on the every.org page says that some of it was raised offline, and only lists matching funds for the remainder that wasn’t raised offline. Does this mean that funds raised offline don’t get matched, that their matches from SFF were included in the “raised offline” figure, or that their matches from SFF aren’t counted in the total amount raised displayed on that page?

AlexMennen 19 Jun 2025 5:53 UTC
LW: 4 AF: 2
0
AF
on: the void
This post claims that Anthropic is embarrassingly far behind twitter AI psychologists at skills that are possibly critical to Anthropic’s mission. This suggests to me that Anthropic should be trying to recruit from the twitter AI psychologist circle.

AlexMennen 19 Jun 2025 5:50 UTC
6 points
0
in reply to: eggsyntax’s comment on: the void
In particular, your argument that putting material into the world about LLMs potentially becoming misaligned may cause problems—I agree that that’s true, but what’s the alternative? Never talking about risks from AI? That seems like it plausibly turns out worse.
I think this depends somewhat on the threat model. How scared are you of the character instantiated by the model vs the language model itself? If you’re primarily scared that the character would misbehave, and not worried about the language model misbehaving except insofar as it reifies a malign character, then maybe making the training data not give the model any reason to expect such a character to be malign would reduce the risk of this to negligible, and that sure would be easier if no one had ever thought of the idea that powerful AI could be dangerous. But if you’re also worried about the language model itself misbehaving, independently of whether it predicts that its assigned character would misbehave (for instance, the classic example of turning the world into computronium that it can use to better predict the behavior of the character), then this doesn’t seem feasible to solve without talking about it, so the decrease in risk of model misbehavior from publicly discussing AI risk is probably worth the increase in risk of the character misbehaving (which is probably easier to solve anyway) that it would cause.
I don’t understand outer vs inner alignment especially well, but I think this at least roughly tracks that distinction. If a model does a great job of instantiating a character like we told it to, and that character kills us, then the goal we gave it was catastrophic, and we failed at outer alignment. If the model, in the process of being trained on how to instantiate the character, also kills us for reasons other than that it predicts the character would do so, then the process we set up for achieving the given goal also ended up optimizing for something else undesirable, and we failed at inner alignment.

AlexMennen 15 Jun 2025 18:00 UTC
2 points
0
in reply to: Karl Krueger’s comment on: Against asking if AIs are conscious
It is useful for evolved mental machinery for enabling cooperation and conflict resolution to have features like what you describe, yes. I don’t agree that this points towards there being an underlying reality.

AlexMennen 15 Jun 2025 17:48 UTC
4 points
0
in reply to: cubefox’s comment on: Against asking if AIs are conscious
You can believe that what you do or did was unethical, which doesn’t need to have anything to do with conflict resolution.
It does relate to conflict resolution. Being motivated by ethics is useful for avoiding conflict, so it’s useful for people to be able to evaluate the ethics of their own hypothetical actions. But there are lots of considerations for people to take into account when choosing actions, so this does not mean that someone will never take actions that they concluded had the drawback of being unethical. Being able to reason about the ethics of actions you’ve already taken is additionally useful insofar as it correlates with how others are likely to see it, which can inform whether it is a good idea to hide information about your actions, be ready to try to make amends, defend yourself from retribution, etc.
Beliefs are not perceptions.
If there is some objective moral truth that common moral intuitions are heavily correlated with, there must be some mechanism by which they ended up correlated. Your reply to Karl makes it sound like you deny that anyone ever perceives anything other than perception itself, which isn’t how anyone else uses the word perceive.
It doesn’t mean that we are necessarily or fully motivated to be ethical.
Yes, but if no one was at all motivated by ethics, then ethical reasoning would not be useful for people to engage in, and no one would. The fact that ethics is a powerful force in society is central to why people bother studying it. This does not imply that everyone is motivated by ethics, or that anyone is fully motivated by ethics.

AlexMennen 15 Jun 2025 16:40 UTC
2 points
0
in reply to: cubefox’s comment on: Against asking if AIs are conscious
Regardless of whether the view Eliezer espouses here really counts as moral realism, as people have been arguing about, it does seem that it would claim that there is a fact of the matter about whether a given AI is a moral patient. So I appreciate your point regarding the implications for the LW Overton window. But for what it’s worth, I don’t think Eliezer succeeds at this, in the sense that I don’t think he makes a good case for it to be useful to talk about ethical questions that we don’t have firm views on as if they were factual questions, because:
1. Not everyone is familiar with the way Eliezer proposes to ground moral language, not everyone who is familiar with it will be aware that it is what any given person means when they use moral language, and some people who are aware that a given person uses moral language the way Eliezer proposes will object to them doing so. Thus using moral language in the way Eliezer proposes, whenever it’s doing any meaningful work, invites getting sidetracked on unproductive semantic discussions. (This is a pretty general-purpose objection to normative moral theories.)
2. Eliezer’s characterization of the meaning of moral language relies on some assumptions about it being possible in theory for a human to eventually acquire all the relevant facts about any given moral question and form a coherent stance on it, and the stance that they eventually arrive at being robust to variations in the process by which they arrived at it. I think these assumptions are highly questionable, and shouldn’t be allowed to escape questioning by remaining implicit.
3. It offers no meaningful action guidance beyond “just think about it more”, which is reasonable, but a moral non-realist who aspires to acquire moral intuitions on a given topic would also think of that.
One could object to this line of criticism on the grounds that we should talk about what’s true independently of how it is useful to use words. But any attempt to appeal to objective truth about moral language runs into the fact that words mean what people use them to mean, and you can’t force people to use words the way you’d like them to. It looks like Eliezer kind of tries to address this by observing that extrapolated volition shares some features in common with the way people use moral language, which is true, and seems to conclude that it is the way people use moral language even if they don’t know it, which does not follow.

AlexMennen 11 Jun 2025 15:30 UTC
3 points
0
in reply to: silentbob’s comment on: Against asking if AIs are conscious
I agree that LessWrong comments are unlikely to resolve disagreements about moral realism. Much has been written on this topic, and I doubt I have anything new to say about it, which is why I didn’t think it would be useful to try to defend moral anti-realism in the post. I brought it up anyway because the argument in that paragraph crucially relies on moral anti-realism, I suspect many readers reject moral realism without having thought through the implications of that for AI moral patienthood, and I don’t in fact have much uncertainty about moral realism.
Regarding LessWrong consensus on this topic, I looked through a couple of LessWrong surveys and didn’t find any questions about this. This doesn’t prove much, but just out of curiosity, I asked Claude 4 Sonnet to predict the results of such a question, and here’s what it said (which seems like a reasonable guess to me):
**Accept moral realism**: ~8%
**Lean towards moral realism**: ~12%
**Not sure**: ~15%
**Lean against moral realism**: ~25%
**Reject moral realism**: ~40%

AlexMennen 9 Jun 2025 17:56 UTC
6 points
0
in reply to: mishka’s comment on: Against asking if AIs are conscious
If our experience of qualia reflects some poorly understood phenomenon in physics, it could be part of a cluster of related phenomena, not all of which manifest in human cognition. We don’t have as precise an understanding of qualia as we do of electrons; we just try to gesture at it, and we mostly figure out what each other is talking about. If some related phenomenon manifests in computers when they run large language models, which has some things in common with what we know as qualia but also some stark differences from any such phenomenon manifesting in human brains, the things we have said about what we mean when we say “qualia” might not be sufficient to determine whether said phenomenon counts as qualia or not.

AlexMennen 9 Jun 2025 17:34 UTC
8 points
2
in reply to: silentbob’s comment on: Against asking if AIs are conscious
It undercuts the motivation for believing in moral realism, leaving us with no evidence for objective moral facts, which would be complicated things, and thus unlikely to exist without evidence.

AlexMennen 9 Jun 2025 16:29 UTC
2 points
0
in reply to: TAG’s comment on: Against asking if AIs are conscious
I tried to address this sort of response in the original post. All of these more precise consciousness-related concepts share the commonality that they were developed using our perception of our own cognition and seeing evidence that related phenomena occur in other humans. So they are all brittle in the same way when trying to extrapolate and apply them to alien minds. I don’t think that qualia is on significantly firmer epistemic ground than consciousness is.

AlexMennen 9 Jun 2025 15:40 UTC
5 points
0
in reply to: mishka’s comment on: Against asking if AIs are conscious
This is correct, but I don’t think what I was trying to express relies on Camp 1 assumptions, even though I expressed it with a Camp 1 framing. If cognition is associated with some nonphysical phenomenon, then our consciousness-related concepts are still tailored to how this phenomenon manifests specifically in humans. There could be some related metaphysical phenomenon going on in large language models, and no objective fact as to whether “consciousness” is an appropriate word to describe it.

AlexMennen 9 Jun 2025 15:33 UTC
12 points
0
in reply to: silentbob’s comment on: Against asking if AIs are conscious
Human moral judgement seems easily explained as an evolutionary adaptation for cooperation and conflict resolution, and very poorly explained by perception of objective facts. If such facts did exist, their existence wouldn’t give humans any reason to perceive or be motivated by them.

AlexMennen 18 May 2025 20:21 UTC
5 points
3
in reply to: Matthew Barnett’s comment on: AI Doomerism in 1879
Contemporary AI existential risk concerns originated prior to it being obvious that a dangerous AI would likely involve deep learning, so no one could claim that the arguments that existed in ~2010 involved technical details of deep learning, and you didn’t need to find anything written in the 19th century to establish this.

AlexMennen 18 May 2024 1:45 UTC
18 points
14
in reply to: habryka’s comment on: simeon_c’s Shortform
I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.
I wonder if it might be more effective to fund legal action against OpenAI than to compensate individual ex-employees for refusing to sign an NDA. Trying to take vested equity away from ex-employees who refuse to sign an NDA sounds likely to not hold up in court, and if we can establish a legal precedent that OpenAI cannot do this, that might make other ex-employees much more comfortable speaking out against OpenAI than the possibility that third parties might fundraise to partially compensate them for lost equity would be (a possibility you might not even be able to make every ex-employee aware of). The fact that this would avoid financially rewarding OpenAI for bad behavior is also a plus. Of course, legal action is expensive, but so is the value of the equity that former OpenAI employees have on the line.

AlexMennen 29 Apr 2024 4:16 UTC
LW: 2 AF: 1
0
AF
in reply to: jessicata’s comment on: Dequantifying first-order theories
Yeah, sorry that was unclear; there’s no need for any form of hypercomputation to get an enumeration of the axioms of U. But you need a halting oracle to distinguish between the axioms and non-axioms. If you don’t care about distinguishing axioms from non-axioms, but you do want to get an assignment of truth values to the atomic formulas Q(i,j) that’s consistent with the axioms of U, then that is applying a consistent guessing oracle to U.

AlexMennen 25 Apr 2024 4:29 UTC
LW: 6 AF: 1
0
AF
in reply to: jessicata’s comment on: Dequantifying first-order theories
I see that when I commented yesterday, I was confused about how you had defined U. You’re right that you don’t need a consistent guessing oracle to get from U to a completion of U, since the axioms are all atomic propositions, and you can just set the remaining atomic propositions however you want. However, this introduces the problem that getting the axioms of U requires a halting oracle, not just a consistent guessing oracle, since to tell whether something is an axiom, you need to know whether there actually is a proof of a given thing in T.

AlexMennen 24 Apr 2024 5:57 UTC
LW: 7 AF: 1
0
AF
on: Dequantifying first-order theories
I think what you proved essentially boils down to the fact that a consistent guessing oracle can be used to compute a completion of any consistent recursively axiomatizable theory. (In fact, it turns out that a consistent guessing oracle can be used to compute a model (in the sense of functions and relations on a set) of any consistent recursively axiomatizable theory; this follows from what you showed and the fact that an oracle for a complete theory can be used to compute a model of that theory.)
I disagree with
Philosophically, what I take from this is that, even if statements in a first-order theory such as Peano arithmetic appear to refer to high levels of the Arithmetic hierarchy, as far as proof theory is concerned, they may as well be referring to a fixed low level of hypercomputation, namely a consistent guessing oracle.
The translation from T to U is computable. The consistent guessing oracle only came in to find a completion of U, but it could also find a completion of T (in fact, a completion of U can be computably translated to a completion of T), so the consistent guessing oracle doesn’t really have anything to do with the relationship between T and U.
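As a toy illustration of the claim in the first paragraph (a consistent guessing oracle suffices to compute a completion), here is a sketch over a finite propositional theory. This is a deliberately simplified stand-in, not the construction from the post: in the toy setting everything is decidable, so `proof_search` uses brute-force truth tables and returns `None` where a real proof search would never halt, and `consistent_guess` fakes the oracle by answering with a fixed `arbitrary_bit` in that case. All names here are hypothetical.

```python
from itertools import product


def models(axioms, n_atoms):
    """All truth assignments over n_atoms atoms that satisfy every axiom."""
    return [v for v in product([False, True], repeat=n_atoms)
            if all(ax(v) for ax in axioms)]


def proof_search(axioms, sentence, n_atoms):
    """Toy stand-in for a machine searching for a proof or a refutation.

    Returns 1 if the axioms entail the sentence, 0 if they entail its
    negation, and None (standing in for "never halts") otherwise.
    Assumes the axioms are consistent.
    """
    ms = models(axioms, n_atoms)
    if all(sentence(v) for v in ms):
        return 1
    if not any(sentence(v) for v in ms):
        return 0
    return None


def consistent_guess(machine_output, arbitrary_bit=1):
    """Must agree with the machine if it halts; otherwise any bit is allowed."""
    return machine_output if machine_output is not None else arbitrary_bit


def complete(axioms, n_atoms, arbitrary_bit=1):
    """Decide each atom in turn, adding every decision as a new axiom."""
    axioms = list(axioms)
    valuation = []
    for i in range(n_atoms):
        atom = lambda v, i=i: v[i]
        bit = consistent_guess(proof_search(axioms, atom, n_atoms), arbitrary_bit)
        valuation.append(bool(bit))
        # Later queries are made relative to the decisions so far.
        axioms.append(lambda v, i=i, b=bool(bit): v[i] == b)
    return tuple(valuation)


if __name__ == "__main__":
    # Example theory: (p0 or p1) and not (p0 and p2).
    theory = [lambda v: v[0] or v[1], lambda v: not (v[0] and v[2])]
    print(complete(theory, 3))
    print(complete(theory, 3, arbitrary_bit=0))
```

Each step keeps the growing theory consistent: if the current axioms settle an atom, the oracle has to report that value, and if they don’t, either value can be added safely. The real case differs in that the undecided branch genuinely never halts, which is where the oracle does its work.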