Agent foundations, AI macrostrategy, civilizational sanity, human enhancement.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
As far as I understand, at least one of the authors has an unusual moral philosophy such as not believing in consciousness or first-person experiences, while simultaneously believing that future AIs are automatically morally worthy simply by having goals.
[narrow point, as I agree with most of the comment]
For what it’s worth, this seems to imply that illusionism (roughly, the view of people who, in a meaningful sense, “don’t believe in consciousness”) makes people more inclined to act in ethically deranged ways. As far as I can tell, that mostly isn’t the case: I’ve known a few illusionists (and was one myself until ~1 year ago), and they were all decent people, no less decent than the average of my social surroundings.
To give an example, Dan Dennett was an illusionist and very much not a successionist. Similarly, I wouldn’t expect any successionist aspirations from Keith Frankish.
There are caveats, though, in that I do think that a sufficient combination of ideas which are individually fine, even plausibly true (illusionism, moral antirealism, …), and some other stuff (character traits, paycheck, social milieu) can get people into pretty weird moral positions.
So there’s steelmanning, where you construct a view that isn’t your interlocutor’s but is, according to you, more true / coherent / believable than your interlocutor’s.
[nitpick] while also being close to your interlocutor’s (perhaps so that your interlocutor’s view could be the steelmanned view with added noise / passed through Chinese whispers / degenerated).
A proposed term
Exoclarification? Alloclarification? Democlarification (dēmos—“people”)?
Perhaps another example, though not quite analytic philosophy but rather a neo-religion: Discordianism.
Specifically, see here: https://en.wikipedia.org/wiki/Principia_Discordia#Overview
Computers are getting smarter, and making entities smarter than yourself, which you don’t understand, is very unsafe.
Scott criticizes the Example ASI Scenario as the weakest part of the book; I think he’s right, it might be a reasonable scenario but it reads like sci-fi in a way that could easily turn off non-nerds. That said, I’m not sure how it could have done better.
I think the scenario in VIRTUA requires remarkably little suspension of disbelief; it’s still “sci-fi-ish”, but less “sci-fi-ish” than the one in IABIED (according to my model of the general population), and leads to ~doom anyway.
(I feel like I’m groping for a concept analogous to an orthogonal basis in linear algebra—a concept like “the minimal set of words that span an idea”—and the title “If Anyone Builds It, Everyone Dies” almost gets there)
You don’t need orthogonality to get a minimal set that spans some idea/subspace/whatever; a minimal spanning set is just a basis, and orthogonality is extra structure on top of that.
Also, it would be good to deconflate the things that these days go by “AI agents” and “Agentic™ AI”, because conflating them makes people think that the former are (close to being) examples of the latter. Perhaps we could rename the former to “AI actors” or something.
(Sidenote: Both “agent” and “actor” derive from Latin agere, meaning “to drive, lead, conduct, manage, perform, do”. Coincidentally, the word “robot” was coined from the Czech “robota”, meaning “work”, and also related to “robit”, meaning “to do” (similar words mean “to do” in many other Slavic languages).)
“SLT” as “Singular Learning Theory” → “SiLT”
“SLT” as “Statistical Learning Theory” → “StaLT”
“SLT” as “Sharp Left Turn” → “ShaLT”
It is “instrumental”, but in a different sense: it is convergence in instrumentality, similar to “moral convergence” (although if moral convergence qua convergence of morality is true, then presumably it is also moral to converge on the convergent morality (according to default interpretations of the idea, at least)).
There’s the equivalence of categories. Two categories $\mathcal{C}$ and $\mathcal{D}$ are equivalent when they are isomorphic up to isomorphism. Specifically, there must be two functors $F \colon \mathcal{C} \to \mathcal{D}$ and $G \colon \mathcal{D} \to \mathcal{C}$, such that there are natural isomorphisms (invertible natural transformations) $\eta \colon \mathrm{id}_{\mathcal{C}} \Rightarrow GF$ and $\epsilon \colon FG \Rightarrow \mathrm{id}_{\mathcal{D}}$. On objects, this means that if you start at an object $X$ in $\mathcal{C}$, then you can go to $F(X)$ in $\mathcal{D}$ and then to $G(F(X))$ in $\mathcal{C}$, which is isomorphic to $X$: $G(F(X)) \cong X$. Similarly if you start at some object $Y$ in $\mathcal{D}$: $F(G(Y)) \cong Y$.
An equivalence is an isomorphism of categories when the isomorphisms (equivalently,[1] the natural transformations between the compositions $GF$, $FG$ and the identity functors) are equalities.
An even weaker equality-like notion is adjunction, and this is where things start to get asymmetrical. There are a few equivalent[2] ways of defining them[3], but the contextually simplest way (if not completely rigorous), since I just described equivalences, is that an adjunction is an equivalence, except the natural transformations $\eta \colon \mathrm{id}_{\mathcal{C}} \Rightarrow GF$ and $\epsilon \colon FG \Rightarrow \mathrm{id}_{\mathcal{D}}$ are not (in general) isomorphisms. So, you go $X \mapsto F(X) \mapsto G(F(X))$, and you could instead have gotten there via some morphism in the category $\mathcal{C}$, namely $\eta_X \colon X \to G(F(X))$, but you may not be able to go back $G(F(X)) \to X$. On the other hand, starting at some $Y$ in $\mathcal{D}$, you can take the trip $Y \mapsto G(Y) \mapsto F(G(Y))$ and go back to where you started via some morphism within $\mathcal{D}$, namely $\epsilon_Y \colon F(G(Y)) \to Y$. Again, there may not be a morphism $Y \to F(G(Y))$.
Then we say that $F$ is left adjoint to $G$ (equivalently, $G$ is right adjoint to $F$), denoted $F \dashv G$. The natural transformations $\eta$ and $\epsilon$ are called the unit and the counit of the adjunction, respectively.
Seven Sketches in Compositionality introduces adjunctions in a slow and palatable way that is good for building an intuition, starting with Galois connections, which are just adjunctions for preorders, which are just Bool-categories.
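To make the preorder case concrete, here is a minimal Python sketch (the example maps and names are my own choice, not anything from the book) checking the defining condition of a Galois connection, $f(p) \le q \iff p \le g(q)$, for $f(x) = 3x$ and $g(y) = \lfloor y/3 \rfloor$ on the integers, together with the inequalities that the unit and counit collapse to in a preorder:

```python
# A Galois connection is an adjunction between preorders:
# monotone maps f: P -> Q and g: Q -> P with  f(p) <= q  iff  p <= g(q).
# Example (my choice, for illustration): f(x) = 3*x, g(y) = floor(y/3) on the integers.

def f(x: int) -> int:
    return 3 * x          # left adjoint

def g(y: int) -> int:
    return y // 3         # right adjoint (floor division)

points = range(-20, 21)

# The defining "hom-set" condition of the adjunction, stated order-theoretically.
assert all((f(p) <= q) == (p <= g(q)) for p in points for q in points)

# Unit and counit, specialized to preorders: they collapse to inequalities.
assert all(p <= g(f(p)) for p in points)   # unit:   p <= g(f(p)), i.e. id => GF
assert all(f(g(q)) <= q for q in points)   # counit: f(g(q)) <= q, i.e. FG => id

print("Galois connection (3*x  -|  floor(y/3)) verified on the sample range.")
```

The last two asserts show exactly the asymmetry described above: in general you only get $p \le g(f(p))$ and $f(g(q)) \le q$, not the reverse inequalities.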
Let this thread be the canonical reference for posterity that this idea appeared in my mind at ODYSSEY, at the session “Announcing Universal Algorithmic Intelligence Reading Group” held by you and Aram in Bayes Attic, on Wednesday, 2025-08-27, around 07:20 PM, when, for whatever reason, the two S Learning Theories entered the conversation while you were putting some words on the whiteboard, and somebody voiced a minor complaint that SLT stands for both of them.
I usually don’t, though maybe unconsciously? Plausibly it would be good for me to try to track it explicitly.
cf https://www.lesswrong.com/posts/bhLxWTkRc8GXunFcB/what-are-you-tracking-in-your-head
The acronym SLT (in this community) is typically taken/used to refer to Singular Learning Theory, but sometimes also to (~old-school-ish) Statistical Learning Theory and/or to Sharp Left Turn.
I therefore propose that, to disambiguate between them and to clean up the namespace, we use SiLT, StaLT, and ShaLT, respectively.
It is easy to see that factorizations and partitions are duals if we model them in the category FinSet.
A partition of a set $A$ is “just” an epimorphism (i.e., a surjection in FinSet) $q \colon A \twoheadrightarrow B$. That’s because an epimorphism induces a partition of $A$ indexed by the elements $b \in B$, namely into the preimages/fibers $q^{-1}(b)$. (Surjectivity/epicness is necessary because without it we would have some $b$’s beyond the image of $q$, so that their preimages would be empty: $q^{-1}(b) = \emptyset$.) In the other direction, any partition induces a unique surjection mapping each element of $A$ to the part it belongs to. It’s easy to see that these two views are equivalent (i.e., moving partition→epimorphism→partition gets us back to the same partition, and similarly epimorphism→partition→epimorphism).
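(As a sanity check of the two round trips just described, here is a minimal Python sketch on a toy example; the helper names and the particular surjection are mine.)

```python
# Partition <-> surjection correspondence in FinSet, on a small example.

def partition_from_map(A, q):
    """The fibers q^{-1}(b), indexed by the image of q. For a surjection onto B,
    the index set is exactly B and no fiber is empty."""
    parts = {}
    for a in A:
        parts.setdefault(q(a), set()).add(a)
    return parts

def map_from_partition(parts):
    """Send each element to (the label of) the part it belongs to."""
    return {a: label for label, part in parts.items() for a in part}

A = {0, 1, 2, 3, 4, 5}
q = lambda a: a % 3                      # a surjection A ->> {0, 1, 2}

parts = partition_from_map(A, q)         # {0: {0, 3}, 1: {1, 4}, 2: {2, 5}}
q_again = map_from_partition(parts)

# Round trip epi -> partition -> epi gives back the same function on A ...
assert all(q_again[a] == q(a) for a in A)
# ... and partition -> epi -> partition gives back the same partition.
assert partition_from_map(A, q_again.__getitem__) == parts
```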
So, a factorization of a set $A$ is an isomorphism constructed as a product of surjections indexed by a (finite) set $J$: a family of epis $q_j \colon A \twoheadrightarrow B_j$, $j \in J$. Explicitly: the induced map $\langle q_j \rangle_{j \in J} \colon A \to \prod_{j \in J} B_j$, $a \mapsto (q_j(a))_{j \in J}$, is an isomorphism (a bijection). Moreover, we require each partition/epi to be non-trivial, i.e., each $B_j$ must have at least two elements, so none of the $B_j$’s is a singleton, i.e., the terminal object.
If we dualize this construction, we get an isomorphism from the coproduct (i.e., disjoint sum in FinSet) to the set $A$, constructed from monomorphisms (i.e., injections in FinSet) $m_j \colon C_j \rightarrowtail A$, $j \in J$: the induced map $[m_j]_{j \in J} \colon \coprod_{j \in J} C_j \to A$ is an isomorphism. Moreover, since in the factorization case we assumed that none of the $B_j$’s is terminal (a singleton), here, after dualization, none of the $C_j$’s is initial, i.e., none of them is the empty set. The isomorphism means that the two sets are equinumerous: $|A| = \sum_{j \in J} |C_j|$, so the set of “co-basis” elements $\{C_j\}_{j \in J}$ is (isomorphic to) a partition of $A$, since, as we just remarked, each $C_j$ is non-empty. In other words, the “co-basis” elements “are” parts of a partition. Each $m_j$ being an injection means that $|C_j| \le |A|$, but that’s already implied by equinumerosity (actually, whenever there is more than one part, strict inequality is implied, because the other parts are non-empty). The natural interpretation of the monic $m_j$ is the subset inclusion of the elements of the part $C_j$.
To go from the partition-as-epi view $q \colon A \twoheadrightarrow B$, we “convert” it into the isomorphism between $A$ and the disjoint union of the parts of the partition, $A \cong \coprod_{b \in B} q^{-1}(b)$, which can be viewed as a coproduct of subset inclusions (i.e., monics/injections), and then dualize to get the factorization (product-of-surjections) picture.
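Here is a small Python sketch of both directions on a toy example (the particular sets and surjections are my own choice): the pair of surjections $a \mapsto a \bmod 2$ and $a \mapsto \lfloor a/2 \rfloor$ factors $A = \{0,\dots,5\}$ as $\{0,1\} \times \{0,1,2\}$, and, dually, the disjoint union of the parts of a partition of $A$ maps isomorphically onto $A$ via the coproduct of the subset inclusions.

```python
from itertools import product

A = list(range(6))                        # A = {0, ..., 5}
q1 = lambda a: a % 2                      # surjection onto B1 = {0, 1}
q2 = lambda a: a // 2                     # surjection onto B2 = {0, 1, 2}
B1 = {q1(a) for a in A}
B2 = {q2(a) for a in A}

# Factorization: the induced map A -> B1 x B2, a |-> (q1(a), q2(a)), is a bijection.
pairs = [(q1(a), q2(a)) for a in A]
assert len(set(pairs)) == len(A)           # injective
assert set(pairs) == set(product(B1, B2))  # surjective onto the product

# Dual picture: a partition of A, presented as a coproduct (disjoint union) of its
# parts, maps isomorphically onto A via the coproduct of the subset inclusions.
parts = {b: {a for a in A if q1(a) == b} for b in B1}    # partition by parity
disjoint_union = [(label, a) for label, part in parts.items() for a in part]
assert len(disjoint_union) == len(A)                     # |A| = sum over parts of |part|
assert {a for _, a in disjoint_union} == set(A)          # inclusions jointly cover A
```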
[Previously in categorical view of FFS: drocta and Gurkenglas. Most likely somebody has figured this out already, but I haven’t seen it written up anywhere, so I’m posting this comment.]
On IABIED
First things first, I wholeheartedly endorse the main actionable conclusion: Ban unrestrained progress on AI that can kill us all.
I broadly think Eliezer and Nate did a good job communicating what’s so difficult about the task of building a thing that is more intelligent than all of humanity combined and shaped appropriately so as to help us, rather than have a volition of its own that runs contrary to ours.[1]
The main (/most salient) disagreement I can see at the moment is the authors’ expectations of value-strangeness and maximizeriness of superintelligence; or rather, I am much more uncertain about this. However, this detail is not relevant to the desirability of the post-ASI future conditional on business-close-to-as-usual, and therefore not relevant to whether the ban is good.
(Also, not sure about their choice of some stories/parables, but that’s a minor issue as well.)
I liked the comparison with the Allies winning against the Axis in WWII, which, at least in resource/monetary terms, must have cost much more than it would cost to implement the ban. The things we’re missing at the moment are awareness of the issue, pulling ourselves together, and collective steam.
Whatever that means, cf the problems of CEV and idealized values.
I guess. But I would think the bigger issue is that people don’t notice.
I think “Elaborate” would be more useful (i.e., more likely to actually induce an elaboration on the point being reacted to) if its corresponding notification were grouped not with karma updates and other reacts (which people get notified about in a batch, daily, weekly, etc.), but rather with the normal notifs like “so-and-so replied to your comment/post” or “you received a message from so-and-so”. (Probably the same for the new “Let’s make a bet!” react.)
But this would most likely require doing something somewhat inelegant and complicated to the codebase, so it may not be worth it, atm at least.
It seems not-very-unlikely to me that, over the next few years, many major (and some non-major) world religions will develop a “Butlerian” attitude to machine intelligence, deeming it a profanity to attempt to replicate (or even to do things that have a non-negligible chance of replicating) all the so-far-unique capacities/properties of the human mind, and will use this attitude to justify their support of a ban, along with the catastrophic/existential risks on which they (or some fraction of them) would agree with worried seculars.
In a sense, both human-bio-engineering and AI are (admissible to be seen by conservatively religious folks as) about “manipulating the God-given essence of humanity”, which amounts to admitting that God’s creation is flawed/imperfect/in need of further improvement.
The simplest general way is to buy it in whatever format and then download it from one of the well-known websites with free pdfs/mobis/epubs.
Analytic metaphysics, as far as I can tell, mostly tacitly rejects ontological cluelessness.
To give some ~examples from the analytic tradition: As far as I understand them, Brian Cantwell Smith and Nancy Cartwright espouse(d[1]) a view somewhat adjacent to ontological cluelessness, albeit perhaps slightly stronger, in that, according to (my model of?) them, there is no final/fundamental basis of reality and it’s not infinite regress either.
Somewhat more specifically, reading BCS’s On the Origin of Objects (haven’t read Cartwright yet) gave me the picture of a gunky-unknowable reality, where for a “part” of reality to even become a type of thing that can be known, it needs to be stabilized into a knowable object or something like that, and that process of stabilization involves parts/regions of the universe acting at a distance in a way that involves a primitive form of “aboutness” (?).
(There is some superficial semi-inconsistency in this way of talking about it, in that it describes [what it takes a not-yet-a-Thing to stabilize into a (knowable) “Thing”] in terms of knowable Things, so the growing Thing should also be knowable by transitivity or something (?). But I don’t think I’m passing BCS’s ITT.)
Another adjacent analytic philosopher, Eric Schwitzgebel? https://faculty.ucr.edu/~eschwitz/SchwitzAbs/Weirdness.htm
Oh, and how could I forget The Guy Against Reality? https://en.wikipedia.org/wiki/Donald_D._Hoffman
I just saw that BCS died 18 days ago :(.
Good post!
Why did you call it “exhaustive free association”? I would lean towards something more like “arguing from (falsely complete) exhaustion”.
Re it being almost good reasoning, a main thing making it good reasoning rather than bad reasoning is having a good model of the domain so that you actually have good reasons to think that your hypothesis space is exhaustive.