While I’m probably much more of a lib than you guys (at least in ordinary human contexts), I also think that people in AI alignment circles mostly have really silly conceptions of human valuing and the historical development of values.[1] I touch on this a bit here. Also, if you haven’t encountered it already, you might be interested in Hegel’s work on this stuff — in particular, The Phenomenology of Spirit.
[1] This isn’t to say that people in other circles have better conceptions…
It’s how science works: You focus on simple hypotheses and discard/reweight them according to Bayesian reasoning.
There are some ways in which Solomonoff induction and science are analogous[1], but there are also many important ways in which they are disanalogous. Here are some of those:
A scientific theory is much less like a program that prints (or predicts) an observation sequence than it is like a theory in the sense used in logic. Like, a scientific theory provides a system of talking which involves some sorts of things (eg massive objects) about which some questions can be asked (eg each object has a position and a mass, and between any pair of objects there is a gravitational force) with some relations between the answers to these questions (eg we have an axiom specifying how the gravitational force depends on the positions and masses, and an axiom specifying how the second derivative of the position relates to the force; these two axioms are written out as equations below, after the footnotes).[2]
Science is less in the business of predicting arbitrary observation sequences, and much more in the business of letting one [figure out]/understand/exploit very particular things — like, the physics someone knows is going to be of limited help when they try to predict the time sequence of intensities of a pixel on their laptop screen, but it is going to help them a lot when solving the kinds of problems that would show up in a physics textbook.
Even for solving problems that a theory is supposed to help one solve (and for the predictions it is supposed to help one make), a scientific theory is highly incomplete — in addition to the letter of the theory, a human solving the problems in a classical mechanics textbook will be majorly relying on tacit understanding gained from learning classical mechanics and on their common-sense understanding.
Making scientific progress looks less like picking out a correct hypothesis from some set of pre-well-specified hypotheses by updating on data, and much more like coming up with a decent way to think about something where there previously wasn’t one. E.g. it could look like Faraday staring at iron filings near a magnet and starting to talk about the lines he was seeing, or Lorentz, Poincaré, and Einstein making sense of the result of the Michelson-Morley experiment. Imo the Bayesian conception basically completely fails to model gaining scientific understanding.
Scientific theories are often created to do something — I mean: to do something other than predicting some existing data — e.g., to make something; see, e.g., https://en.wikipedia.org/wiki/History_of_thermodynamics.
Scientific progress also importantly involves inventing new things/phenomena to study. E.g., it would have been difficult to find things that Kirchhoff’s laws could help us with before we invented electric circuits; ditto for lens optics and lenses.
Idk, there is just very much to be said about the structure of science and scientific progress that doesn’t show up in the Solomonoff picture (or maaaybe at best in some cases shows up inexplicitly inside the inductor). I’ll mention a few more things off the top of my head:
having multiple ways to think about something
creating new experimental devices/setups
methodological progress (e.g. inventing instrumental variable methods in econometrics)
mathematical progress (e.g. coming up with the notion of a derivative)
having a sense of which things are useful/interesting to understand
generally, a human scientific community doing science has a bunch of interesting structure; in particular, the human minds participating in it have a bunch of interesting structure; one in fact needs a bunch of interesting structure to do science well; in fact, more structure of various kinds is gained when making scientific progress; basically none of this is anywhere to be seen in Solomonoff induction
[1] for example, that usually, a scientific theory could be used for making at least some fairly concrete predictions
[2] To be clear: I don’t intend this as a full description of the character of a scientific theory — e.g., I haven’t discussed how it gets related to something practical/concrete like action (or maybe (specifically) prediction). A scientific theory and a theory-in-the-sense-used-in-logic are ultimately also disanalogous in various ways — I’m only claiming it’s a better analogy than that between a scientific theory and a predictive model.
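(To make the gravity example from the second bullet above concrete — this is just standard Newtonian gravity written out in my own notation, not something quoted from the original comment: the “things” are objects $i$ with masses $m_i$ and positions $x_i(t)$, and the two axioms relating the answers are
\[
F_{ij} \;=\; G\,\frac{m_i m_j}{\lVert x_j - x_i \rVert^{3}}\,(x_j - x_i)
\qquad\text{and}\qquad
m_i\,\ddot{x}_i \;=\; \sum_{j \neq i} F_{ij},
\]
where $F_{ij}$ is the gravitational force on object $i$ due to object $j$.)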
However, the reference class that includes the theory of computation is only one possible reference class that might include the theory of agents.[1] But for all we know (I think), the reference class we are in might also be (or look more like) complex systems studies, where you can prove a bunch of neat things, but there’s also a lot of behavior that is not computationally reducible and that you instead need to observe, simulate, crunch the numbers on. Moreover, noticing surprising real-world phenomena can serve as a guide to your attempts to explain the observed phenomena in ~mathematical terms (e.g., how West et al. explained (or re-derived) Kleiber’s law from the properties of intra-organismal resource supply networks[2]). I don’t know what the theory will look like; to me, its shape remains an open a posteriori question.
along an axis somewhat different than the main focus here, i think the right picture is: there is a rich field of thinking-studies. it’s like philosophy, math, or engineering. it includes eg Chomsky’s work on syntax, Turing’s work on computation, Gödel’s work on logic, Wittgenstein’s work on language, Darwin’s work on evolution, Hegel’s work on development, Pascal’s work on probability, and very many more past things and very many more still mostly hard-to-imagine future things. given this, i think asking about the character of a “theory of agents” would already soft-assume a wrong answer. i discuss this here
i guess a vibe i’m trying to communicate is: we already have thinking-studies in front of us, and so we can look at it and get a sense of what it’s like. of course, thinking-studies will develop in the future, but its development isn’t going to look like some sort of mysterious new final theory/science being created (though there will be methodological development (like for example the development of set-theoretic foundations in mathematics, or like the adoption of statistics in medical science), and many new crazy branches will be developed (of various characters), and we will surely resolve various particular questions in various ways (though various other questions call for infinite investigations))
Hmm, thanks for telling me, I hadn’t considered that. I think I didn’t notice this in part because I’ve been thinking of the red-black circle as being “canceled out”/”negated” on the flag, as opposed to being “asserted”. But this certainly wouldn’t be obvious to someone just seeing the flag.
I designed a pro-human(ity)/anti-(non-human-)AI flag:
The red-black circle is HAL’s eye; it represents the non-human in-all-ways-super-human AI(s) that the world’s various AI capability developers are trying to create, that will imo by default render all remotely human beings completely insignificant and cause humanity to completely lose control over what happens :(.
The white star covering HAL’s eye has rays at the angles of the limbs of Leonardo’s Vitruvian Man; it represents humans/humanity remaining more capable than non-human AI (by banning AGI development and by carefully self-improving).
The blue background represents our potential self-made ever-better future, involving global governance/cooperation/unity in the face of AI.
Feel free to suggest improvements to the flag. Here’s LaTeX to generate it:
% written mostly by o3 and o4-mini-high, given k’s prompting
% an anti-AI flag. a HAL “eye” (?) is covered by a vitruvian man star
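% compiles with e.g. pdflatex; the standalone class crops the page to the flag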
\documentclass[tikz]{standalone}
\usetikzlibrary{calc}
\usepackage{xcolor} % for \definecolor
\definecolor{UNBlue}{HTML}{5B92E5}
\begin{document}
\begin{tikzpicture}
%--------------------------------------------------------
% flag geometry
%--------------------------------------------------------
\def\flagW{6cm} % width → 2 : 3 aspect
\def\flagH{4cm} % height
\def\eyeR {1.3cm} % HAL-eye radius
% light-blue background
\fill[UNBlue] (0,0) rectangle (\flagW,\flagH);
%--------------------------------------------------------
% concentric “HAL eye” (outer-most ring first)
%--------------------------------------------------------
\begin{scope}[shift={(\flagW/2,\flagH/2)}] % centre of the flag
\foreach \f/\c in {%
1.00/black,
.68/{red!50!black},
.43/{red!80!orange},
.1/orange,
.05/yellow}%
{%
\fill[fill=\c,draw=none] circle ({\f*\eyeR});
}
%── parameters ───────────────────────────────────────
\def\R{\eyeR} % distance from centre to triangle’s tip
\def\Alpha{10} % full apex angle (°)
%── compute half-angle & half-base once ─────────────
\pgfmathsetmacro\halfA{\Alpha/2}
\pgfmathsetlengthmacro\halfside{\R*tan(\halfA)}
%── loop over Vitruvian‐man angles ───────────────────
\foreach \Beta in {0,30,90,150,180,240,265,275,300} {%
% apex on the eye‐rim
\coordinate (A) at (\Beta:\R);
% base corners offset ±90°
\coordinate (B) at (\Beta+90:\halfside);
\coordinate (C) at (\Beta-90:\halfside);
% fill the spike
\path[fill=white,draw=none] (A) -- (B) -- (C) -- cycle;
}
\end{scope}
\end{tikzpicture}
\end{document}
Conversely, there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk, potentially in perpetuity
Note that this is not a logical converse of your first statement. I realize that the word “conversely” can be used non-strictly and might in fact be used this way by you here, but I’m stating this just in case.
My guess is that “there is some (potentially high) threshold of societal epistemics + coordination + institutional steering beyond which we can largely eliminate anthropogenic x-risk in perpetuity” is false — my guess is that improving [societal epistemics + coordination + institutional steering] is an infinite endeavor; I discuss this a bit here. That said, I think it is plausible that there is a possible position from which we could reasonably be fairly confident that things will be going pretty well for a really long time — I just think that this would involve one continuing to develop one’s methods of [societal epistemics, coordination, institutional steering, etc.] as one proceeds.
Basically nobody actually wants the world to end, so if we do that to ourselves, it will be because somewhere along the way we weren’t good enough at navigating collective action problems, institutional steering, and general epistemics
… or because we didn’t understand important stuff well enough in time (for example: if it is the case that by default, the first AI that could prove [some given hard theorem] would eat the Sun, we would want to firmly understand this ahead of time), or because we weren’t good enough at thinking (for example, people could just be lacking in iq, or have never developed an adequate sense of what it is even like to understand something, or be intellectually careless), or because we weren’t fast enough at disseminating or [listening to] the best individual understanding in critical cases, or because we didn’t value the right kinds of philosophical and scientific work enough, or because we largely-ethically-confusedly thought some action would not end the world despite grasping some key factual broad strokes of what would happen after, or because we didn’t realize we should be more careful, or maybe because generally understanding what will happen when you set some process in motion is just extremely cursed.[1] I guess one could consider each of these to be under failures in general epistemics… but I feel like just saying “general epistemics” is not giving understanding its proper due here.
[1] Many of these are related and overlapping.
the long run equilibrium of the earth-originating civilization
(this isn’t centrally engaging with your shortform but:) it could be interesting to think about whether there will be some sort of equilibrium or whether development will meaningfully continue (until the heat death of the universe or until whatever other bound of that kind holds up, or maybe just forever)[1]
Summarizing documents, and exploring topics I’m no expert in: Super good
I think you probably did this, but I figured it’s worth checking: did you check this on documents you understand well (such as your own writing) and topics you are an expert on?
I think this approach doesn’t make sense. Issues, briefly:
if you want to be squaring your matrix, you need it to be square — you should append another row of zeros
this matrix does not have a logarithm, because it isn’t invertible ( https://en.wikipedia.org/wiki/Logarithm_of_a_matrix#Existence )[1]
there in fact isn’t any matrix that could reasonably be considered a logarithm of it, because the exponential of half of any such logarithm would be a square root of the matrix, but the matrix does not have a square root (see e.g. https://math.stackexchange.com/a/66156/540174 for how to think about this; a worked version of this argument is sketched after the footnote below)
[1] also, note that it generally doesn’t make sense to speak of the logarithm of a matrix — a matrix can have (infinitely) many logarithms ( https://en.wikipedia.org/wiki/Logarithm_of_a_matrix#Example:_Logarithm_of_rotations_in_the_plane )
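(A worked version of the square-root argument above, in my own notation and with a standard example matrix rather than whatever specific matrix was under discussion: if $L$ were a logarithm of a matrix $M$, i.e. $e^{L} = M$, then
\[
\bigl(e^{L/2}\bigr)^{2} \;=\; e^{L/2}\,e^{L/2} \;=\; e^{L} \;=\; M,
\]
so $M$ would have a square root. But, for example, the nilpotent matrix
\[
N \;=\; \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
has no square root: if $A^{2} = N$, then $A^{4} = N^{2} = 0$, so $A$ is nilpotent, so (being $2 \times 2$) $A^{2} = 0 \neq N$, a contradiction — and hence $N$ has no logarithm either.)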
I’d rather your “that is” were a “for example”. This is because:
It’s also possible for the process of updates to not be getting arbitrarily close to any endpoint (with a notion of closeness that is imo appropriate for this context), without there being any sentence on which one keeps changing one’s mind. If we’re thinking of one’s “ethical state of mind” as being given by the probabilities one assigns to some given countable collection of sentences, then here I’m saying that it can be reasonable to use a notion of convergence which is stronger than pointwise convergence. For math, if one just runs a naive proof search and assigns truth value 1 to proven sentences and 0 to disproven sentences, one could try to say this sequence of truth value assignments is converging to the assignment that gives 1 to all provable sentences and 0 to all disprovable sentences (and whatever the initialization assigns to all independent sentences, let’s say), but I think that in our context of imagining some long reflection getting close to something in finite time, it’s more reasonable to say that one isn’t converging to anything in this example — it seems pretty intuitive to say that after any finite number of steps, one hasn’t really made much progress toward this kinda-endpoint (after all, one will have proved only finitely many things, and one still has infinitely many more things left to prove). Bringing this a tad closer to ethical reality: we could perhaps imagine someone repeatedly realizing that projects they hadn’t really considered before are worth working on, infinitely many times, with what they are up to thus changing [by a lot] [infinitely many times]. To spell out the connection of this to the math example a bit more: the common point is that novelty can appear in the sentences/things considered, so one can have novelty even if novelty doesn’t keep showing up in how one relates to any given sentence/thing. I say more about these themes here.
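(One way to make the pointwise-vs.-stronger-convergence contrast precise — the particular metric here is my choice, not something from the original discussion: let $S = \{s_1, s_2, \dots\}$ be the sentences, let $v_n \colon S \to [0,1]$ be the truth-value assignment after $n$ steps of the naive proof search (with, say, $v_0 \equiv 1/2$), and let $v_\infty$ be the limiting assignment described above. Then $v_n \to v_\infty$ pointwise, since each sentence’s value changes at most once. But under, e.g., the metric
\[
d(u,v) \;=\; \sup_{s \in S}\,\lvert u(s) - v(s) \rvert,
\]
we have $d(v_n, v_\infty) = 1/2$ for every $n$, since at any finite stage infinitely many provable sentences remain unproven — so in this stronger sense one never gets any closer to the kinda-endpoint.)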
I feel like items on your current list have only a small fraction of the responsibility for what I’d consider software updates in humans, and that it sorta fails to address almost all the ordinary stuff that goes on when individual humans are learning stuff (from others or independently) or when “humanity is improving its thinking”. But that makes me think that maybe I’m missing what you’re going for with this list?[1] Continuing with the (possibly different) question I have in mind anyway, here’s a list that imo points toward a decent chunk of what is missing from your list, with a focus on the case of independent and somewhat thoughtful learning/[thinking-improving] (as opposed to the case of copying from others, and as opposed to the case of fairly non-thoughtful thinking-improving)[2]:
a mathematician coming up with a good mathematical concept and developing a sense of how to use it (and ditto for a mathematical system/formalism)[3]
seeing a need to talk about something and coining a word for it
a philosopher trying to clarify/re-engineer a concept, eg by seeing which more precise definition could accord with the concept having some desired “inferential role”
noticing and resolving tensions in one’s views
discovering/inventing/developing the scientific method; inventing/developing p-values; improving peer review
discussing what kinds of evidence could help with some particular scientific question
inventing writing; inventing textbooks
the varied thought that is upstream of a professional poker player thinking the way they do when playing poker
asking oneself “was that a reasonable inference?”, “what auxiliary construction would help with this mathematical problem?”, “which techniques could work here?”, “what is the main idea of this proof?”, “is this a good way to model the situation?”, “can I explain that clearly?”, “what caused me to be confused about that?”, “why did I spend so long pursuing this bad idea?”, “how could I have figured that out faster?”, “which question are we asking, more precisely?”, “why are we interested in this question?”, “what is this analogous to?”, “what should I read to understand this better?”, “who would have good thoughts on this?”
[1] I will note that when I say “only a small fraction”, this is wrt a measure that cares a lot about understanding how it is that one improves at doing difficult thinking (like, math/philosophy/science/tech research), and I could maybe see your list covering a larger fraction if one cared relatively more about software updates affecting one’s emotional/social life or whatever, but I’d need to think more about that.
[2] it has such a focus in large part because such a list was easy for me to provide — the list is copied from here with light edits
[3] two sorta-examples: humanity starting to think probabilistically, humanity starting to think in terms of infinitesimals
I agree it’s a pretty unfortunate/silly question. Searle’s analysis of it in Chapter 1 of Seeing Things as They Are is imo not too dissimilar to your analysis of it here, except he wouldn’t think that one can reasonably say “the world we see around us is an internal perceptual copy” (and I myself have trouble compiling this into anything true also), though he’d surely agree that various internal things are involved in seeing the world. I think a significant fraction of what’s going on with this “disagreement” is a bunch of “technical wordcels” being annoyed at what they consider to be careless speaking that they take to be somewhat associated with careless thinking.
see e.g. Chapter 1 of Searle’s Seeing Things as They Are for an exposition of the view usually called direct realism (i’m pretty sure you guys (including the op) have something pretty different in mind than that view and i think it’s plausible you all would actually just agree with that view)
i’d be interested in hearing why you think that cultural/moral/technological/mathematical maturity is even possible or eventually likely (as opposed to one just being immature forever[1]) (assuming you indeed do think that)
As I write this list, I’ve a nagging feeling I’m missing some things.
my first two thoughts (i’m aware you arguably sorta already say sth like these things, but imo these deserve to be said more clearly):
oki! in this scenario, i guess i’m imagining humans/humanity becoming ever-more-artificial (like, ever-more-[human/mind]-made) and ever-more-intelligent (like, eventually much more capable than anything that might be created by 2100), so this still seems like a somewhat unnatural framing to me
I won’t address why [AIs that humans create] might[1] have their own alien values (so I won’t address the “turning against us” part of your comment), but on these AIs outcompeting humans[2]:
There is immense demand for creating systems which do ≈anything better than humans, because there is demand for all the economically useful things humans do — if someone were to create such a thing and be able to control it, they’d become obscenely rich (and probably come to control the world[3]).
Also, it’s possible to create systems that do ≈anything better than humans. In fact, it’s probably not that hard — it’ll probably happen at some point in this century by default (absent an AGI ban).
[1] and imo probably will
[2] sorry if this is already obvious to you, but I thought from your comment that there was a chance you hadn’t considered this
[3] if moderately ahead of other developers and not shut down or taken over by others promptly