Added clarification. (This seems to be a quite general problem of mismatched discourse expectations, where a commenter and a reader presume different shared assumptions about, e.g., how context-specific the comment is.)
(That’s a reasonable interest, but not wanting to take that time is part of why I don’t want to give an overall opinion about the specific situation with Palisade; I don’t have an opinion about Palisade, and my comment is only meant to discuss general principles.)
I don’t want to give an overall opinion [ETA: because I haven’t looked into Palisade specifically at all, and am not necessarily wanting to invest the requisite effort into having a long-term worked-out opinion on the whole topic], but some considerations:
You can just say what was actually shown? And build a true case for scariness from the things that were actually shown?
You can’t show “XYZ scary thing is what happens when you have a true smarter-than-human mind” right now, because those don’t exist. But you can show “trained systems are able to break human-designed security systems in surprising ways” or “trained systems can hide thoughts” or “trained systems can send hidden messages” or “big searches find powerful solutions that humans didn’t think of and that have larger effects in the domain than what human designers got” etc.
Sometimes the reason your “experiment” is good at communicating is that you are correctly breaking someone’s innocent hopeful expectation that “AIs are tools designed by humans so they’ll just do what’s useful”. But overselling it as “AIs will convergently instrumentally do XYZ scary thing” is unnecessary, because that claim isn’t actually what made the demonstration important for communication; and it gets much of the badness of lying, because it’s not really true.
It’s just deeply bad to deceive, ethically / ruleswise, and morally / consequences-wise, and virtuously / self-construction-wise.
There’s a bunch of bad consequences to deceiving. This really shouldn’t have to be belabored all the time; we should just know this by now. But for example:
Later someone might patch the fake signals of misalignment. Then they can say “look we solved alignment”. And you’ll get accused of goalpost moving.
There will be more, and stronger, counterarguments / sociopolitical pressure if serious regulation is on the table. Then weaker arguments are more likely to be exposed.
You communicate incorrectly about what it is that’s dangerous ---> the regulation addresses the wrong thing, and doesn’t know how to notice that it should try to address the right thing.
If exposed, you’ve muddied the waters in general / made the market for strategic outlooks/understanding on AI be a market for lemons.
(I’m being lazy and not checking that I actually have an example of the following, and often that means the thing isn’t real, but: [<--RFW]) Maybe the thing is to use examples for crossing referential but not evidential/inferential distances.
That is, sometimes I’m trying to argue something to someone, and they don’t get some conceptual element X of my argument. And then we discuss X, and they come up with an example of X that I think isn’t really an example of X. But to them, their example is good, and it clicks for them what I mean, and then they agree with my argument or at least get the argument and can debate about it more directly. I might point out that I disagree with the example, but not necessarily, and move on to the actual discussion. I’ll keep in mind the difference, because it might show up again—e.g. it might turn out we still weren’t talking about the same thing, or had some other underlying disagreement.
I’m unsure whether this is bad / deceptive, but I don’t think so, because it looks/feels like “just what happens when you’re trying to communicate and there’s a bunch of referential distance”.
For example, talking about instrumental convergence and consequentialist reasoning is confusing / abstract / ungrounded to many people, even with hypothetical concrete examples; actual concrete examples are helpful. Think of people begging you to give any example of “AI takeover” no matter how many times you say “Well IDK specifically how they’d do it / what methods would work”. You don’t have to be claiming that superviruses are somehow a representative / mainline takeover method, to use that as an example to communicate what general kind of thing you’re talking about. Though it does have issues...
(This is related to the first bullet point in this comment.)
So I don’t think it’s necessarily bad/deceptive to be communicating by looking for examples that make things click in other people. But I do think it’s very fraught for the above reasons. Looking for clicks is good if it’s crossing referential distance, but not if it’s being used in place of crossing evidential/inferential distance.
You don’t need to be smarter in every possible way to get a radical increase in the speed of solving illnesses.
You need the scientific and technological creativity part, and the rest would probably flow, is my guess.
I think part of the motive of making AGI is to solve all illnesses for everyone and not just people who aren’t yet born.
What I mean is that giving humanity more brainpower also gets these benefits. See https://tsvibt.blogspot.com/2025/11/hia-and-x-risk-part-1-why-it-helps.html It may take longer than AGI, but also it doesn’t pose a (huge) risk of killing everyone.
Does this basically mean not believing in AGI happening within the next two decades?
It means not being very confident that AGI happens within two decades, yeah. Cf. https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce and https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense
Aren’t we talking mostly about diseases that come with age?
Yes.
Someone could do a research project to guesstimate the impact more precisely. As one touchpoint, here’s 2021 US causes of death, per the CDC:
(From https://wisqars.cdc.gov/pdfs/leading-causes-of-death-by-age-group_2021_508.pdf )
The total number of deaths of young people in the US is small, in relative terms, so there’s not much room for impact. There would still be some impact; we can’t tell from this graph of course, but many of the diseases listed could probably be quite substantially derisked (cardio, neoplasms, respiratory).
This is only deaths, so there’s more impact if you include non-lethal cases of illness. IDK how much of this you can impact with reprogenetics, especially since uptake would take a long time.
where we will have radically different medical capabilities if AGI happens in the next two decades?
Well, on my view, if actual AGI (general intelligence that’s smarter than humans in every way including deep things like scientific and technological creativity) happens, we’re quite likely to all die very soon after. But yeah, if you don’t think that, then on your view AGI would plausibly obsolete any current scientific work including reprogenetics, IDK.
Another thing to point out is that, if this is a motive for making AGI, then reprogenetics could (legitimately!) demotivate AGI capabilities research, which would decrease X-risk.
I say “gippity” meaning “generative pretrained transformer” which IIUC is still true and descriptive for most of this, except Mamba.
It’s just saying that:
There are more and less general categories. E.g. “Sunny day” is more general than “Sunny and cool”, because if a day is Sunny and Cool then it’s also Sunny, but there are also days that are Sunny but not Cool.
Often, if you take two categories, neither one is strictly more general than the other one. E.g. “Sunny and cool” and “Cool and buggy”. There are days that are S&C but not C&B; and there are also days that are C&B but not S&C.
You can take unions and intersections. The intersection of “Sunny and cool” and “Cool and buggy” is “Sunny and cool and buggy”. Intersections give more specific (less abstract) categories; they add more constraints, so fewer possible worlds satisfy all those constraints, so you’re talking about some more specific category of possible worlds. The union of “Sunny and cool” and “Cool and buggy” is “Cool; and also, sunny or buggy or both”. Unions give less specific (more abstract) categories, because they include all of the possible worlds from either of the two categories.
If you want to get more specific, you want to start talking about a smaller category. So you want to go downward (i.e. to a smaller set, included inside the bigger set) in the lattice. But there’s multiple ways to do that. E.g. to be more specific than “Cool; and also, sunny or buggy or both”, you could talk about “Sunny and cool”, or you could talk about “Cool and buggy”.
(This is far from everything that “abstract”, “specific”, “category”, and “concept” actually mean, but it’s something.)
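A minimal toy sketch of the above (my framing, nothing deeper): treat a category as the set of possible worlds (days) it admits, so “more general” is just “superset”, intersection adds constraints, and union drops them.

```python
# Toy lattice of categories: a category = the set of worlds satisfying a predicate.
from itertools import product

worlds = set(product([True, False], repeat=3))   # worlds are (sunny, cool, buggy) triples

def category(predicate):
    """The set of worlds satisfying the predicate."""
    return frozenset(w for w in worlds if predicate(*w))

sunny      = category(lambda s, c, b: s)
sunny_cool = category(lambda s, c, b: s and c)
cool_buggy = category(lambda s, c, b: c and b)

# "Sunny" is strictly more general than "Sunny and cool":
assert sunny_cool < sunny

# Neither of these two is more general than the other:
assert not (sunny_cool <= cool_buggy) and not (cool_buggy <= sunny_cool)

# Intersection = more specific; union = more abstract:
assert sunny_cool & cool_buggy == category(lambda s, c, b: s and c and b)
assert sunny_cool | cool_buggy == category(lambda s, c, b: c and (s or b))
```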
Since it’s slower, the tech development cycle is faster in comparison. Tech development --> less expensive tech --> more access --> less concentration of power --> more moral outcomes.
Grain of Truth (Reflective Oracles). Understanding an opponent perfectly requires greater intelligence or something in common.
And understanding yourself. Of course, you have plenty in common with yourself. But, you don’t have everything in common with yourself, if you’re growing.
Dovetailing. Every meta-cognition enthusiast reinvents Levin/Hutter search, usually with added epicycles.
To frame it in a very different way, learning math and generally gaining lots of abstractions and getting good wieldy names for them is super important for thinking. Doing so increases your “algorithmic range”, within your very constrained cognition.
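(For concreteness, a hedged sketch of plain dovetailing; Levin/Hutter search adds to this a 2^(−length) weighting of each program’s time budget and a check of candidate outputs. This is my toy model, with programs stood in for by Python generators.)

```python
import itertools

def dovetail(programs):
    """Interleave a (possibly infinite) enumeration of programs, modeled here
    as generators. At stage n the first n programs each get one more step, so
    every program eventually gets unbounded compute, without ever committing
    to finish any single program first. Yields (index, value) pairs."""
    enumeration = iter(programs)
    active = {}                                       # index -> running generator
    for stage in itertools.count():
        try:
            active[stage] = iter(next(enumeration))   # admit one new program per stage
        except StopIteration:
            if not active:
                return
        for i, gen in list(active.items()):
            try:
                yield i, next(gen)
            except StopIteration:
                del active[i]                         # this program has halted

# Toy usage: three "programs" with very different runtimes, run fairly.
programs = [itertools.count(), iter([42]), (x * x for x in range(3))]
for i, v in itertools.islice(dovetail(programs), 12):
    print(i, v)
```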
Chaitin’s Number of Wisdom. Knowledge looks like noise from outside.
To a large extent, but not quite exactly (which you probably weren’t trying to say), because of “thinking longer should make you less surprised”. From outside, a big chunk of alien knowledge looks like noise (for now), true. But there’s a “thick interface” where just seeing stuff from the alien knowledgebase will “make things click into place” (i.e. will make you think a bit more / make you have new hypotheses (and hypothesis bits)). You can tell that the alien knowledgebase is talking about Things even if you aren’t very familiar with those Things.
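(A rough formal gloss, for concreteness; fix a prefix-free universal machine U.)

```latex
\[
  \Omega_U \;=\; \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}
\]
% The first n bits of \Omega_U suffice to decide halting for every program of
% length <= n, yet \Omega_U is algorithmically random: no program much shorter
% than n bits outputs its first n bits. Maximally informative, but from outside
% it is statistically indistinguishable from coin flips.
```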
Lower Semicomputability of M. Thinking longer should make you less surprised.
I’d go even farther and say that in “most” situations in real life, if you feel like you want to think about X more, then the top priority (do it first, and do it often ongoingly) is to think of more hypotheses.
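(Again a rough gloss, for concreteness: Solomonoff’s a priori semimeasure M, for a fixed monotone universal machine U, and what its lower semicomputability buys you.)

```latex
\[
  \mathbf{M}(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}
\]
% Here U(p) = x* means U's output on p starts with x. Lower semicomputability:
% there is a computable \phi(x,t), nondecreasing in t, with
% \lim_{t\to\infty} \phi(x,t) = \mathbf{M}(x). Computing longer raises your
% lower bound on \mathbf{M}(x), hence lowers your estimate of the surprisal
% -\log \mathbf{M}(x): thinking longer can only make this estimate of your
% surprise go down.
```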
A basic issue with a lot of deliberate philanthropy is the tension between:
In many domains, many of the biggest gains are likely to come from marginal opportunities, e.g. because they have more value of information, more large upsides, and more coverage of neglected areas (and are therefore plausibly strategically important).
Marginal opportunities are harder to evaluate.
There’s less preexisting understanding, on the part of fund allocators.
The people applying would tend to be less tested.
Therefore, it’s easier to game.
The kneejerk solution I’d propose is “proof of novel work”. If you want funding to do X, you should show that you’ve done something to address X that others haven’t done. That could be a detailed insightful write-up (which indicates serious thinking / fact-finding); that could be some work you did on the side, which isn’t necessarily conceptually novel but is useful work on X that others were not doing; etc.
I assume that this is an obvious / not new idea, so I’m curious where it doesn’t work. Also curious what else has been tried. (E.g. many organizations do “don’t apply, we only give to {our friends, people we find through our own searches, people who are already getting funding, …}”.)
In this example, you’re trying to make various planning decisions; those planning decisions call on predictions; and the predictions are about (other) planning decisions; and these form a loopy network. This is plausibly an intrinsic / essential problem for intelligences, because it involves the intelligence making predictions about its own actions—and those actions are currently under consideration—and those actions kinda depend on those same predictions. The difficulty of predicting “what will I do” grows in tandem with the intelligence, so any sort of problem that makes a call to the whole intelligence might unavoidably make it hard to separate predictions from decisions.
A further wrinkle / another example is that a question like “what should I think about (in particular, what to gather information about / update about)”, during the design process, wants these predictions. For example, I run into problems like:
I’m doing some project X.
I could do a more ambitious version of X, or a less ambitious version of X.
If I’m doing the more ambitious version of X, I want to work on pretty different stuff right now, at the beginning, compared to if I’m doing the less ambitious version. Example 1: a programming project; should I put in the work ASAP to redo the basic ontology (datatypes, architecture), or should I just try to iterate a bit on the MVP and add epicycles? Example 2: an investigatory blog post; should I put in a bunch of work to get a deeper grounding in the domain I’m talking about, or should I just learn enough to check that the specific point I’m making probably makes sense?
The question of whether to do ambitious X vs. non-ambitious X also depends on / gets updated by those computations that I’m considering how to prioritize.
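As a toy sketch of the shape of this loop (hypothetical numbers; nothing here is meant as a solution): the scoping decision consumes a prediction of how the project goes, but that prediction depends on the scoping decision, so there’s no clean “predict first, then decide” ordering.

```python
def choose_scope(initial_guess, iters=20):
    """Toy loop: decide ambitious vs. modest from a predicted probability that
    the ambitious version works out, where that probability itself depends on
    whether we do the early foundational work that only pays off if we go
    ambitious. We just iterate and hope for a fixed point."""
    p_ambitious_works = initial_guess
    for _ in range(iters):
        do_foundations = p_ambitious_works > 0.4             # decision given prediction
        p_ambitious_works = 0.7 if do_foundations else 0.2   # prediction given decision
    return ("ambitious" if do_foundations else "modest"), p_ambitious_works

print(choose_scope(0.5))   # ('ambitious', 0.7)
print(choose_scope(0.3))   # ('modest', 0.2)
```

Note that this loop has two fixed points depending on the initial guess: the “prediction” partly determines the outcome rather than just forecasting it, which is the entanglement between predictions and decisions described above.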
Another kind of example is common knowledge. What people actually do seems to be some sort of “conjecture / leap of faith”, where at some point they kinda just assume / act-as-though there is common knowledge. Even in theory, how is this supposed to work, for agents of comparable complexity* to each other? Notably, Lobian handshake stuff doesn’t AFAICT especially look like it has predictions / decisions separated out.
*(Not sure what complexity should mean in this context.)
We almost certainly want to eventually do uploading, if nothing else because that’s probably how you avoid involuntary pre-heat-death death. It might be the best way to do supra-genomic HIA, but I would rather leave that up to the next generation, because it seems both morally fraught and technically difficult. It’s far from clear to me that we ever want to make ASI; why ever do that rather than just have more human/humane personal growth and descendants? (I agree with the urgency of all the mundane horrible stuff that’s always happening; but my guess is we can get out of that stuff with HIA before it’s safe to make ASI. Alignment is harder than curing world hunger and stopping all war, probably (glib genie jokes aside).)
Mind uploading is probably quite hard. See here. It’s probably much easier to get AGI from partial understanding of how to do uploads, than to get actual uploads. Even if you have unlimited political capital, such that you can successfully prevent making partial-upload-AGIs, it’s probably just very technically difficult. Intelligence amplification is much more doable because we can copy a bunch of nature’s work by looking at all the existing genetic variants and their associated phenotypes.
I assume this got stuck / fell by the wayside; do you know why?
I think of most of these things as bringing up the floor rather than raising the ceiling.
Ohhhh ok. That’s helpful, thanks.
(I think I may have asked you a similar question before, sorry if I forgot your answer:) Are there a couple compelling examples of someone who
1. did something you’d identify as roughly this procedure;
2. then did something I’d consider impressive (like a science or tech or philosophy or political advance);
and attributed 2 to 1?
Vaguely curious how this relates to Universal Inductors.