Program Coordinator of AI Safety Camp.
Great overview! I find this helpful.
Next to intrinsic optimisation daemons that arise through training internal to hardware, I suggest adding extrinsic optimising “divergent ecosystems” that arise through deployment and gradual co-option of (phenotypic) functionality within the larger outside world.
AI Safety research so far has focussed more on internal code (particularly among CS/ML researchers) computed deterministically (within known state spaces, as mathematicians like to represent them). That is, rather than on complex external feedback loops that are uncomputable – given Good Regulator Theorem limits and the inherent noise interference on signals propagating through the environment (as would be intuitive for some biologists and non-linear dynamics theorists).
So extrinsic optimisation is easier for researchers in our community to overlook. See this related paper by a physicist studying origins of life.
Unfortunately, perhaps due to the prior actions of others in your same social group, a deceptive frame of interpretation is more likely to be encountered first, effectively ‘inoculating’ everyone else in the group against an unbiased receipt of any further information.
Written in 2015. Still relevant.
Anchoring focalism and the Identifiable victim effect: Bias in Evaluating AGI X-Risks
Say that the Illusion of Truth effect and the Ambiguity effect are each biasing how researchers in AI Safety evaluate one of the options below.
If you had to choose, which bias would more likely apply to which option?
A: Aligning AGI to be safe over the long term is possible in principle.
B: Long-term safe AGI is impossible fundamentally.
Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks
Normalcy bias and Base rate neglect: Bias in Evaluating AGI X-Risks
Status quo bias; System justification: Bias in Evaluating AGI X-Risks
Belief Bias: Bias in Evaluating AGI X-Risks
Challenge to the notion that anything is (maybe) possible with AGI
Curse of knowledge and Naive realism: Bias in Evaluating AGI X-Risks
it needs to plug into the mathematical formalizations one would use to do the social science form of this.
Could you clarify what you mean by a “social science form” of a mathematical formalisation?
I’m not familiar with this.
they’re right to look at people funny even if they have the systems programming experience or what have you.
It was expected and understandable that people look funny at the writings from a multi-skilled researcher with new ideas that those people were not yet familiar with.
Let’s move on from first impressions.
If by simulation we can refer to a model that is computed to estimate a factor on which further logical deduction steps are based, that would connect up with Forrest’s work (it’s not really about multi-agent simulation though).
Based on what I learned from Forrest, we need to distinguish the ‘estimation’ factors from the ‘logical entailment’ factors. That the notion of “proof” is only with respect to that which can be logically entailed. Everything else is about assessment. In each case, we need to be sure we are doing the modelling correctly.
For example, it could be argued that step ‘b’ below is about logical entailment, though according to Forrest most would argue that it is an assessment. Given that it depends on both physics and logic (via comp-sci modeling), it depends on how one regards the notion of ‘observation’, and whether that is empirical or analytic observation.
- b; If AGI/APS is permitted to continue to exist,
then it will inevitably, inexorably,
implement and manifest certain convergent behaviors.
- c; that among these inherent convergent behaviors
will be at least all of:
− 1; to/towards self existence continuance promotion.
− 2; to/towards capability building capability,
an increase seeking capability,
a capability of seeking increase,
capability/power/influence increase, etc.
− 3; to/towards shifting
ambient environmental conditions/context
to/towards favoring the production of
(variants of, increases of)
its artificial substrate matrix.
Note again: the above is not formal reasoning. It is a super-short description of what two formal reasoning steps would cover.
Really appreciate you sharing your honest thoughts here, Rekrul.
From my side, I’d value actually discussing the reasoning forms and steps we already started to outline on the forum. For example, the relevance of intrinsic vs extrinsic selection and correction, or the relevance of the organic vs. artificial substrate distinction. These distinctions are something I would love to openly chat about with you (not the formal reasoning – I’m the bridge-builder, Forrest is the theorist).
That might feel unsatisfactory – in the sense of “why don’t you just give us the proof now?”
As far as I can tell (Forrest can correct me later), there are at least two key reasons:
There is a tendency amongst AI Safety researchers to want to cut to the chase and judge the believability of the conclusion itself. For example, notice that I tried to clarify several argument parts in comment exchanges with Paul, with little or no response. People tend to believe that this would be the same as judging a maths proof over idealised deterministic and countable spaces. Yet formal reasoning here would have to reference and build up premises from physical theory in indeterministic settings. So we actually need to clarify how a different form of formal reasoning is required here, one that does not look like what would be required for P=NP. Patience is needed on the side of our interlocutors.
While Forrest does have most of the argument parts formalised, his use of precise analytical language and premises is not going to be clear to you. Mathematicians are not the only people who use formal language and reasoning steps to prove impossibilities by contradiction. Some analytical philosophers do too (as do formal verification researchers in industrial software engineering using different notation for logic transformation, etc.). No amount of “just give the proof to us and leave it to us to judge” lends us confidence that the judging would track the reasoning steps – if those people already did not track correspondences of some first basic argument parts described by the explanatory writings by Forrest or me that their comments referred to. Even if they are an accomplished mathematician, they are not going to grasp the argumentation if they skim through the text, judging it based on their preconception of what language the terms should be described in or how the formal reasoning should be structured.
I get that people are busy, but this is how it is. We are actually putting a lot of effort and time into communication (and are very happy to get your feedback on that!). And to make this work, they (or others) will need to put in commensurate effort on their end. It is up to them to show that they are not making inconsistent jumps in reasoning there, or talking in terms of their intuitive probability predictions about the believability of the end result, where we should be talking about binary logic transformations.
And actually, such nitty-gritty conversations would be really helpful for us too! Here is what I wrote before in response to another person’s question whether a public proof is available:
Main bottleneck is (re)writing it in a language that AI(S) researchers will understand without having to do a lot of reading/digging in the definitions of terms and descriptions of axioms/premises. A safety impossibility theorem can be constructed from various forms that are either isomorphic with others or are using separate arguments (eg. different theoretical limits covering different scopes of AGI interaction) to arrive at what seems to be an overdetermined conclusion (that long-term AGI safety is not possible).
We don’t want to write it out so long that most/all readers drop out before they get to parse through the key reasoning steps. But we also do not want to make it so brief and dense that researchers are confused about what level of generality we’re talking at, have to read through other referenced literature to understand definitions, etc.
Also, one person (a grant investigator) has warned us that AI safety researchers would be so motivated against the conclusion (see ‘belief bias’) that few would actually attempt to read through a formal safety impossibility theorem. That’s indeed likely based on my exchanges so far with AIS researchers (many of them past organisers or participants of AISC). So that is basically why we are first writing a condensed summary (for the Alignment Forum and beyond) that orders the main arguments for long-term AGI safety impossibility without precisely describing all axioms and definitions of terms used, covering all the reasoning gaps to ensure logical consistency, etc.
Note: Forrest has a background in analytical philosophy; he does not write in mathematical notation. Another grant investigator we had a call with expected that the formal reasoning would necessarily be written out in mathematical notation (here is a rough post-call write-up consolidating our impressions and responses to that conversation): https://mflb.com/ai_alignment_1/math_expectations_psr.html
Also note that Forrest’s formal reasoning work got funded by a $170K grant by Survival and Flourishing Fund. So some grant investigators were willing to bet on this work with money.
One thing Paul talks about constantly is how useful it would be if he had some hard evidence a current approach is doomed, as it would allow the community to pivot. A proof of alignment impossibility would probably make him ecstatic if it was correct (even if it puts us in quite a scary position).
I respect this take by Paul a lot, then. This is how I also started to think about it a year ago.
BTW, I prefer you being blunt, so glad you’re doing that.
A little more effort to try to understand where we could be coming from would be appreciated. Particularly given what’s at stake here – a full extinction event.
Neither Forrest nor I have any motivation to post unsubstantiated claims. Forrest because frankly, he does not care one bit about being recognised by this community – he just wants to find individuals who actually care enough to consider the arguments rigorously. Me because all I’d be doing is putting my career at risk.
You can’t complain about people engaging with things other than your idea if the only thing they can even engage with is your idea.
The tricky thing here is that a few people are reacting by misinterpreting the basic form of the formal reasoning at the outset, and judging the merit of the work by their subjective social heuristics.
Which does not lend me (nor Forrest) confidence that those people would do a careful job at checking the term definitions and reasoning steps – particularly if written in precise analytic language that is unlike the mathematical notation they’re used to.
The filter goes both ways.
Instead you have decided to make this post and trigger more crank alarms.
Actually, this post was written in 2015 and I planned last week to reformat it and post it. Rereading it, I’m just surprised how well it appears to line up with the reactions.
The problem of a very poor signal to noise ratio from messages received from people outside of the established professional group basically means that the risk of discarding a good proposal from anyone regarded as an outsider is especially likely.
This insight feels relevant to a comment exchange I was in yesterday. An AI Safety insider (Christiano) lightly read an overview of work by an outsider (Landry). The insider then judged the work to be “crankery”, in effect acting as a protecting barrier against other insiders having to consider the new ideas.
The sticking point was the claim “It is 100% possible to know that X is 100% impossible”, where X is a perpetual motion machine or a ‘perpetual general benefit machine’ (ie. long-term safe and beneficial AGI).
The insider believed this was an exaggerated claim, which meant we first needed to clarify epistemics and social heuristics, rather than the substantive argument form. The reactions by the busy “expert” insider, who had elected to judge the formal reasoning, led to us losing trust that they would proceed in a patient and discerning manner.
There was simply not enough common background and shared conceptual language for the insider to accurately interpret the outsider’s writings (“very poor signal to noise ratio from messages received”).
Add to that:
“the tendency to believe that [long-term safe AGI is possible] because many other people do”
“that the facts are plain for all to see; that rational people will agree with us [that long-term safe AGI is possible]; and that those who do not are either uninformed, lazy, irrational, or biased.”
“Where the evaluation of the logical strength of an argument is biased by the believability of the conclusion [that long-term safe AGI is impossible]… The difficulty is that we want to apply our intuition too often, particularly because it is generally much faster/easier than actually doing/implementing analytic work.)… Arguments which produce results contrary to one’s own intuition about what “should” or “is expected” be the case are also implicitly viewed as somewhat disabling and invalidating of one’s own expertise, particularly if there also is some self-identification as an ‘expert’. No one wants to give up cherished notions regarding themselves. The net effect is that arguments perceived as ‘challenging’ will be challenged (criticized) somewhat more fully and aggressively than rationality and the methods of science would have already called for.”
“People do not want to be seen as having strong or ‘extreme opinions’, as this in itself becomes a signal from that person to the group that they are very likely to become ‘not a member’ due to their willingness to prefer the holding of an idea as a higher value than they would prefer being regarded as a member in good standing in the group. Extreme opinions [such as that it is 100% possible to know that long-term safe AGI is 100% impossible] are therefore to be regarded as a marker of ‘possible fanaticism’ and therefore of that person being in the ‘out crowd’.”
Status quo bias; System justification
“The tendency to like things to stay relatively the same. The tendency to defend and bolster the status quo [such as resolving to build long-term safe AGI, believing that it is a hard but solvable problem]. Existing social, economic, and political arrangements tend to be preferred, and alternatives disparaged sometimes even at the expense of individual and collective self-interest.”
“The degree to which these various bias effects occur is generally in proportion to a motivating force, typically whenever there is significant money, power, or prestige involved. Naturally, doing what someone ‘tells you to do’ [like accepting the advice to not cut to the chase and instead spend the time to dig into and clarify the arguments with us, given the inferential distance] is a signal of ‘low status’ and is therefore to be avoided whenever possible, even if it is a good idea.”
I mean, someone recognised as an expert in AI Safety could consciously mean well trying to judge an outsider’s work accurately – in the time they have. But that’s a lot of biases to counteract.
Forrest actually clarified the claim further to me by message:
Re “100%” or “fully knowable”:
By this, I usually mean that the analytic part of an argument is fully finite and discrete, and that all parts (statements) are there, the transforms are enumerated, known to be correct etc (ie, is valid).
In regards to the soundness aspect, that there is some sort of “finality” or “completeness” in the definitions, such that I do not expect that they would ever need to be revised (ie, is at once addressing all necessary aspects, sufficiently, and comprehensively), and that the observations are fully structured by the definitions, etc. Usually this only works for fairly low level concepts, things that track fairly closely to the theory of epistemology itself—ie, matters of physics that involve symmetry or continuity directly (comparison) or are expressed purely in terms of causation, etc.
One good way to test the overall notion is that something is “fully 100% knowable” if one can convert it to a computer program, and the program compiles and works correctly. The deterministic logic of computers cannot be fooled, as people sometimes can, as there is no bias. This may be regarded by some as a somewhat high standard, but it makes sense to me as it is of the appropriate type: Ie, a discrete finite result being tested in a purely discrete finite environment. Hence, nothing missing can hide.
But the point is – few readers will seriously consider this message.
That’s my experience, sadly.
The common reaction I noticed too from talking with others in AI Safety is that they immediately devalued that extreme-sounding conclusion that is based on the research of an outsider. A conclusion that goes against their prior beliefs, and against their role in the community.
Reactive devaluation: Bias in Evaluating AGI X-Risks
Your remarks make complete sense.
Forrest mentioned that for most people, reading his precise “EGS” format will be unparsable unless one has had practice with it. Also agreed that there is no background or context. The “ABSTract” is really too often too brief a note, usually just a reminder what the overall idea is. And the text itself IS internal notes, as you have said.
He says that it is a good reminder that he should remember to convert “EGS” to normal prose before publishing. He does not always have the energy or time or enthusiasm to do it. Often it requires a lot of expansion too – ie, some writing has to expand to 5 times its “EGS” size.
I’ll also work on this! There’s a lot of content to share, but I will try to format and rephrase it to be easier to follow for readers on LessWrong.
It’s worth noting up front that this sounds pretty crazy…
So this is looking pretty cranky right from the top, and hopefully you can sympathize with someone who has that reaction.
I get that this comes across as a strong claim, because it is.
So I do not expect you to buy that claim in one go (it took me months of probing the premises and the logic of Forrest’s arguments). It’s reasonable and epistemically healthy to be curiously skeptical at the outset, and try to both gain new insights from the writing and probe for inconsistencies.
Though I must say I’m disappointed that based on your light reading, you dismiss Forrest’s writings (specifically, the few pages you read) as crankery. Let me get back to that point.
“It is 100% possible to know that X is 100% impossible” would be an exaggerated claim… even if X was “perpetual motion machines”.
Excerpting from Forrest’s general response:
For one thing, it is not just the second law of thermodynamics that “prohibits” (ie, ‘makes impossible’) perpetual motion machines – it is actually the notion of “conservation law” – ie, that there is a conservation of matter and energy, and that the sum total of both together, in any closed/contained system, can neither be created nor destroyed. This is actually a much stronger basis on which to argue, insofar as it is directly an instance of an even more general class of concept, ie, one of symmetry.
All of physics – even the notion of lawfulness itself – is described in terms of symmetry concepts. This is not news, it is already known to most of the most advanced theoretical working physicists.
Basically, what [Paul] suggests is that anything that asserts or accepts the law of the conservation of matter and energy, and/or makes any assertion based strictly on only and exactly such conservation law, would be a categorical example of “an exaggerated claim”, and that therefore he is suggesting that we, following his advice, should regard conservation law – and thus actually the notion of symmetry, and therefore also the notion of ‘consistent truth’ (ie, logic, etc) as an “insufficient basis” of proof and/or knowing.
This is, of course, too high a standard, insofar as, once one is rejecting of symmetry, there is no actual basis of knowing at all, of any kind at all, beyond such a rejection – there is simply no deeper basis for the concept of truth that is not actually about truth. That leaves everyone reading his post implicitly with him being the ‘arbiter’ of what counts as “proof”. Ie, he has explicitly declared that he rejects the truth of the statement that it is “100% possible to know...”, (via the laws of conservation of matter and energy, as itself based on only the logic of symmetry, which is also the basis of any notion of ‘knowing’), ”...that real perpetual motion machines are 100% impossible” to build, via any engineering technique at all, in the actual physical universe.
The reason that this is important is that the same notion – symmetry – is also the very most essential essence of what it means to have any consistent idea of logical truth. Ie, every transition in every math proof is a statement in the form “if X is true, then by known method Y, we can also know that Z is true”. Ie, every allowed derivation method (ie, the entire class (set ‘S’) of accepted/agreed Y methods allowable for proof) is effectively a kind of symmetry – it is a ‘truth preserving transformation’, just like a mirror or reflection is a ‘shape preserving transformation’. Ie, for every allowable transformation, there is also an allowed inverse transformation, so that “If Z is true, then via method inverse Y, we can also know that X is true”. This sort of symmetry is the essence of what is meant by ‘consistent’ mathematical system.
It is largely because of this common concept – symmetry – that math and physics work so well together.
Yet we can easily notice that anything that is a potential outcome of “perpetual general benefit machines” (ie. AGI) results in all manner of exaggerated claims.
Turning to my response:
Perhaps, by your way of defining it, the statement “100% possible to know” means more than that a boolean truth is consistently knowable within a model premised on 100% repeatedly empirically verified (ie. never once known to be falsified by observation) physical or computational theory?
Rather, perhaps the claim “100% possible to know” would in your view require additionally the unattainable completeness of past and future observation-based falsification of hypotheses (Solomonoff induction in a time machine)? Of course, we can theorise about how you model this.
I would ask: how then given that we do not and cannot have “Solomonoff induction in a time machine” can we soundly establish any degree of probability of knowing? To me, this seems like theorising about the extent to which idealised Bayesian updating would change our minds without our minds having access to the idealised Bayesian updating mechanism.
So to go back to your analogy, how would we soundly prove, by contradiction, that a perpetual motion machine is impossible?
My understanding is that you need more than consistent logic to model that. The formal model needs to be grounded in empirically sound premises about how the physical world works – in this case, the second law of thermodynamics based on the even more fundamental law of conservation of matter and energy.
You can question the axioms of the model – maybe if we collected more observations, the second law of thermodynamics turns out not to be true in some cases? Practically, that’s not a relevant question, because all we’ve got to go on is the observations we’ve got until now. In theory, this question of receiving more observations is not relevant to whether you can prove (100% soundly know) within the model that the machine cannot work into perpetuity (that this is 100% impossible) – yes, you can.
Similarly, take the proposition of an artificial generally-capable machine (“AGI”) working in “alignment with” continued human existence into perpetuity. How would you prove that proposition to be impossible, by contradiction?
To prove based on sound axioms that the probability of AGI causing outcomes out of line with a/any condition needed for the continued existence of organic life converges on 100% (in theory over infinite time; in practice actually over decades or centuries), you would need to ground the theorem in how the physical world works.
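To make the phrase “converges on 100%” concrete, here is a minimal illustrative sketch. It is not Forrest’s argument. It assumes, hypothetically, that in each time period the probability of some needed condition being violated is at least a fixed ε > 0, and that periods are independent; both ε and the independence assumption are mine, introduced only for illustration.

```latex
% Hypothetical assumptions (not from the source): per-period violation
% probability is at least \varepsilon > 0, independently across periods.
% Requires amsmath for \xrightarrow and \text.
\[
  P(\text{no violation in } n \text{ periods}) \;\le\; (1-\varepsilon)^{n}
  \;\xrightarrow[n \to \infty]{}\; 0,
  \qquad\text{so}\qquad
  P(\text{some violation within } n \text{ periods}) \;\ge\; 1-(1-\varepsilon)^{n} \;\to\; 1 .
\]
```

For instance, with ε = 0.01 per period, the bound after 500 periods is 1 − 0.99^500, roughly 0.993. The hard part, which this sketch does not address, is grounding any such lower bound in how the physical world works.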
I imagine you reacting skeptically here, perhaps writing back that there might be future observations that contradict the conclusions (like everyone not dying) or updates to model premises (like falsification of information signalling underlying physics theory) with which we would end up falsifying the axioms of this model.
By this use of the term “100% possible to know” though, I guess it is also not 100% possible to know that 2 + 2 = 5 is 100% impossible as a result?
Maybe we’re wrong about axioms of mathematics? Maybe at some point mathematicians falsify one of the axioms as not soundly describing how truth content is preserved through transformations? Maybe you actually have not seen anyone yet write out the formal reasoning steps (ie. you cannot tell yet if the reasoning is consistent) for deriving 2 + 2 = 4 ? Maybe you misremember the precise computational operations you or other mathematicians performed before and/or the result derived, leading you to incorrectly conclude that 2 + 2 = 4?
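For contrast, within a fixed formal system these statements can be checked mechanically. As a minimal sketch, assuming Lean 4 and its built-in natural numbers (my choice of proof assistant, not something referenced in this exchange):

```lean
-- Lean 4, core library only. Numerals default to Nat.
-- Both sides reduce to the same value, so reflexivity suffices.
example : 2 + 2 = 4 := rfl

-- Equality on Nat is decidable, so the kernel can evaluate the claim.
example : 2 + 2 ≠ 5 := by decide
```

This only establishes the results relative to Lean’s axioms and kernel. It does not answer the doubts above about whether the axioms themselves are sound or whether one correctly remembers a derivation.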
I’m okay with this interpretation or defined use of the statement “100% possible to know”. But I don’t think we can do much regarding knowing the logic truth values of hypothetical outside-of-any-consistent-model possibilities, except discuss them philosophically.
That interpretation cuts both ways btw. Clearly then, it is by far not 100% possible to know whether any specific method(s) would maintain the alignment of generally-capable self-learning/modifying machinery existing and operating over the long term (millennia+) such as not to cause the total extinction of humans.
To be willing to build that machinery, or in any way lend public credibility or resources to research groups building that machinery, you’d have to be pretty close to validly and soundly knowing that it is 100% possible that the machinery will stay existentially safe to humans.
Basically, for all causal interactions the changing machinery has with the changing world over time, you would need to prove (or guarantee above some statistical threshold) that the consequent (final states of the world) “humans continue to exist” can be derived as a near-certain possibility from the antecedent (initial states of the world).
Or inversely, you can do the information-theoretically actually much easier thing of proving that while many different possible final states of the world could result from the initial state of the world, the one state of the world excluded from all possible states as a possibility is “humans continue to exist.”
Morally, we need to apply the principle of precaution here – it is much easier for new large-scale technology to destroy the needed physical complexity for humans to live purposeful and valued lives than to support a meaningful increase in that complexity.
By that principle, the burden of proof – that the methods you publicly communicate could or would actually maintain alignment of the generally-capable machinery – is on you.
You wrote the following before in explaining your research methodology:
“But it feels to me like it should be possible to avoid egregious misalignment regardless of how the empirical facts shake out — it should be possible to get a model we build to do at least roughly what we want.”
To put it frankly: does the fact that you write “it feels like” let you off the hook here?
Ie. since you were epistemically humble enough to not write that you had any basis to make that claim (you just expressed that it felt like this strong claim was true), you have a social license to keep developing AGI safety methods in line with that claim?
Does the fact that Forrest does write that he has a basis for making the claim – after 15 years of research and hundreds of dense explanatory pages (to try bridge the inferential gap to people like you) – that long-term safe AGI is 100% impossible, mean he is not epistemically humble enough to be taken seriously?
Perhaps Forrest could instead write “it feels like we cannot build an AGI model to do and keep doing roughly what we want over the long term”. Perhaps then AI Safety researchers would have resonated with his claim and taken it as true at face value? Perhaps they’d be motivated to read his other writings?
No, the social reality is that you can claim “it feels that making the model/AGI work roughly like we want is possible” in this community, and readers will take it at face value as prima facie true.
Forrest and I have claimed – trying out various pedagogical angles and ways of wording – that “it is impossible to have AGI work roughly as we want over the long term” (not causing the death of all humans for starters). So far, of the dozens of AI safety people who had one-on-one exchanges with us, most reacted skeptically immediately and then came up with all sorts of reasons not to continue reading/considering Forrest’s arguments. Which is exactly why I put up this post about “presumptive listening” to begin with.
You have all of the community’s motivated reasoning behind you, which puts you in the socially safe position of not being pressed any time soon by more than a few others in the community to provide a rigorous basis for your “possibility” claim.
Slider’s remark that your commentary seems to involve an isolated demand for rigour resonated with me. The phrase in my mind was “double standards”. I’m glad someone else was willing to bring this point up to a well-regarded researcher, before I had to.
It’s clear that AI systems can change their environment in complicated ways and so analyzing the long-term outcome of any real-world decision is hard. But that applies just as well to having a kid as to building an AI, and yet I think there are ways to have a kid that are socially acceptable. I don’t think this article is laying out the kind of steps that would distinguish building an AI from having a kid.
I will clarify a key distinction between building an AGI (ie. not just any AI) and having a kid:
One thing about how the physical world works, is that in order for code to be computed, this needs to take place through a physical substrate. This is a necessary condition – inputs do not get processed into outputs through a platonic realm.
Substrate configurations in this case are, by definition, artificial – as in artificial general intelligence. This as distinct from the organic substrate configurations of humans (including human kids).
Further, the ranges of conditions needed for the artificial substrate configurations to continue to exist, function and scale up over time – such as extreme temperatures, low oxygen and water, and toxic chemicals – fall outside the ranges of conditions that humans and other current organic lifeforms need to survive.
Hope that clarifies a long-term-human-safety-relevant distinction between building AGI (that continues to scale) and having a kid (who grows up to adult size).
I ended up convinced that this isn’t about EA community blindspots, the entire scientific community would probably consider this writing to be crankery.
Paul, you read one overview essay where Forrest briefly outlined how his proof method works in an analogy to theory that a mathematician like you already knows about and understands the machinery of (Galois’ theory). Then, as far as I can tell, you concluded that since Forrest did not provide the explicit proof (that you expected to find in that essay) and since the conclusion (as you interpret it) seemed unbelievable, that the “entire” scientific community would (according to you) probably consider his writing crankery.
By that way of “discerning” new work, if Kurt Gödel had written an outline for researchers in the field to understand his unusual methodology, with the concise conclusion “it is 100% knowable that it is 100% impossible for a formal axiomatic system to be both consistent and complete”, a well-known researcher in (Hilbert’s) field would have read it and concluded that Gödel had not immediately given them a proof and that the conclusion was unbelievable (such a strong statement!), and therefore that Gödel was probably a crank who should be denounced publicly in the forum as such.
Your judgement seems based on first impressions and social heuristics. On one hand you admit this, and on the other hand you seem to have no qualms with dismissing Forrest’s reasoning a priori.
In effect, you are acting as a gatekeeper – “protecting” others in the community from being exposed to and meaningfully engaging with new ideas. This is detrimental to research on the frontiers that falls outside of already commonly-accepted paradigms (particularly paradigms of this community).
The red flag for us was when you treated ‘proof’ as probable opinion based on your personal speculative observation, as proxy, rather than as a finite boolean notion of truth based on valid and sound modeling of verified known world states.
Note by Forrest on this:
I notice also that Gödel’s work, if presented for the first time today, would not be counted by him as “a very clear argument”. The Gödel proof, as given then, was actually rather difficult and not at all obvious. Gödel had to construct an entire new language and self reference methodology for the proof to even work. The inferential distance for Gödel was actually rather large, and the patience needed to understand his methods, which were not at all common at the time, would not have passed the “sniff test” being applied by this person here, in the modern era, where the expectation is that everything can be understood on a single pass reading one post on some forum somewhere while on the way to some other meeting. Modern social media simply does not work well for works of these types. So the Gödel work, and the Bell Theorem, and/or anything else similarly both difficult and important, simply would not get reviewed by most people in today’s world.
Noting that your writing in response also acts as a usual filter to us. It does not show willingness yet to check the actual form or substance of Forrest’s arguments. This distinguishes someone who is not available to reason with (they probably have no time, patience, or maybe no actual interest) but who nonetheless seems motivated to signal they have an opinion to their ingroup as pertaining to the outgroup.
The claim that ‘nothing is knowable for sure’ and ‘believe in the possibilities’ (all of that maybe good for humanity) is part of the hype cycle. It ends up being marketing. So the crank accusation ends up being the filter of who believes the marketing, and who does not – who is in the in-crowd and who are ‘the outsiders’.
Basically, he accepts that a violation of symmetry would be (should be) permissible—hence allowing maybe at least some slight possibility that some especially creative genius type engineering type person might someday eventually actually make a working perpetual motion machine, in the real universe. Of course, every crank wants to have such a hope and a dream—the hype factor is enormous – “free energy!” and “unlimited power!!” and “no environmental repercussions” – utopia can be ours!!!.
You only need to believe in the possibility, and reject the notion of 100% certainty. Such a small cost to pay. Surely we can all admit that sometimes logic people are occasionally wrong?
The irony of all of this is that the very notion of “crank” is someone who wants dignity and belonging so badly that they will easily and obviously reject logic (ie, symmetry), such that their ‘topic arguments’ have no actual merit. Moreover, given that a ‘proof’ is something that depends on every single transformation statement actually being correct, even a single clear rejection of their willingness to adhere to sensible logic is effectively a clear signal that all other arguments (and communications) by that person – now correctly identified as the crank – are to be rejected, as their communications is/are actually about social signaling (a kind of narcissism or feeling of rejection – the very essence of being a crank) rather than about truth. Hence, once someone has made even one single statement which is obviously a rejection of a known truth, ie, that they do not actually care about the truth of their arguments, then everything they say is/are to be ignored by everyone else thereafter.
And yet the person making the claim that my work is (probably) crankery has actually done exactly that, engaged in crankery, by their own process. He has declared that he rejects the truth of the statement (and moreover has very strongly suggested that everyone else should also reject the idea) that it is 100% possible to know, via the laws of conservation of matter and energy, (as itself based on only the logic of symmetry, which is also the basis of the notion of ‘knowing’), that real perpetual motion machines are 100% impossible to build, via any engineering technique at all, in the actual physical universe.
In ancient times, a big part of the reason for spicy foods was to reject parasites in the digestive system. In places where sanitary conditions are difficult (warmer climates encourage food spoilage), spicy foods tend to be more culturally common. Similar phenomena can occur in communication – ‘reading’ and ‘understanding’ as a kind of mental digestive process – via the use of ‘spicy language’. The spice I used was the phrase “It is 100% possible to know that X is 100% impossible”. It was put there by design – I knew and expected it would very likely trigger some types of people, and thus help me to identify at least a few of the people who engage in social signaling over rigorous reasoning – even if they are also the ones making the same accusation of others. The filter goes both ways.
So that leaves your last point, about self-awareness:
I think this shows a lack of self awareness. Right now the state of play is more like the author of this document arguing that “everyone else is wrong,” not someone who is working on AI safety.
Forrest is not an identifiable actively contributing member to “AI safety” (and also therefore not part of our ingroup).
Thus, Forrest pointing out that there are various historical cases where young men kept trying to solve impossible problems — for decades, if not millennia — all the while claiming those problems must be possible to solve after all through some method, apparently says something about Forrest and nothing at all about there being a plausible analogy with AGI Safety research…?
Good to know, thank you. I think I’ll just ditch the “separate claims/arguments into lines” effort.
Forrest also just wrote me: “In regards to the line formatting, I am thinking we can, and maybe should (?) convert to simple conventional wrapping mode? I am wondering if the phrase breaks are more trouble than they are worth, when presenting in more conventional contexts like LW, AF, etc. It feels too weird to me, given the already high weirdness level I cannot help but carry.”
The premise that “infinite value” is possible, is an assumption.
This seems a bit like the presumption that “divide by zero” is possible. Assigning a probability to the possibility that divide by zero results in a value doesn’t make sense, I think, because the logical rules themselves rule this out.
However, if I look at this together with your earlier post (http://web.archive.org/web/20230317162246/https://www.lesswrong.com/posts/dPCpHZmGzc9abvAdi/orthogonality-thesis-is-wrong): I think I get where you’re coming from in that if the agent can conceptualise that (many) (extreme) high-value states are possible where those values are not yet known to it, yet still plans for those value possibilities in some kind of “RL discovery process”, then internal state-value optimisation converges on power-seeking behaviour — as optimal for reaching the expected value of such states in the future (this further assumes that the agent’s prior distribution lines up – eg. assumes unknown positive values are possible, does not have a prior distribution that is hugely negatively skewed over negative rewards).
I think specifying premises such as these more precisely at the outset ensures the reasoning from there is consistent/valid. The above would not apply to any agent, nor even to any “AGI” (a fuzzy term; I would define it more specifically as “fully-autonomous, cross-domain-optimising, artificial machinery”