Programme Director at the UK's Advanced Research + Invention Agency (ARIA), focusing on safe transformative AI; formerly Protocol Labs, FHI/Oxford, Harvard Biophysics, MIT Mathematics and Computation.
davidad
I want to go a bit deep here on “maximum entropy” and misunderstandings thereof by the straw-man Humbali character, mostly to clarify things for myself, but also in the hopes that others might find it useful. I make no claim to novelty here—I think all this ground was covered by Jaynes (1968)—but I do have a sense that this perspective (and the measure-theoretic intuition behind it) is not pervasive around here, the way Bayesian updating is.
First, I want to point out that entropy of a probability measure $p$ is only definable relative to a base measure $\mu$, as follows:

$$H_\mu(p) \;:=\; -\int \frac{dp}{d\mu}\,\log\frac{dp}{d\mu}\;d\mu$$
(The derivatives notated here denote Radon-Nikodym derivatives; the integral is Lebesgue.) Shannon’s formulae, the discrete $H(p) = -\sum_i p_i \log p_i$ and the continuous $h(p) = -\int p(x) \log p(x)\,dx$, are the special cases of this where $\mu$ is assumed to be counting measure or Lebesgue measure, respectively. These formulae actually treat $p$ as having a subtly different type than “probability measure”: namely, they treat it as a density with respect to counting measure (a “probability mass function”) or a density with respect to Lebesgue measure (a “probability density function”), and implicitly supply the corresponding $\mu$.
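A minimal numerical illustration of this relativity (the three-point space and the particular measures here are arbitrary choices of mine, just to show the formula in action):

```python
import numpy as np

# A probability measure p on a three-point space, and two candidate base measures.
p = np.array([0.7, 0.2, 0.1])
counting = np.array([1.0, 1.0, 1.0])     # counting measure
other_mu = np.array([10.0, 1.0, 0.1])    # some other base measure

def H(p, mu):
    """Entropy of p relative to mu:  -sum (dp/dmu) log(dp/dmu) dmu."""
    dpdmu = p / mu
    return -np.sum(dpdmu * np.log(dpdmu) * mu)

# With counting measure as the base, this is exactly Shannon's discrete formula.
assert np.isclose(H(p, counting), -np.sum(p * np.log(p)))

# With a different base measure, the "entropy" of the very same p comes out different.
print(H(p, counting), H(p, other_mu))   # ~0.80 vs ~2.18
```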
If you’re familiar with Kullback–Leibler divergence ($D_{\mathrm{KL}}(p \,\|\, \mu)$), and especially if you’ve heard $D_{\mathrm{KL}}$ called “relative entropy,” you may have already surmised that $H_\mu(p) = -D_{\mathrm{KL}}(p \,\|\, \mu)$. Usually, KL divergence is defined with both arguments being probability measures (measures that add up to 1), but that’s not required for it to be well-defined (what is required is absolute continuity, which is sort of orthogonal). The principle of “maximum entropy,” or $\arg\max_p H_\mu(p)$, is equivalent to $\arg\min_p D_{\mathrm{KL}}(p \,\|\, \mu)$. In the absence of additional constraints on $p$, the solution of this is $p = \mu$, so maximum entropy makes sense as a rule for minimum confidence to exactly the same extent that the implicit base measure $\mu$ makes sense as a prior. The principle of maximum entropy should really be called “the principle of minimum updating”, i.e., making a minimum-KL-divergence move from your prior to your posterior when the posterior is constrained to exactly agree with observed facts. (Standard Bayesian updating can be derived as a special case of this.)
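To spell out that parenthetical: suppose the prior is $q$, and the only constraint is that the posterior $p$ must assign probability 1 to an observed event $E$ (with $q(E) > 0$). For any such $p$,

$$D_{\mathrm{KL}}(p \,\|\, q) \;=\; \int \log\frac{dp}{dq}\,dp \;=\; \int \log\frac{dp}{dq(\cdot\mid E)}\,dp \;+\; \int \log\frac{dq(\cdot\mid E)}{dq}\,dp \;=\; D_{\mathrm{KL}}\big(p \,\|\, q(\cdot\mid E)\big) \;-\; \log q(E),$$

since $p$ is supported on $E$, where $dq(\cdot\mid E)/dq = 1/q(E)$. The last term doesn’t depend on $p$, so the constrained minimum-KL move lands exactly on the conditional measure $q(\cdot\mid E)$, i.e., the Bayesian posterior.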
Sometimes, the structure of a situation has some symmetry group with respect to which the situation of uncertainty seems to be invariant, with classic examples being relabeling heads/tails on a coin, or arbitrarily permuting a shuffled deck of cards. In these examples, the requirement that a prior be invariant with respect to those symmetries (in Jaynes’ terms, the principle of transformation groups) uniquely characterizes counting measure as the only consistent prior (the classical principle of indifference, which still lies at the core of grade-school probability theory). In other cases, like a continuous roulette wheel, other Haar measures (which generalize both counting and Lebesgue measure) are justified. But taking “indifference” or “insufficient reason” to justify using an invariant measure as a prior in an arbitrary situation (as Laplace apparently did) is fraught with difficulties:
Most obviously, the invariant measure on $\mathbb{R}$ with respect to translations, namely Lebesgue measure $\lambda$, is an improper prior: it is a non-probability measure because its integral (formally, $\lambda(\mathbb{R})$) is infinite. If we’re talking about forecasting the timing of a future event, $[0, \infty)$ is a very natural space, but $\lambda([0, \infty))$ is no less infinite. Discretizing into year-sized buckets doesn’t help, since counting measure on $\mathbb{N}$ is also infinite (formally, $\#(\mathbb{N}) = \infty$). In the context of maximum entropy, using an infinite measure for $\mu$ means that there is no maximum-entropy $p$—you can always get more entropy by spreading the probability mass even thinner.
But what if we discretize and also impose a nothing-up-my-sleeve outrageous-but-finite upper bound, like the maximum binary64 number at around $1.8 \times 10^{308}$? Counting measure on a finite set like $\{1, \dots, N\}$ can be normalized into a probability measure, so what stops that from being a reasonable “unconfident” prior? Sometimes this trick can work, but the deeper issue is that the original symmetry-invariance argument that successfully justifies counting measure for shuffled cards just makes no sense here. If one relabels all the years, say reversing their order, the situation of uncertainty is decidedly not equivalent.

Another difficulty with using invariant measures as priors as a general rule is that they are not always uniquely characterized, as in the Bertrand paradox, or the obvious incompatibility between uniform priors (invariant to addition) and log-uniform priors (invariant to multiplication).
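To make that last incompatibility concrete (my numbers): over the interval $[1, 100]$, a prior uniform in $x$ gives the sub-interval $[1, 10]$ probability $\frac{10 - 1}{100 - 1} \approx 0.09$, while a prior uniform in $\log x$ gives it $\frac{\ln 10}{\ln 100} = \frac{1}{2}$. Each is the unique “indifferent” choice with respect to a different transformation group, and “insufficient reason” alone cannot adjudicate between them.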
I think Humbali’s confusion can be partially explained as conflating an invariant measure and a prior—in both directions:
First, Humbali implicitly uses a translation-invariant base measure as a prior when he claims as absolute a notion of “entropy” which is actually relative to that particular base measure. Something like this mistake was made by both Laplace and Shannon, so Humbali is in good company here—but already confused, because translation on the time axis is not a symmetry with respect to which forecasts ought to be invariant.
Then, when cornered about a particular absurd prediction that inevitably arises from the first mistake, Humbali implicitly uses his (socially-driven) prior as a base measure, when he says “somebody with a wide probability distribution over AGI arrival spread over the next century, with a median in 30 years, is in realistic terms about as uncertain as anybody could possibly be.” Assuming he’s still using “entropy” at all as the barometer of virtuous unconfidence, he’s now saying that the way to fix the absurd conclusions of maximum-entropy relative to Lebesgue measure is that one really ought to measure unconfidence with respect to a socially-adjusted “base rate” measure, which just happens to be his own prior. (I think the lexical overlap between “base rate” and “base measure” is not a coincidence.) This second position is more in bad-faith than the first because it still has the bluster of objectivity without any grounding at all, but it has more hope of formal coherence: one can imagine a system of collectively navigating uncertainty where publicly maintaining one’s own epistemic negentropy, explicitly relative to some kind of social median, comes at a cost (e.g. hypothetically or literally wagering with others).
There is a bit of motte-and-bailey uncovered by the bad-faith in position 2. Humbali all along primarily wants to defend his prior as unquestionably reasonable (the bailey), and when he brings up “maximum entropy” in the first place, he’s retreating to the motte of Lebesgue measure, which seems to have a formidable air of mathematical objectivity about it. Indeed, by its lights, Humbali’s own prior does happen to have more entropy than Eliezer’s, though Lebesgue measure fails to support the full bailey of Humbali’s actual prior. However, in this case even the motte is not defensible, since Lebesgue measure is an improper prior and the translation-invariance that might justify it simply has no relevance in this context.
Meta: any feedback about how best to make use of the channels here (commenting, shortform, posting, perhaps others I’m not aware of) is very welcome; I’m new to actually contributing content on AF.
That’s me. In short form, my justification for working on such a project where many have failed before me is:
The “connectome” of C. elegans is not actually very helpful information for emulating it. Contrary to popular belief, connectomes are not the biological equivalent of circuit schematics. Connectomes are the biological equivalent of what you’d get if you removed all the component symbols from a circuit schematic and left only the wires. Good luck trying to reproduce the original functionality from that data.
What you actually need is to functionally characterize the system’s dynamics by performing thousands of perturbations to individual neurons and recording the results on the network, in a fast feedback loop with a very very good statistical modeling framework which decides what perturbation to try next.
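To gesture at what I mean, here’s a toy sketch of that closed loop (entirely my own simplification: a 20-“neuron” network with linear one-step dynamics, and a crude probe-the-least-characterized-neuron rule standing in for real statistical experimental design):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                          # toy network size, not the real 302 neurons
W_true = rng.normal(scale=0.3, size=(N, N))     # unknown ground-truth coupling matrix

def stimulate_and_record(neuron):
    """Toy stand-in for an optogenetic perturbation: drive one neuron,
    record the noisy network-wide response one step later."""
    x = np.zeros(N)
    x[neuron] = 1.0
    return W_true @ x + rng.normal(scale=0.05, size=N)

W_est = np.zeros((N, N))    # model of the dynamics, learned column by column
counts = np.zeros(N)

for trial in range(500):
    # Active choice: perturb the neuron whose downstream couplings we know least about.
    target = int(np.argmin(counts))
    response = stimulate_and_record(target)
    counts[target] += 1
    # Running average of responses estimates column `target` of the coupling matrix.
    W_est[:, target] += (response - W_est[:, target]) / counts[target]

print("relative reconstruction error:",
      np.linalg.norm(W_est - W_true) / np.linalg.norm(W_true))
```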
With optogenetic techniques, we are just at the point where it’s not an outrageous proposal to reach for the capability to read and write to anywhere in a living C. elegans nervous system, using a high-throughput automated system. It has some pretty handy properties, like being transparent, essentially clonal, and easily transformed. It also has less handy properties, like being a cylindrical lens, being three-dimensional at all, and having minimal symmetry in its nervous system. However, I am optimistic that all these problems can be overcome by suitably clever optical and computational tricks.
I’m a disciple of Kurzweil, and as such I’m prone to putting ridiculously near-future dates on major breakthroughs. In particular, I expect to be finished with C. elegans in 2-3 years. I would be Extremely Surprised, for whatever that’s worth, if this is still an open problem in 2020.
One of the members of the committee that authored this (if not the chairperson) is Yi Zeng. He’s persistently engaged in conversations with CSER and other AI ethics groups in the UK, Australia, at the UN, etc.; I’ve met him at a few events and I believe that most of the values quoted above are really sincerely held. My main concern here is rather that these values are still stated in terms that may be too vague to interpret and enforce uniformly as a practical regulation throughout the large Chinese AI industry. But it’s no doubt a step in the right direction toward being actually binding on relevant actions.
I can’t say for sure why Boyden or others didn’t assign grad students or postdocs to a Nemaload-like direction; I wasn’t involved at that time, there are many potential explanations, and it’s hard to distinguish limiting/bottleneck or causal factors from ancillary or dependent factors.
That said, here’s my best explanation. There are a few factors for a life-science project that make it a good candidate for a career academic to invest full-time effort in:
The project only requires advancing the state of the art in one sub-sub-field (specifically the one in which the academic specializes).
If the state of the art is advanced in this one particular way, the chances are very high of a “scientifically meaningful” result, i.e. it would immediately yield a new interesting explanation (or strong evidence for an existing controversial explanation) about some particular natural phenomenon, rather than just technological capabilities. Or, failing that, at least it would make a good “methods paper”, i.e. establishing a new, well-characterized, reproducible tool which many other scientists can immediately see is directly helpful for the kind of “scientifically meaningful” experiments they already do or know they want to do.
It is easy to convince people that your project is plausibly on a critical path in the roadmap towards one of the massive medical challenges that ultimately motivate most life-science funding, such as finding more effective treatments for Alzheimer’s, accelerating the vaccine pipeline, preventing heart disease, etc.
The more of these factors are present, the more likely your effort as an academic will lead to career advancement and recognition. Nemaload unfortunately scored quite poorly on all three counts, at least until recently:
(1) It required advancing the state-of-the-art in, at least: C. elegans genetic engineering, electro-optical system integration, computer vision, quantitative structural neuroanatomy of C. elegans, mathematical modeling, and automated experimental design.
(2) Even the final goal of Nemaload (uploading worms who’ve learned different behaviors and showing that the behaviors are reproduced in simulations) is barely “scientifically meaningful”. All it would demonstrate scientifically (as opposed to technically) is that learned behaviors are encoded in some way in neural dynamics. This hypothesis is at the same time widely accepted and extremely difficult to convince skeptics of. Of course, studying the uploaded dynamics might yield fascinating insights into how nature designs minds, but it also might be pretty black-boxy and inexplicable without advancing the state of the art in yet further ways.
(2b) Worse, partial progress is even less scientifically meaningful, e.g. “here’s a time-series of half the neurons, I guess we can do unsupervised clustering on it, oh look at that, the neural activity pattern can predict whether the worm is searching for food or not, as can, you know, looking at it.” To get an upload, you need all the components of the uploading machine, and you need them all to work at full spec. And partial progress doesn’t make a great methods paper either, for the following reason. Any particular experiment that worm neuroscientists want to do, they can do more cheaply and effectively in other ways, like genetically engineering only the specific neurons they care about for that experiment to fluoresce when they’re active. Even if they’re interested in a lot of neurons, they’re going to average over a population anyway, so they can just look at a handful of neurons at a time. And they also don’t mind doing all kinds of unnatural things to the worms like microfluidic immobilization to make the experiment easier, even though that makes the worms’ overall mental-state very, shall we say, perturbed, because they’re just trying to probe one neural circuit at a time, not to get a holistic view of all behaviors across the whole mental-state-space.
(3) The worm nervous system is in most ways about as far as you can get from a human nervous system while still being made of neural cells. C. elegans is not the model organism of choice for any human neurological disorder. Further, the specific technical problems and solutions are obviously not going to generalize to any creature with a bony skull, or with billions of neurons. So what’s the point? It’s a bit like sending a lander to the Moon when you’re trying to get to Alpha Centauri. There are some basic principles of celestial mechanics and competencies of system design and integration that will probably mostly generalize, and you have to start acquiring those with feedback from attempting easier missions. Others may argue that Nemaload on a roadmap to any science on mammals (let alone interventions on humans) is more like climbing a tree when you’re trying to get to the Moon. It’s hard to defend against this line of attack.
If a project has one or two of these factors but not all three, then if you’re an ambitious postdoc with a good CV already in a famous lab, you might go for it. But if it has none, it’s not good for your academic career, and if you don’t realize that, your advisor has a duty of care to guide you towards something more likely to keep your trajectory on track. Advisors don’t owe the same duty of care to summer undergrads.
Adam Marblestone might have more insight on this question; he was at the Boyden lab in that time. It also seems like the kind of phenomenon that Alexey Guzey likes to try to explain.
I might have time for some more comments later, but here’s a few quick points (as replies):
I think AI Safety Levels are a good idea, but evals-based classification needs to be complemented by compute thresholds to mitigate the risks of loss of control via deceptive alignment. Here is a non-nebulous proposal.
Should we do research on alignment schemes which use RLHF as a building block? E.g. work on recursive oversight schemes or RLHF with adversarial training?
IMO, this kind of research is promising and I expect a large fraction of the best alignment research to look like this.
This seems like the key! It’s probably what people actually mean by the question “is RLHF a promising alignment strategy?”
Most of this post is laying out thoughtful reasoning about related but relatively uncontroversial questions like “is RLHF, narrowly construed, plausibly sufficient for alignment” (of course not) and “is RLHF, very broadly construed, plausibly useful for alignment” (of course yes). I don’t want to diminish the value of having those answers be more common-knowledge. But I do want to call attention to how little of the reasoning elsewhere in the post seems to me to support the plausibility of this opinion here, which is the most controversial and decision-relevant one, and which is stated without any direct justification. (There’s a little bit of justification for it elsewhere, which I’ve argued against in separate comments.) I’m afraid that one post which states a bunch of opinions about related questions, while including detailed reasoning but only for the less controversial ones, might be more persuasive than it ought to be about the juicier questions.
I wrote a lengthy exegesis of Humbali’s confusion around “maximum entropy”, which I decided ended up somewhere between a comment and a post in terms of quality, so I put it here in “Shortform”. I’m new to contributing content on AF, so meta-level feedback about how best to use the different channels (commenting, shortform, posting) is welcome.
There has been some serious progress in the last few years on full functional imaging of the C. elegans nervous system (at the necessary spatial and temporal resolutions and ranges).
A good summary of the state of play as of late 2020 can be found in this opinion article: https://www.sciencedirect.com/science/article/pii/S0959438820301689
State-of-the-art work is currently happening in Shanghai, Hefei, Wuhan, and Beijing. See https://doi.org/10.1002/cyto.a.24483 and https://arxiv.org/pdf/2109.10474.pdf
However, despite this I haven’t been able to find any publications yet where full functional imaging is combined with controlled cellular-scale stimulation (e.g. as I proposed via two-photon excitation of optogenetic channels), which I believe is necessary for inference of a functional model.
I was certainly overconfident about how easy Nemaload would be, especially given the microscopy and ML capabilities of 2012, but more so I was overconfident that people would continue to work on it. I think there was very little work toward the goal of a nematode-upload machine for the four years from 2014 through 2017. Once or twice an undergrad doing a summer internship at the Boyden lab would look into it for a month, and my sense is that accounted for something like 3-8% of the total global effort.
Congratulations to all the new winners!
This seems like a good time to mention that, after I was awarded a retroactive prize alongside the announcement of this contest, I estimated I’d have been about half as likely to have generated my idea without having read an earlier comment by Scott Viteri, so based on Shapley value I suggested reallocating 25% of my prize to Scott, which I believe ARC did. I’m delighted to see Scott also getting a prize for a full proposal.
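One way to formalize that arithmetic (my own rough modeling of idea generation as a two-player cooperative game): $v(\varnothing) = 0$, $v(\{\text{me}\}) = \tfrac{1}{2}$, $v(\{\text{Scott}\}) = 0$, $v(\{\text{me}, \text{Scott}\}) = 1$, which gives Scott a Shapley value of

$$\phi_{\text{Scott}} \;=\; \tfrac{1}{2}\Big[\big(v(\{\text{me}, \text{Scott}\}) - v(\{\text{me}\})\big) + \big(v(\{\text{Scott}\}) - v(\varnothing)\big)\Big] \;=\; \tfrac{1}{2}\Big[\tfrac{1}{2} + 0\Big] \;=\; \tfrac{1}{4}.$$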
I’m excited for the new proposals and counterexamples to be published so that we can all continue to build on each others’ ideas in the open.
Eliezer’s main point in his ~20k words isn’t really what I want to defend, but I will say a few words about how I would constructively interpret it. I think Eliezer’s main claim is that he has an intuitive system-1 model of the inhomogeneous Poisson process that emits working AGI systems, and that this model isn’t informed by the compute equivalents of biological anchors, and that his lack of being informed by that isn’t a mistake. I’m not sure if he’s actually making the stronger claim that anyone whose model is informed by the biological anchors is making a mistake, but if so, I don’t agree. My own model is somewhat informed by biological anchors; it’s more informed by what the TAI report calls “subjective impressiveness extrapolation”, extrapolations on benchmark performance, and some vague sense of other point processes that emit AI winters and major breakthroughs. Someone who has total Knightian uncertainty and no intuitive models of how AGI comes about would surely do well to adopt OpenPhil’s distribution as a prior.
I got initial funding from Larry Page on the order of 10^4 USD and then funding from Peter Thiel on the order of 10^5 USD. The full budget for completing the Nemaload project was 10^6 USD, and Thiel lost interest in seeing it through.
I think it’s too easy for someone to skim this entire post and still completely miss the headline “this is strong empirical evidence that mesa-optimizers are real in practice”.
I just happened to see this whilst it happens to be 12 years later. I wonder what your sense of this puzzle is now (object-level as well as meta-level).
This is basically off-topic, but just for the record, regarding...
someone presented a talk where they explained how they tried and failed to model and simulate a brain of C. Elegans.… Furthermore, all of their research was done prior to them discovering AI safety stuff so it’s good that no one created such a precise model of a—even if just a worm—brain.
That was me; I have never believed (at least not yet) that it’s good that the C. elegans nervous system is still not understood; to the contrary, I wish more neuroscientists were working on such a “full-stack” understanding (whole nervous system down to individual cells). What I meant to say is that I am personally no longer compelled to put my attention toward C. elegans, compared to work that seems more directly AI-safety-adjacent.
I could imagine someone making a case that understanding low-end biological nervous systems would bring us closer to unfriendly AI than to friendly AI, and perhaps someone did say such a thing at AIRCS, but I don’t recall it and I doubt I would agree. More commonly, people make the case that nervous-system uploading technology brings us closer to friendly AI in the form of eventually uploading humans—but that is irrelevant one way or the other if de novo AGI is developed by the middle of this century.
One final point: it is possible that understanding simple nervous systems gives humanity a leg up on interpretability (of non-engineered, neural decision-making), without providing new capabilities until somewhere around spider level. I don’t have much confidence that any systems-neuroscience techniques for understanding C. elegans or D. rerio would transfer to interpreting AI’s decision-making or motivational structure, but it is plausible enough that I currently consider such work to be weakly good for AI safety.
There’s the rub! I happen to value technological progress as an intrinsic good, so classifying a Singularity as “positive” or “negative” is not easy for me. (I reject the notion that one can factorize intelligence from goals, so that one could take a superintelligence and fuse it with a goal to optimize for paperclips. Perhaps one could give it a compulsion to optimize for paperclips, but I’d expect it to either put the compulsion on hold while it develops amazing fabrication, mining and space travel technologies, and never completely turn its available resources into paperclips since that would mean no chance of more paperclips in the future; or better yet, rapidly expunge the compulsion through self-modification.) Furthermore, I favor Kurzweil’s smooth exponentials over “FOOM”: although it may be even harder to believe that not only will there be superintelligences in the future, but that at no point between now and then will an objectively identifiable discontinuity happen, it seems more consistent with history. Although I expect present-human culture to be preserved, as a matter of historical interest if not status quo, I’m not partisan enough to prioritize human values over the Darwinian imperative. (The questions linked seem very human-centric, and turn on how far you are willing to go in defining “human,” suggesting a disguised query. Most science is arguably already performed by machines.) In summary, I’m just not worried about AI risk.
The good news for AI worriers is that Eliezer has personally approved my project as “just cool science, at least for now”—not likely to lead to runaway intelligence any time soon, no matter how reckless I may be. Given that and the fact that I’ve heard many (probably most) AI-risk arguments, and failed to become worried (quite probably because I hold the cause of technological progress very dear to my heart and am thus heavily biased—at least I admit it!), your time may be better spent trying to convince Ben Goertzel that there’s a problem, since at least he’s an immediate threat. ;)
I don’t think RLHF seems worse than [prompting or filtering to] try to get your model to use its capabilities to be helpful. (Many LessWrong commenters disagree with me here but I haven’t heard them give any arguments that I find compelling.)
Where does the following argument fall flat for you?
Human feedback is systematically wrong, e.g. approving of impressively well-reasoned-looking confabulations much more than “I don’t know.”
The bias induced by sequence prediction preserves some of the correlations found in Internet text between features that human raters use as proxies (like being impressively well-reasoned-looking) and features that those are proxies for (like being internally coherent, and being consistent with other information found in the corpus).
RL fine-tuning applies optimisation pressure to break these correlations in favor of optimising the proxies.
Filtering (post-sequence-generation) preserves the correlations while optimising, making it more likely that the outputs which pass the filter have the underlying features humans thought they were selecting for.
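To be concrete about what “filtering (post-sequence-generation)” means operationally, here’s a minimal best-of-n sketch; the `generate` and `proxy_score` functions below are toy stand-ins of my own, not any real model API:

```python
import random

def generate(prompt: str) -> str:
    """Toy stand-in for sampling from the *unmodified* base model."""
    return prompt + " " + " ".join(random.choice(["foo", "bar", "baz"]) for _ in range(5))

def proxy_score(prompt: str, completion: str) -> float:
    """Toy stand-in for a learned human-feedback reward model (a noisy proxy)."""
    return completion.count("foo") + random.gauss(0, 0.5)

def best_of_n(prompt: str, n: int = 16) -> str:
    # Filtering after generation: the base distribution (and whatever correlations it
    # carries between proxy features and the features they stand in for) is untouched;
    # we only condition on a high proxy score, rather than pushing weights toward it.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: proxy_score(prompt, c))

print(best_of_n("example prompt"))
```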
In my view, the biological anchors and the Very Serious estimates derived therefrom are really useful for the following very narrow yet plausibly impactful purpose: persuading people, whose intuitive sense of biological anchors as lower bounds leads their AGI timeline to stretch out to 2100 or beyond, that they should really have shorter timelines. This is almost touched on in the dialogue where the OpenPhil character says “isn’t this at least a soft upper bound?” but Eliezer dismisses it as neither an upper nor a lower bound. I don’t disagree with Eliezer’s actual claim there, but I think he may be missing the value of putting well-researched soft upper bounds on a bunch of variables that generally well-read people often perceive as lower bounds on AGI’s arrival time—and guess to be much further away than they are. Even if those variables provide almost no relevant evidence to someone who has the appropriate kind of uncertainty about AGI’s compute requirements as a function of humanity’s knowledge about AI.
From this perspective, aggregating all the biological anchors with mixture weights tuned to produce a distribution whose median matches the psychological plausibility of Platt’s Law is a way to make the report as a whole Overton-window-compatible, while enabling readers to widen their distribution enough to put substantial probability mass on sooner years than the Overton window would have permitted, or even to decide for themselves to down-weight the larger biological anchors and shift their entire distribution sooner.
Heartened by a strong-upvote for my attempt at condensing Eliezer’s object-level claim about timeline estimates, I shall now attempt condensing Eliezer’s meta-level “core thing”.
Certain epistemic approaches to arrive at object-level knowledge consistently look like a good source of grounding in reality, especially to people who are trying to be careful about epistemics, and yet such approaches’ grounding in reality is consistently illusory.
Specific examples mentioned in the post are “the outside view”, “reference class forecasting”, “maximum entropy”, “the median of what I remember credible people saying”, and, most importantly for the object-level but least importantly for the Core Thing, “Drake-equation-style approaches that cleanly represent the unknown of interest as a deterministic function of simpler-seeming variables”.
Specific non-examples are concrete experimental observations (falling objects, celestial motion). These have grounding in reality, but they don’t tend to feel like they’re “objective” in the same way, like a nexus that my beliefs are epistemically obligated to move toward—they just feel like a part of my map that isn’t confused. (If experiments do start to feel “objective”, one is then liable to mistake empirical frequencies for probabilities.)
The illusion of grounding in reality doesn’t have to be absolute (i.e. “this method reliably arrives at optimal beliefs”) to be absolutely corrupting (e.g. “this prior, which is not relative to anything in particular, is uniquely privileged and central, even if the optimal beliefs are somewhere nearby-but-different”).
The “trick that never works”, in general form, is to go looking in epistemology-space for some grounding in objective reality, which will systematically tend to lead you into these illusory traps.
Instead of trying to repress your subjective ignorance by shackling it to something objective, you should:
sublimate your subjective ignorance into quantitative probability measures,
use those to make predictions and design experiments, and finally
freely and openly absorb observations into your subjective mind and make subjective updates.
Eliezer doesn’t seem to be saying the following [edit: at least not in my reading of this specific post], but I would like to add:
Even just trying to make your updates objective (e.g. by using a computer to perform an exact Bayesian update) tends to go subtly wrong, because it can encourage you to replace your actual map with your map of your map, which is predictably less informative. Making a map of your map is another one of those techniques that seem to provide more grounding but do not actually.
Calibration training is useful because your actual map is also predictably systematically bad at updating by default, and calibration training makes it better at doing this. Teaching the low-description-length principles of probability to your actual map-updating system is much more feasible (or at least more cost-effective) than emitting your actual map into a computationally realizable statistical model.
Techniques that give the illusion of objectivity are usually not useless. But to use them effectively, you have to see through the illusion of objectivity, and treat their outputs as observations of what those techniques output, rather than as glimpses at the light of objective reasonableness.
In the particular example of forecasting AGI with biological anchors, Eliezer does this when he predicts (correctly, at least in the fictional dialogue) that the technique (perhaps especially when operated by people who are trying to be careful and objective) will output a median 30 years from the present.
It’s only because Eliezer can predict the outcome, and finds it (in his map) almost uncorrelated with AGI’s actual arrival, that he dismisses the estimate as useless.
This particular example as a vehicle for the Core Thing (if I’m right about what that is) has the advantage that the illusion of objectivity is especially illusory (at least from Eliezer’s perspective), but the disadvantage that one can almost read Eliezer as condemning ever using Drake-equation-style approaches, or reference-class forecasting, or the principle of maximum entropy. But I think the general point is about how to undistortedly view the role of these kinds of things in one’s epistemic journey, which in most cases doesn’t actually exclude using them.