I’m sympathetic to the idea that it would be good to have concrete criteria for when to stop a pause, were we to start one. But I also think it’s potentially quite dangerous, and corrosive to the epistemic commons, to expect such concreteness before we’re ready to give it.
I’m first going to zoom out a bit—to a broader trend in AI Safety that I’m worried about, one that I believe evaluation-gating might exacerbate, although it is certainly not the only contributing factor.
I think there is pressure mounting within the field of AI Safety to produce measurables, and to do so quickly, as we continue building towards this godlike power under a timer of unknown length. This is understandable, and I think it can often be good, because in order to make decisions it is indeed helpful to know things like “how fast is this actually going” and to assure things like “if a system fails such and such metric, we’ll stop.”
But I worry that in our haste we will end up focusing our efforts under the streetlight. I worry, in other words, that the hard problem of finding robust measurements—those which enable us to predict the behavior and safety of AI systems with anywhere near the level of precision we have when we say “it’s safe for you to get on this plane”—will be swapped out for the easier problem of using the measurements we already have, or those which are close by; ones which are at best only proxies and at worst almost completely unrelated to what we ultimately care about.
And I think it is easy to forget, in an environment where we are continually churning out things like evaluations and metrics, how little we in fact know. When people see a sea of ML papers, conferences, math, numbers, and “such and such system passed such and such safety metric,” it conveys an inflated sense of our understanding, not only to the public but also to ourselves. I think this sort of dynamic can create a Red Queen’s race of sorts, where the more we demand concrete proposals—in a domain we don’t yet actually understand—the more pressure we’ll feel to appear as if we understand what we’re talking about, even when we don’t. And the more we create this appearance of understanding, the more concrete asks we’ll make of the system, and the more inflated our sense of understanding will grow, and so on.
I’ve seen this sort of dynamic play out in neuroscience, where in my experience the ability to measure anything at all about some phenomenon often leads people to prematurely conclude we understand how it works. For instance, reaction times are a thing one can reliably measure, and so is EEG activity, so people will often do things like… measure both of these quantities while manipulating the number of green blocks on a screen, then call the relationship between these “top-down” or “bottom-up” attention. All of this despite having no idea what attention is, and hence no idea whether these measures in fact meaningfully relate to the thing we actually care about.
There are a truly staggering number of green block-type experiments in the field, proliferating every year, and I think the existence of all this activity (papers, conferences, math, numbers, measurement, etc.) convinces people that something must be happening, that progress must be being made. But if you ask the neuroscientists attending these conferences what attention is, over a beer, they will often confess that we still basically have no idea. And yet they go on, year after year, adding green blocks to screens ad infinitum, because those are the measurements they can produce, the numbers they can write on grant applications, grants which get funded because at least they’re saying something concrete about attention, rather than “I have no idea what this is, but I’d like to figure it out!”
I think this dynamic has significantly corroded academia’s ability to figure out important, true things, and I worry that if we introduce it here, we will face similar corrosion.
Zooming back in on this proposal in particular: I feel pretty uneasy about the messaging, here. When I hear words like “responsible” and “policy” around a technology which threatens to vanquish all that I know and all that I love, I am expecting things more like “here is a plan that gives us multiple 9’s of confidence that we won’t kill everyone.” I understand that this sort of assurance is unavailable, at present, and I am grateful to Anthropic for sharing their sketches of what they hope for in the absence of such assurances.
But the unavailability of such assurance is also kind of the point, and one that I wish this proposal emphasized more… it seems to me that vague sketches like these ought to be full of disclaimers like, “This is our best idea but it’s still not very reassuring. Please do not believe that we are safely able to prevent you from dying, yet. We have no 9’s to give.” It also seems to me like something called a “responsible scaling policy” should at the very least have a convincing story to tell about how we might get from our current state, with the primitive understanding we have, to the end-goal of possessing the sort of understanding that is capable of steering a godly power the likes of which we have never seen.
And I worry that in the absence of such a story—where the true plan is something closer to “fill in the blanks as we go”—that a mounting pressure to color in such blanks will create a vacuum, and that we will begin to fill it with the appearance of understanding rather than understanding itself; that we will pretend to know more than we in fact do, because that’s easier to do in the face of a pressure for results, easier than standing our ground and saying “we have no idea what we’re talking about.” That the focus on concrete asks and concrete proposals will place far too much emphasis on what we can find under the streetlight, and will end up giving us an inflated sense of our understanding, such that we stop searching in the darkness altogether, forget that it is even there…
I agree with you that having concrete asks would be great, but I think they’re only great if we actually have the right asks. In the absence of robust measures and evaluations—those which give us high confidence about the safety of AI systems—and in the absence of a realistic plan to get them, I think demanding such concreteness may end up being actively harmful. Harmful because people will walk away feeling like AI Safety “knows” more than it does and will hence, I think, feel more secure than is warranted.
Meta: I don’t want this comment to be taken as “I disagree with everything you (Thomas) said.” I do think the question of what to do when you have an opaque, potentially intractable problem is not obvious, and I don’t want to come across as saying that I have the definitive answer, or anything like that. It’s tricky to know what to do, here, and I certainly think it makes sense to focus on more concrete problems if deconfusion work didn’t seem that useful to you.
That said, at a high-level I feel pretty strongly about investing in early-stage deconfusion work, and I disagree with many of the object-level claims you made suggesting otherwise. For instance:
It seems to me like the history of neuroscience should inspire the opposite conclusion: a hundred years of ever more data collection at finer and finer resolution, and yet we still have a field that even many neuroscientists agree barely understands anything. I did undergrad and grad school in neuroscience and can at the very least say that this was also my conclusion. The main problem, in my opinion, is that theory usually tells us which facts to collect. Without it—without even a proto-theory or a rough guess, as with “model-free data collection” approaches—you are basically just taking shots in the dark and hoping that if you collect a truly massive amount of data, and somehow search over it for regularities, theory will emerge. This seems pretty hopeless to me, and entirely backwards from how science has historically progressed.
It seems similarly pretty hopeless to me to expect a “revolution” out of tabulating features of the brain at fine-enough resolution. Like, I certainly buy that it gets us some cool insights, much like every other imaging advance has gotten us some cool insights. But I don’t think the history of neuroscience really predicts a “revolution,” here. Aside from the computational costs of “understanding” an object in such a way, I just don’t really buy that you’re guaranteed to find all the relevant regularities. You can never collect *all* the data; you have to make choices and tradeoffs when you measure the world, and without a theory to tell you which features are irrelevant and can safely be ignored, it’s hard to know that you’re ultimately looking at the right thing.
I ran into this problem, for instance, when I was researching cortical uniformity. Academia has amassed a truly gargantuan collection of papers on the structural properties of the human neocortex. What on Earth do any of these papers say about how algorithmically uniform the brain is? As far as I can tell, pretty much nothing, because we have no idea how the structural properties of the cortex relate to the functional ones, and so who’s to say whether “neuron subtype A is denser in the frontal cortex than in the visual cortex” is a meaningful finding? I worry that other “shot in the dark” data collection methods will suffer similar setbacks.
It’s of course difficult to say how science might have progressed counterfactually, but I find it pretty hard to believe that relativity would have been “discovered easily” had we had a bunch of data staring us in the face. In general, I think it’s very easy to underestimate how difficult it is to come up with new concepts. I felt this way when I was reading about Darwin and how it took him over a year to go from realizing that “artificial selection is the means by which breeders introduce changes,” to realizing that “natural selection is the means by which changes are introduced in the wild.” But then I spent a long time in his shoes, so to speak, operating from within the concepts he had available to him at the time, and I came away more humble. For instance, among other things, it seems like a leap to go from “a human uses their intellect to actively select” to “nature ends up acting like a selector, in the sense that its conditions favor some traits for survival over others.” These feel like quite different “types” of things, in some ways.
In general, I suspect it’s easy to take the concepts we already have, look over past data, and assume it would have been obvious. But I think the history of science again speaks to the contrary: scientific breakthroughs are rare, and I don’t think it’s usually the case that they’re rare because of a lack of data, but because they require looking at that data differently. Perhaps data on gravitational lensing would have roused scientists to notice that there were anomalies, and might eventually have led to general relativity. But the actual process of taking the anomalies and turning them into a theory is, I think, really hard. Theories don’t just pop out wholesale when you have enough data; they take serious work.
This story misses some pretty important pieces. For instance, Schrödinger predicted basic features of DNA—that it was an aperiodic crystal—from first principles in his book What Is Life?, published in 1944. The basic reasoning is that in order to stably encode genetic information, the molecule should itself be stable, i.e., a crystal. But to encode a variety of information, rather than the same thing repeated indefinitely, it needs to be aperiodic. An aperiodic crystal is a molecule that can use a few primitives to encode near-infinite possibilities, in a stable way. His book was very influential, and Watson and Crick both credited Schrödinger with the theoretical ideas that guided their search. I also suspect their search went much faster than it would have otherwise; many biologists at the time thought that the hereditary molecule was a protein, of which there are tens of millions in a typical cell.
But, more importantly, I would certainly not say that biochemistry is an area where empirical work has succeeded to nearly the extent that we might hope it to. Like, we still can’t cure cancer, or aging, or any of the myriad medical problems people have to endure; we still can’t even define “life” in a reasonable way, or answer basic questions like “why do arms come out basically the same size?” The discovery of DNA was certainly huge, and helpful, but I would say that we’re still quite far from a major success story with biology.
My guess is that it is precisely because we lack theory that we are unable to answer these basic questions, and to advance medicine as much as we want. Certainly the “tabulate indefinitely” approach will continue moving the needle on biological research, but I doubt it is going to get us anywhere near the gains that, e.g., “the hereditary molecule is an aperiodic crystal” did.
And while it’s certainly possible that biology, intelligence, agency and so on are just not amenable to the cleave-reality-at-its-joints type of clarity one gets from scientific inquiry, I’m pretty skeptical that this is the world we in fact live in, for a few reasons.
For one, it seems to me that practically no one is trying to find theories in biology. It is common for biologists (even bright-eyed young PhDs at elite universities) to say things like (in some cases this exact sentence): “there are no general theories in biology because biology is just chemistry which is just physics.” These are people at the beginning of their careers, throwing in the towel before they’ve even started! Needless to say, this take is clearly not true in all generality, because it would anti-predict natural selection. It would also, I think, anti-predict Newtonian mechanics (“there are no general theories of motion because motion is just the motion of chemicals which is just the motion of particles which is just physics”).
Secondly, I think that practically all scientific disciplines look messy, ad-hoc, and empirical before we get theories that tie them together, and that this does not on its own suggest biology is a theoretically bankrupt field. E.g., we had steam engines before we knew about thermodynamics, but they were kind of ad-hoc, messy contraptions, because we didn’t really understand what variables were causing the “work.” Likewise, naturalism before Darwin was largely compendiums upon compendiums of people being like “I saw this [animal/fossil/plant/rock] here, doing this!” Science before theory often looks like this, I think.
Third: I’m just like, look guys, I don’t really know what to tell you, but when I look at the world and I see intelligences doing stuff, I sense deep principles. It’s a hunch, to be sure, and kind of hard to justify, but it feels very obvious to me. And if there are deep principles to be had, then I sure as hell want to find them. Because it’s embarrassing that at this point we don’t even know what intelligence is, nor agency, nor abstraction: how to measure any of them, or predict when they will increase. These are the gears that are going to move our world, for better or for worse, and I at least want my hands on the steering wheel when they do.
I think that sometimes people don’t really know what to envision with theoretical work on alignment, or “agent foundations”-style work. My own vision is quite simple: I want to do great science, as great science has historically been done, and to figure out what in god’s name any of these phenomena are. I want to be able to measure that which threatens our existence, such that we may learn to control it. And even though I am of course not certain this approach is workable, it feels very important to me to try. I think there is a strong case for there being a shot, here, and I want us to take it.