I’m a researcher at ACS working on understanding agency and optimisation, especially in the context of how AIs work and how society will work once AIs are everywhere.
Raymond Douglas
Disempowerment patterns in real-world AI usage
GD Roundup #4 - inference, monopolies, and AI Jesus
When does competition lead to recognisable values?
The Economics of Transformative AI
To my mind, what this post did was clarify a kind of subtle, implicit blind spot in a lot of AI risk thinking. I think this was inextricably linked to the writing itself leaning into a form of beauty that doesn’t tend to crop up much around these parts. And though the piece draws a lot of it back to Yudkowsky, I think the absence of green is much wider than him, and in many ways he’s not the worst offender.
It’s hard to accurately compress the insights: the piece itself draws a lot on soft metaphor and on explaining what green is not. But personally it made me realise that the posture I and others tend to adopt when thinking about superintelligence and the arc of civilisation has a tendency to shut out some pretty deep intuitions that are particularly hard to translate into forceful argument. Even if I can’t easily say what those are, I can now at least point to it in conversation by saying there’s some kind of green thing missing.
One year later, I am pretty happy with this post, and I still refer to it fairly often, both for the overall frame and for the specifics about how AI might be relevant.
I think it was a proper attempt at macrostrategy, in the sense of trying to give a highly compressed but still useful way to think about the entire arc of reality. And I’ve been glad to see more work in that area since this post was published.
I am of course pretty biased here, but I’d be excited to see folks consider this.
I think this post is on the frontier for some mix of:
Giving a thorough plan for how one might address powerful AI
Conveying something about how people in labs are thinking about what the problem is and what their role in it is
Not being overwhelmingly filtered through PR considerations
Obviously one can quibble with the plan and its assumptions but I found this piece very helpful in rounding out my picture of AI strategy—for example, in thinking about how to decipher things that have been filtered through PR and consensus filters, or in situating work that focuses on narrow slices of the wider problem. I still periodically refer back to it when I’m trying to think about how to structure broad strategies.
Sorry! I realise now that this point was a bit unclear. My sense of the expanded claim is something like:
People sometimes talk about AI UBI/UBC as if it were basically a scaled-up version of the UBI people normally talk about, but it’s actually pretty substantially different
Global UBI right now would be incredibly expensive
In between now and a functioning global UBI we’d need some mix of massive taxes and massive economic growth (which could indeed just be the latter!)
But either way, the world in which that happened would not be economics as usual
(And maybe it is also a huge mess trying to get this set up beforehand so that it’s robust to the transition, or afterwards when the people who need it don’t have much leverage)
For my part I found this surprising because I hadn’t reflected on the sheer orders of magnitude involved, and the fact that any version of this basically involves passing through some fragile craziness. Even if it’s small as a proportion of future GDP, it would in absolute terms be tremendously large.
I separately think there was something important to Korinek’s claim (which I can’t fully regenerate) that the relevant thing isn’t really whether stuff is ‘cheaper’, but rather the prices of all of these goods relative to everything else going on.
Gradual Disempowerment Monthly Roundup #3
Raymond Douglas’s Shortform
Last week we wrapped the second post-AGI workshop; I’m copying across some reflections I put up on twitter:
The post-AGI question is very interdisciplinary: whether an outcome is truly stable depends not just on economics and the shape of future technology but also on things like the nature of human ideological progress and the physics of interplanetary civilizations
Some concrete takeaways:
proper global UBI is *enormously* expensive (h/t @yelizarovanna)
instead of ‘lower costs’, we should talk about relative prices (h/t @akorinek)
lots of human values are actually pretty convergent—they’re shared by many animals (h/t @BerenMillidge)
Among the many tensions in perspective, one of the more productive ones was between the ‘alignment is easy so let’s try to solve the rest’ crowd and the ‘alignment is hard and maybe this will make people realise they should fully halt AGI’ crowd. Strange bedfellows!
It’s hard to avoid partisan politics, but part of what’s weird about AGI is that it can upend basic political assumptions. Maybe AGI will outperform the invisible hand of the market! Maybe governments will grow so powerful that revolution is literally impossible!
Funnily enough, it seems like the main reason people got less doomy was seeing that other people were working hard on the problem, and the main reason people got more doomy was thinking about the problem themselves. Maybe selection effects? Maybe not?
Compared to last time, even if nobody had good answers to how the world could be nice for humans post-AGI, it felt like we were at least beginning to converge on certain useful perspectives and angles of attack, which seems like a good sign
Overall, it was a great time! The topic is niche enough that it self-selects a lot for people who actually care, and that is proving to be a very thoughtful and surprisingly diverse crowd. Hopefully soon we’ll be sharing recordings of the talks!
Bonus: Two other reactions from attendees
Thanks to all who came, and especially to @DavidDuvenaud, @jankulveit, @StephenLCasper, and Maria Kostylew for organising!
Very nice! A couple months ago I did something similar, repeatedly prompting ChatGPT to make images of how it “really felt” without any commentary, and it did mostly seem like it was just thinking up plausible successive twists, even though the eventual result was pretty raw.
Pictures in order
Gradual Disempowerment Monthly Roundup #2
Upcoming Workshop on Post-AGI Economics, Culture, and Governance
Are people interested in a regular version of this, probably on a Substack? Also, any other thoughts on the format are welcome.
Gradual Disempowerment Monthly Roundup
best guesses: valuable, hat tip, disappointed, right assumption wrong conclusion, +1, disgusted, gut feeling, moloch, subtle detail, agreed, magic smell, broken link, link redirect, this is the diff
I wonder if it would be cheap/worthwhile to just get a bunch of people to guess for a variety of symbols to see what’s actually intuitive?
I went down a rabbit hole on inference-from-goal-models a few years ago (albeit not coalitional ones) -- some slightly scattered thoughts below, which I’m happy to elaborate on if useful.
A great toy model is decision transformers: basically, you can make a decent “agent” by taking a predictive model over a world that contains agents (like Atari rollouts), conditioning on some ‘goal’ output (like the player eventually winning), and sampling what actions you’d predict to see from a given agent. Some things which pop out of this:
There’s no utility function or even reward function
You can’t even necessarily query the probability that the goal will be reached
There’s no updating or learning—the beliefs are totally fixed
It still does a decent job! And it’s very computationally cheap
And you can do interp on it!
It turns out to have a few pathologies (which you can precisely formalise)
It has no notion of causality, so it’s easily confounded if it wasn’t trained on a Markov blanket around the agent it’s standing in for
It doesn’t even reliably pick the action which most likely leads to the outcome you’ve conditioned on
Its actions are heavily shaped by implicit predictions about how future actions will be chosen (an extremely crude form of identity), which can be very suboptimal
But it turns out that these are very common pathologies! And the formalism is roughly equivalent to lots of other things
You can basically recast the whole reinforcement learning problem as being this kind of inference problem
(specifically, minimising variational free energy!)
It turns out that RL largely works in cases where “assume my future self plays optimally” is equivalent to “assume my future self plays randomly” (!)
it seems like “what do I expect someone would do here” is a common heuristic for humans which notably diverges from “what would most likely lead to a good outcome”
humans are also easily confounded and bad at understanding the causality of our actions
language models are also easily confounded and bad at understanding the causality of their outputs
fully fixing the future-self-model thing here is equivalent to tree searching the trajectory space, which can sometimes be expensive
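To make the decision-transformer idea above concrete, here’s a minimal sketch in plain Python. It stands in an empirical count table for the transformer, but the mechanism is the same: fit a predictive model to random play, condition on the "win" outcome, and sample actions from the conditional. The gridworld, step limit, and all names are invented for illustration; nothing here is from the original post.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

CHAIN_LEN = 4   # positions 0..4; ever reaching position 4 counts as a "win"
MAX_STEPS = 6

def rollout(policy):
    """Run one episode; return the (state, action) trajectory and whether it won."""
    pos, traj, won = 0, [], False
    for _ in range(MAX_STEPS):
        a = policy(pos)
        traj.append((pos, a))
        pos = max(0, min(CHAIN_LEN, pos + a))
        won = won or pos == CHAIN_LEN
    return traj, won

# 1. Collect data from a purely random behaviour policy.
data = [rollout(lambda s: random.choice([-1, 1])) for _ in range(5000)]

# 2. "Train" a predictive model: the empirical distribution P(action | state, won).
#    Note there is no reward function, no value estimate, and no learning update.
counts = defaultdict(Counter)
for traj, won in data:
    for s, a in traj:
        counts[(s, won)][a] += 1

def conditioned_policy(s):
    """Sample an action as though drawn from a trajectory conditioned on winning."""
    c = counts.get((s, True))
    if not c:
        return random.choice([-1, 1])
    actions, weights = zip(*c.items())
    return random.choices(actions, weights=weights)[0]

# 3. Conditioning alone yields a far better "agent" than the data it was fit to,
#    even though it never reasons about which action best causes the outcome.
base_rate = sum(won for _, won in data) / len(data)
cond_rate = sum(rollout(conditioned_policy)[1] for _ in range(2000)) / 2000
print(f"random policy win rate:    {base_rate:.2f}")
print(f"goal-conditioned win rate: {cond_rate:.2f}")
```

The pathologies show up here too: the sampled action is whatever winning trajectories happened to contain at that state, not the action most likely to cause a win, and the model’s behaviour implicitly assumes its future actions will also be drawn from the winning-conditioned distribution.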