The examples of drama that you gave don’t seem like particularly interesting things to me; they resemble the lowest-grade forms of entertainment we have today, if anything less interesting. I think art and cultural entertainment today are some of our highest forms, and those do exist in the Culture, but a big part of the appeal of art is that it has teeth, that it actually criticises something. That isn’t shown to be much the case in the Culture.
But moreover, I can easily imagine worlds vastly more interesting, even within the constraints of the Culture itself, that it doesn’t explore. For example, each GSV could represent a completely different world of experience to its inhabitants, plus a vastly richer digital realm: people competing over invented metrics like those we see in MMOs today, creating vast LARP battles, recreating interesting moments and scenes from history, or engaging in decades-long simulated games with things like simulated magic facilitated by advanced technology.
This is just what I can think of in a few minutes; there should be enormously many more kinds of experience available. Instead, we see quite strong convergence to particular sets of values within the Culture, such that the final result seems overall somewhat less diverse than even the modern United States, let alone modern Earth as a whole today.
Sturb
Contributing to Technical Research in the AI Safety End Game
Sanity-checking “Incompressible Knowledge Probes”
I feel like I honestly diverge from Iain M. Banks’ model of people’s minds regarding the self-termination question. I think there is massive inertia with respect to taking the action to die, which would mean many people would just continue by default.
It’s possible that there’s some worldbuilding element that I missed regarding this.
The Great Smoothing Out
How Does an Agent with Multiple Goals Choose a Target?
The Garden
Thanks for this; it’s a great post. I just wanted to flag that some of the figures have rendered very blurry, to the point of being unreadable. It would be worth updating these!
At first it seemed more defensible if one considered the West to consist only of the US, but even this is completely inaccurate, given that crime rates in US cities have plummeted to historic lows in the last three years.
https://archive.is/OO3bE
“According to Asher’s analysis, Detroit, San Francisco, Chicago, Newark, and a handful of other big cities recorded their lowest murder rates since the 1950s and ’60s. ‘Our cities are as safe as they’ve ever been in the history of the country,’ Patrick Sharkey, a sociologist at Princeton who studies urban violence, told me.”
Yep, follow up surveys seem valuable!
I would agree: as much as I enjoyed doing ARENA and think the materials are very high quality, most of the really valuable content could be compressed into a week (the material on how transformers work and a few parts of the mech interp material are significantly more important than the rest). The mental training from solving specific problems has not transferred into doing better research for me to any significant extent, and research on transfer learning across domains has shown remarkably poor results in general.
This might sound odd, because the pedagogy of ARENA seems really strong: the program is very hard, the material is very technical, it’s made by really smart people, and really smart, capable people go through the program and benefit. For most AI safety people, doing the program is almost certainly a significantly better use of time than not doing it.
However, I think a lot of the benefits stem from spending time around those sorts of people and engaging in research in a collaborative environment. I don’t think much of the benefit derives from the pedagogy itself, because for most people, working through the notebooks doesn’t translate much into research. I think this is doubly true in the age of agentic coding, where a lot of the nitty-gritty details that make up the bulk of the notebooks can be relatively safely abstracted away.
I can think of many excellent, highly technical interpretability researchers who never did anything like the material covered in the notebooks, but who are producing excellent research today. The really hard parts of research are developing taste about the field, having a sense of what good experimental design looks like, asking the right questions and sniffing out how to extract the most information, and to some extent the technical skills associated with writing code. ARENA primarily aims to fix the last, which is also the part most exposed to being addressed by agentic coding (of course having the underlying knowledge is important, but doing the research and reflecting with others will also impart this knowledge).
The most valuable things I received from ARENA were the confidence to pursue research, the validation of being accepted into the program, the relationships I developed with other people during the program, and the exposure to the people and environment at LISA. If the program shifted towards helping people quickly form teams, work together, develop an interesting question, and get a deeper sense of a particular area through a research sprint, it would cover more of the key skills required to do research, provide all the key benefits listed above, and still allow for technical exposure and upskilling. I am less certain about scrapping the in-person program, as perhaps it could function in exactly the same way at first.
I could see this modified program being similarly valuable to me today, having already completed MATS, as to someone starting out much earlier. I would also probably be more likely to recommend someone starting out in the field to participate in such a program.
I prefer just-in-time learning over just-in-case learning because it’s much more time-efficient. Developing the core skills of research seems more likely to serve a young researcher well; they’d also benefit more from the friendships and collaborations, and potentially have interesting threads to pull on after the program ended. If they desperately needed the skills from one of the ARENA weeks, they could presumably pick up the core parts in around a week.
I think I am less interested in the pain a WBE would experience and more in the valence of its experiences, for example whether it is sad or happy.
I do find your point about how its views would diverge over time very interesting, because its awareness of its own nature would definitely impact how it relates to reality. For example, the things that affect it would likely be primarily things happening in the outside world, as it could mostly discount a large portion of the experiences in its simulated world in terms of how they impact its emotions.
I suppose this would lean the moral relevance towards the preferences that the whole brain emulation held about the outside world and its inner world.
Whole Brain Emulation as an Anchor for AI Welfare
Announcing the Cooperative AI Research Fellowship
The most powerful wizard I’ve met recently was a guy at the University of Cape Town who could just seemingly build almost anything. He worked on a crazy machine that was used to simulate the crystalline structure of metals for industrial processes at different stages to ensure it could meet spec.
He has a side business creating these world-class knives through an AutoCAD system, using incredibly high-quality steel: https://www.maxwellvosdesigns.com/.
His next project was to create an electron microscope 100x cheaper than standard ones, to see if it could be spun out into a productive enterprise.
I’ve rarely met someone who had such amazing ambitions matched with such a fine ability to execute. Truly inspirational.
Thus most decisions will probably be allocated to AI systems
If AI systems make most decisions, humans will lose control of the future
If humans have no control of the future, the future will probably be bad for humans
Sure—at some point in the future, maybe.
Maybe, maybe not. Humans tend to have a bit of an ego when it comes to letting a filthy machine make decisions for them. But I’ll bite the bullet.
There are several levels on which I disagree here. Firstly, we’re assuming that “humans” have control of the future in the first place. It’s hard to assign coherent agency to humanity as a whole; it’s more a weird mess of conflicting incentives, and nobody really controls it. Secondly, if those AI systems are designed in the right way, they might just become the tools for humanity to sorta steer the future the way we want it.
I agree with your framing here that systems made up of rules + humans + various technological infrastructure are the actual things that control the future. But I think the key is that the systems themselves would begin to favour more non-human decision making because of incentive structures.
E.g., corporate entities have a profit incentive to put the most efficient decision-maker in charge of the company. Maybe that still includes a CEO, but the board might insist that the CEO use an AI assistant, and if the CEO makes a decision that goes against the AI and it turns out to be wrong, shareholders will come to trust the AI system more and more of the time. They don’t necessarily care about the ego of the CEO; within the competitive market, they just care about the outcomes.
In this way, more and more decision-making gets turned over to non-human systems because of competitive structures that are very difficult to escape from. As this transition continues, it becomes very hard to control the unseen externalities of these decisions.
I suppose this doesn’t seem too catastrophic in its fundamental form, but I think the outcomes of playing it forward essentially seem to be a significant potential for harm from these externalities, without much of a mechanism for recourse.
Actually, as far as I know, this is wrong. He simply hasn’t been back to the offices but has been working remotely.
This article goes into some detail and seems quite good.
I think the key is in the way preferences inform our world model, and thus in what causes the prediction error to occur. There are errors you could observe that would strongly indicate your preferences are harder to satisfy in the posterior model. These will cause suffering, whereas an update towards a model in which your needs are met more easily is likely to cause a good feeling. For example, you sit down to eat a sandwich at Subway for the first time and the sub is actually way better than you expected. You will experience a pleasant feeling, and if things like this keep happening, you might feel like you’ve really figured out some good strategy for operating.
In a sense, you are actually decreasing prediction error more than you are increasing it when a good thing happens to you, because you always generate prediction error from the difference between your ideal world and your observed reality. When you have a very positive experience, the error between ideal and observed shrinks, and this can outweigh the prediction error of the prediction itself being wrong. The example I think of for this is the ecstatic child at Disney World.
There might be more work here though.
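The accounting above can be sketched numerically. This is a toy model with made-up numbers, and the choice of squared error is just one convenient assumption, not a claim about how the brain actually weights things:

```python
# Toy model: total "error" as the sum of two squared terms --
# surprise relative to the model's prediction, and the gap between
# the ideal (preferred) world state and what was actually observed.
def total_error(predicted: float, observed: float, ideal: float) -> float:
    surprise = (observed - predicted) ** 2   # the model was wrong
    ideal_gap = (ideal - observed) ** 2      # needs still unmet
    return surprise + ideal_gap

# Expected a mediocre sandwich (3/10); the ideal experience is 10/10.
as_expected = total_error(predicted=3, observed=3, ideal=10)        # 0 + 49 = 49
# The sandwich is surprisingly good (8/10): the surprise term rises,
# but the ideal gap shrinks by more, so total error still falls.
pleasant_surprise = total_error(predicted=3, observed=8, ideal=10)  # 25 + 4 = 29
```

With squared errors, a positive surprise that moves the outcome toward the ideal reduces the total even though the prediction term grows; with purely linear errors the two terms would cancel exactly, so the shape of the error function matters for whether the intuition goes through.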
I think the main thing I was gesturing at in the post is that traditions can die and be lost beyond recovery, and when they are gone we accept that loss the same way that we accept the deaths of people from the past.
In the same way that it’s sad when someone dies unnecessarily today, it would be sad for certain traditions we have today to be lost. We also close the door on evolutions of those traditions, and on experiences downstream of them that lead to valuable mind states.
I am less concerned about traditions that are dropped by lack of interest in the far future because the juice isn’t worth the squeeze.