AI Safety person currently working on multi-agent coordination problems.
Jonas Hallgren
I’ve got a bunch of meditation under my belt so my metacognitive awareness is quite good imo.
Stimulants that are attention-increasing, such as caffeine or modafinil, generally lead to more tunnel vision and less metacognitive awareness in my experience. This generally leads to less ability to update opinions quickly.
Nicotine, which activates acetylcholine receptors, allows for more curiosity, which lets me update more quickly, so it depends on the stimulant as well as the general timing. (0.6 mg in gum form; too high a spike just leads to a hit and not curiosity.) It is like being more sensitive to and interested in whatever appears around me.
If you’re sensitive enough you can start recognizing when different mental modes are firing in your brain and adapt based on what you want, shit is pretty cool.
One of the more common responses I hear at this point is some variation of “general intelligence isn’t A Thing, people just learn a giant pile of specialized heuristics via iteration and memetic spread”.
I’m very uncertain about the validity of the below question but I shall ask it anyway, and since I don’t trust my own way of expressing it, here’s Claude on it:
The post argues that humans must have some general intelligence capability beyond just learning specialized heuristics, based on efficiency arguments in high-dimensional environments. However, research on cultural evolution (e.g., “The Secret of Our Success”, “Cognitive Gadgets”) suggests that much of human capability comes from distributed cultural learning and adaptation. Couldn’t this cultural scaffolding, combined with domain-specific inductive biases (as suggested by work in Geometric Deep Learning), provide the efficiency gains you attribute to general intelligence? In other words, perhaps the efficiency comes not from individual general intelligence, but from the collective accumulation and transmission of specialized cognitive tools?
I do agree that there are specific generalised forms of intelligence; I guess this more points me towards the idea that the generating functions of these might not be optimally sub-divided in the usual way we think about them? Now, completely theoretically of course, say someone were to believe the above: why is the following really stupid?
Specifically, consider the following proposal: Instead of trying to directly align individual agents’ objectives, we could focus on creating environmental conditions and incentive structures that naturally promote collaborative behavior. The idea being that just as virtue ethics suggests developing good character through practiced habits and environmental shaping, we might achieve alignment through carefully designed collective dynamics that encourage beneficial emergent behaviors. (Since this seems to be the most agentic underlying process that we currently have, theoretically of course.)
Well said. I think that research fleets will be a big thing going forward and you expressed why quite well.
I think there’s an extension that we also have to make with some of the safety work we have, especially for control and related agendas. It is to some extent about aligning research fleets and not individual agents.
I’ve been researching ways of going about aligning & setting up these sorts of systems for the last year, but I find myself very bottlenecked by not being able to communicate the theories that exist in related fields that well.
It is quite likely that RSI (recursive self-improvement) happens in lab automation and distributed labs before anything else. So the question becomes: how can we extend the techniques and theory we currently have to distributed systems of research agents?
There’s a bunch of fun and very interesting decentralised coordination schemes and technologies one can use from fields such as digital democracy and collective intelligence. It is just really hard to prune what will work and to think about what the alignment proposals should be for these things. Research systems are a sub-class of Agent-Based Models, and ABMs usually exhibit emergence, so often the best way to predict problems is to actually run the experiments in those systems.
So how in the hell are we supposed to predict the problems without this? What are the experiments we need to run? What types of organisation & control systems should be recommended to governance people when it comes to research fleets?
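To make the “actually run the experiments” point concrete, here is a deliberately toy sketch (everything in it is my own invention, purely illustrative, not any real research-fleet framework): a fleet of research agents occasionally produces subtly wrong claims, and a tunable verification scheme decides what enters a shared knowledge pool. Even something this crude lets you ask how contamination of the shared pool scales with the oversight scheme, which is the kind of question I’d want governance recommendations to be grounded in.

```python
import random


class ResearchAgent:
    """Hypothetical toy agent: produces claims, a fraction of which are subtly wrong."""

    def __init__(self, error_rate: float):
        self.error_rate = error_rate

    def produce_claim(self) -> bool:
        # True = correct claim, False = subtly wrong claim.
        return random.random() > self.error_rate


def run_fleet(n_agents: int = 50, steps: int = 100,
              verification_rate: float = 0.2, seed: int = 0) -> float:
    """Return the fraction of accepted claims in the shared pool that are wrong."""
    random.seed(seed)
    agents = [ResearchAgent(error_rate=0.1) for _ in range(n_agents)]
    accepted, wrong = 0, 0
    for _ in range(steps):
        for agent in agents:
            claim_ok = agent.produce_claim()
            # A verifier catches a wrong claim with probability verification_rate.
            if not claim_ok and random.random() < verification_rate:
                continue  # claim rejected before entering the shared pool
            accepted += 1
            wrong += int(not claim_ok)
    return wrong / accepted


if __name__ == "__main__":
    for v in (0.0, 0.2, 0.8):
        frac = run_fleet(verification_rate=v)
        print(f"verification_rate={v}: contaminated fraction = {frac:.3f}")
```

Obviously the real systems won’t look like this, but even toy runs like this make the emergent failure modes something you can measure rather than only argue about.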
The Alignment Mapping Program: Forging Independent Thinkers in AI Safety—A Pilot Retrospective
This delightful piece applies thermodynamic principles to ethics in a way I haven’t seen before. By framing the classic “Ones Who Walk Away from Omelas” through free energy minimization, the author gives us a fresh mathematical lens for examining value trade-offs and population ethics.
What makes this post special isn’t just its technical contribution—though modeling ethical temperature as a parameter for equality vs total wellbeing is quite clever. The phase diagram showing different “walk away” regions bridges the gap between mathematical precision and moral intuition in an elegant way.
While I don’t think we’ll be using ethodynamics to make real-world policy decisions anytime soon, this kind of playful-yet-rigorous exploration helps build better abstractions for thinking about ethics. It’s the kind of creative modeling that could inspire novel approaches to value learning and population ethics.
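For anyone who hasn’t read the post: the flavor of the construction (my own paraphrase to give the gist, not the author’s exact equations) is a standard free-energy trade-off, where the “ethical temperature” T weights equality against total wellbeing:

```latex
% A hedged paraphrase of the idea, not the post's actual formalism.
% w_i >= 0 : wellbeing of citizen i,   W = \sum_i w_i : total wellbeing,
% q_i = w_i / W : citizen i's share of the total.
F \;=\; \underbrace{-\,W}_{\text{total wellbeing term}}
\;-\; T\,\underbrace{\Big(-\sum_i q_i \ln q_i\Big)}_{\text{equality (entropy of shares)}}
```

At T near zero the minimiser just maximises total wellbeing however unevenly it is spread; as T grows the entropy term dominates and even distributions win, which is roughly the knob the phase diagram sweeps.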
Also, it’s just a super fun read. A great quote from the conclusion is “I have striven to make this paper a pleasant read by enriching it with all manners of enjoyable things: wit, calculus, and a non indifferent amount of imaginary child abuse”.
That is the type of writing I want to see more of! Very nice.
I really like this! For me it somewhat also paints a vision for what could be which might inspire action.
Something that I’ve generally thought would be really nice to have over the last couple of years is a vision for what a decentralized AI Safety field could look like, and what the specific levers to pull would be to get there.
What does the optimal form of a decentralized AI Safety science look like? How does this incorporate parts of metascience and potentially decentralized science?
What does this look like with literature review from AI systems? How can we use AI systems themselves to create such infrastructure in the field? What do such communication pathways optimally look like?
I feel that there is so much low-hanging fruit here. There are so many algorithms that we could apply to make things better. Yes, we’ve got some forums, but holy smokes could the underlying distribution and optimisation systems be improved. Maybe the Lightcone crew could cook something up in this direction?
Meditation insights as phase shifts in your self-model
Let me drop some examples of “theory”, or at least useful bits of information that I find interesting beyond the morphogenesis and free energy principle vibing. I agree with you that the basic form of the FEP is just another formalization of message passing on Bayesian networks, expressed through KL-divergence, and whilst interesting it doesn’t say that much about foundations. For Artificial Life, it is more a vibe check from having talked to people in the space: it seems to me they’ve got a bunch of thoughts about it, but they also seem to have some academic capture, so it might be useful to at least talk to the researchers there about your work?
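To spell out what I mean by that (this is just standard variational-inference bookkeeping, not anything specific to your setup): the variational free energy is a KL-divergence to the Bayesian posterior plus a term that doesn’t depend on the beliefs, so minimising it is approximate Bayesian inference:

```latex
% o : observations, s : hidden states, q(s) : the agent's variational belief.
F(q, o) \;=\; \mathbb{E}_{q(s)}\!\big[\ln q(s) - \ln p(o, s)\big]
       \;=\; D_{\mathrm{KL}}\!\big(q(s)\,\|\,p(s \mid o)\big) \;-\; \ln p(o)
```

Since the last term doesn’t depend on q, minimising F just pushes q(s) towards the posterior p(s|o), i.e. Bayes, which is exactly why the basic FEP on its own doesn’t say much about foundations.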
Like a randomly, insultingly simple suggestion: do a quick literature review through Elicit in ActInf and computational biology for your open questions and see if there are links; if there are, send those people a quick message. I think a bunch of the theory is in people’s heads, and if you nerdsnipe them they’re usually happy to give you the time of day.
Here’s some stuff that I think is theoretically cool as a quick sampler:
For Levin’s work:
In the link I posted above he talks about morphogenesis. The thing I find the most interesting there, from an agent foundations and information processing perspective, is the anti-fragility of these systems with respect to information loss (similar to some of the stuff in Uri’s work, if I’ve understood that correctly). There are lots of variations in the underlying genetics, yet similar structures can be decoded through similar algorithms, which shows a huge resilience there. It seems you probably know this from Uri’s work already.
Active Inference stuff:
Physics as information processing: https://youtu.be/RpOrRw4EhTo
The reason why I find this very interesting is that it seems to me to be saying something fundamental about information processing systems from a limited observer perspective.
I haven’t gotten through the entire series yet but it is like a derivation of hierarchical agency or at least why a controller is needed from first principles.
I think this ACS post explains it better than my attempt below, but here it is:
I’m trying to find the stuff I’ve seen on <<Boundaries>> within Active Inference, yet it is spread out and not really centralised. There’s this very interesting perspective of there only being model and modelled, and that talking about agent foundations is a bit like taking the modeller as the foundational perspective whilst that is a model in itself. Some kind of computational intractability claims, together with the above video series, gets you to this place where we have a system of hierarchical agents and controllers in a system with each other. I have a hard time explaining it, but it is like it points towards a fundamental symmetry between an agent and its environment.
Other videos from Levin’s channel:
Agency at the very bottom—some category theory mathy stuff on agents and their fundamental properties: https://youtu.be/1tT0pFAE36c
The Collective Intelligence of Morphogenesis—if I remember correctly it goes through some theories around cognition of cells, there’s stuff about memory, cognitive lightcones etc. I at least found it interesting: https://youtu.be/JAQFO4g7UY8
(I’ve got that book from Uri on my reading list btw; it reminded me of this book on categorical systems theory, might be interesting: http://davidjaz.com/Papers/DynamicalBook.pdf)
In your MATS training program from two years ago, you talked about farming bits of information from real world examples before doing anything else as a fast way to get feedback. You then extended this to say that this is quicker than doing it with something like running experiments.
My question is then why you haven’t engaged your natural latents, or what in my head I think of as a “boundary formulation through a functional”, with fields such as artificial life or computational biology, where these are core questions to answer?
Trying to solve image generation, or something like fluid mechanics simulations, seems a bit like doing the experiment before trying to integrate it with the theory in that field? Wouldn’t it make more sense to engage in a deeper way with the existing agent foundations theory in the real world, like Michael Levin’s morphogenesis work? Or something like an overview of Artificial Life?
Yes, as you say, real world feedback loops and working on real world problems; I fully agree, but are you sure that you’re done with the problem-space exploration? Like these fields already have a bunch of bits on crossing the theory-practice gap. You’re trying to cross it by applying the theory in practice, yet if that’s the hardest part, wouldn’t it make sense to sample from a place that has already done that?
If I’m wrong here, I should probably change my approach so I appreciate any insight you might have.
I love your stuff and I’m very excited to see where you go next.
I would be very curious to hear what you have to say about more multi-polar threat scenarios and extending theories of agency into the collective intelligence frame. What are your takes on Michael Levin’s work on agency and “morphogenesis” in relation to your neuroscience ideas? What do you think about claims of hierarchical extension of these models? How does this affect multipolar threat models? What are the fundamental processes that we should care about? When should we expand these concepts cognitively, and when should we constrain them?
I resonate with this framing of evolution as an optimizer and I think we can extend this perspective even further.
Evolution optimizes for genetic fitness, yes. But simultaneously, cultural systems optimize for memetic fitness, markets optimize for economic fitness, and technological systems increasingly optimize for their own forms of fitness. Each layer creates selection pressures that ripple through the others in complex feedback loops. It isn’t necessarily that evolution is the only thing happening; it may be the outermost value function that exists, but there’s so much nesting here as well.
There’s only modelling and what is being modelled and these things are happening everywhere all at once. I feel like I fully agree with what you said but I guess for me an interesting point is about what basis to look at it from.
Randomly read this comment and I really enjoyed it. Turn it into a post? (I understand how annoying structuring complex thoughts coherently can be, but maybe do a dialogue or something? I liked this.)
I largely agree about a lot of the things missing from people’s views of utility functions, and I think you expressed some of that in a pretty good, deeper way.
When we get into acausality and Everett branches, I think we’re going a bit off-track. I do think computational intractability and observer bias are interesting to bring up, but I always find it never leads anywhere. Quantum mechanics is fundamentally observer-invariant, so positing something like MWI is a philosophical stance (one supported by Occam’s razor), but it is still observer-dependent: what if there are no observers?
(Pointing at Physics as Information Processing)
Do you have any specific reason why you’re going into QMech when talking about brain-like AGI stuff?
Most of the time, the most high value conversations aren’t fully spontaneous for me but they’re rather on open questions that I’ve already prepped beforehand. They can still be very casual, it is just that I’m gathering info in the background.
I usually check out the papers submitted, or the participants if it’s based on Swapcard, and do some research beforehand on which people I want to meet. Then I usually have some good opener that leads to interesting conversations. These conversations can be very casual and can span wide areas, but I feel I’m building a relationship with an interesting individual, and that’s really the main benefit for me.
At the latest ICML, I talked to a bunch of interesting multi-agent researchers through this method and I now have people I can ask stupid questions.
I also always come to conferences with one or more specific projects that I want advice on which makes these conversations a lot easier to have.
Extremely long chain of thought, no?
Yes, problems; yes, people are being really stupid; yes, inner alignment and all of its cousins are really hard to solve. We’re generally a bit fucked, I agree. The brick wall is so high we can’t see the edge, and we have to bash out each brick one at a time, and it is hard, really hard.
I get it, people, and yet we’ve got a shot, don’t we? The probability distribution of all potential futures is being dragged towards better futures because of the work you put in, and I’m very grateful for that.
Like, I don’t know how much credit to give LW and the alignment community for the spread of alignment and AI Safety as an idea, but we’ve literally got Nobel Prize winners talking about this shit now. Think back 4 years, what the fuck? How did this happen? 2019 → 2024 has been an absolutely insane amount of change in the world, especially from an AI Safety perspective. How do we have over 4 AI Safety Institutes in the world? It’s genuinely mind-boggling to me, and I’m deeply impressed and inspired, which I think you should be too.
I just saw a post from AI Digest on a Self-Awareness benchmark and I just thought, “holy fuck, I’m so happy someone is on top of this”.
I noticed a deep gratitude for the alignment community for taking this problem so seriously. I personally see many good futures but that’s to some extent built on the trust I have in this community. I’m generally incredibly impressed by the rigorous standards of thinking, and the amount of work that’s been produced.
When I was a teenager I wanted to join a community of people who worked their ass off in order to make sure humanity survived into a future in space and I’m very happy I found it.
So thank every single one of you working on this problem for giving us a shot at making it.
(I feel a bit cheesy for posting this but I want to see more gratitude in the world and I noticed it as a genuine feeling so I felt fuck it, let’s thank these awesome people for their work.)
Could someone please safety-pill The Onion? I think satire is the best way to deal with people being really stupid, and so I want more of this as an argument when talking with the e/acc gang: https://youtu.be/s-BducXBSNY?si=j5f8hNeYFlBiWzDD
(Also if they already have some AI stuff, feel free to link that too)
I guess the solution that you’re more generally pointing at here is something like ensuring a split between the incentives of people within the specific fields and those of EA itself as a movement. Almost a bit like making that part of EA only be global priorities research and something like market allocation?
I have this feeling that there might be other ways to go about doing this, like programs or incentives for making people more open to taking any type of impactful job? Something like having recurring reflection periods or other types of workshops/programs?
Good post, did you also cross-post it to the forum? Also, do you have any thoughts on what to do differently in order to enable more exploration and less lock-in?
So I still haven’t really figured out how to talk about these things properly as it is more of a vibe than it is an intellectual truth?
Let’s say that you don’t feel a strong sense of self but are instead identified with nothing: there is no self, and if you see this then you can see the “deathless”.
It’s pointing at a different metaphysical viewpoint that can be experienced. I agree with you that from a rational point of view this is strictly not true, yet it isn’t to be understood, it is to be experienced? You can’t, or at least I can’t, think my way to it.