The AFFINE Superintelligence Alignment Seminar – A Retrospective

A Day at AFFINE[1]

“AFFINE was the best month of intellectual exploration I have had the opportunity to engage in, ever. Usually opportunities like this are limited to a day or a weekend, which both limits depth, forces a sprint-type mindset, and generally is quite limiting. At AFFINE I had time to wander towards and through interesting ideas.”

-Xylix (participant)

You wake up in an ornate room shared with a few other participants to the smell of breakfast, or perhaps you have been up for a while, reading or going on a morning run. You grab what you want from the buffet and head to the common room, which is slowly filling up and fragmenting into conversations of various sizes and scopes. Zipf distributions came up yesterday and the knowledge applies. Sunlight filters in through the embroidered curtains as you join a group talking about the self-organized criticality of brains and why it is necessary. The people one couch over are designing an experiment to settle their bet about whether Claude Code charges token use for cached reads. Offhandedly you pitch a talk you want to give on the unconference day tomorrow, and the group seems generally interested. A fellow participant wants to work with you on it and you happily agree.

During the day you attend a talk by Ihor Kendiukhov about problems with expected utility theory, followed by a remote presentation by Abram Demski taxonomizing Goodhart’s law. You ask a question about quantilizers and check the app to see whether anyone wants to explain geometric rationality to you. Turns out yes! You schedule a session for tomorrow before heading outside for a few games of volleyball while your default mode network processes the information. You grab one of the mentors for a 1-on-1 about research methodology before dinner is served.

Originally you wanted to read a couple of papers but instead you get roped into an argument about active inference and embeddedness when you carelessly walk past a whiteboard. The evening is spent playing increasingly esoteric variants of chess until your brain finally gives out. You head to your room, briefly consider whether you might want to try one of the tents outside over the next few days, and pass out almost instantly.

One month ago, we held the first AFFINE alignment seminar: a peer-tutoring-driven, intensive retreat focused on frame-finding and the deep, fundamental difficulties of the alignment problem. The following is a narrative account of how it went, what we learned, and how we intend to proceed from this point onward. It is optimized for readability and for conveying the spirit of our event. For dry information click here.

a whiteboard session in the beautiful Czech countryside

Missing Foundations

“I feel there is extreme lack of events like this in the AI safety community [...]. AFFINE Superintelligence Seminar is a venue where real technical competence, moral seriousness, and productive vibes converge. For people who deeply and honestly care about the alignment problem, this is a place to have uniquely useful interactions.”

-Ihor Kendiukhov (Mentor)

Our concepts of mind, cognition, life, values, teleology, etc. are lacking. We have so far been unable to state the core problems in satisfactory terms, and normal science requires confidence in legible frameworks which capture the crucial nuances in order to make productive headway. The field is pre-paradigmatic, the size of minds implies that successfully framing the problem statement will likely be harder than many historical philosophy-to-science transitions, and yet this is, for the most part, not taken as a pressing call to do more philosophy. Instead, progress is being made on a variety of related problems for which the language of other sciences can be borrowed. The things for which we have formalisms are getting reliably solved and honed, but the rate at which we acquire new formalisms is troublingly slow, and few of them even aspire to capture the entire field. In particular, a large part of the current work does not concern itself with superintelligence alignment but rather with the control of weak-to-moderately-strong systems.

None of this is surprising. Status and funding incentives select for legibility, while most field-building programs require work to be done in a matter of months. The only way to predictably and quickly produce legible output in an unsettled field is to prematurely accept a frame and perform normal science. In particular, we think that some newcomers have the potential to do useful foundational work on alignment, but that the incentive landscape described above funnels them towards comparatively reachable but less crucial sub-problems instead. Given this landscape, AFFINE was meant to create a space where the broader, slower kind of thinking can happen: where researchers can look for sturdy foundations without getting captured by perverse incentives. Not just because the search for holistic philosophical groundwork is neglected, but because it is the skill with the highest marginal value to train people in at this moment.

Note that models are getting very good at solving well-specified problems in scientific domains equipped with a crisp verification tool: coding, technical math, etc. In a more general sense, we can expect AI to continue getting differentially better at “in-frame” research, where good performance is less fuzzy and easier to reward. “Claude 4.8 is over some sort of tipping point for me, where I feel like I can ‘just keep going and keep making progress’ in some new sense,” reports Demski, and personal observations say that Fable is a large improvement over Opus 4.8 when it comes to technical math research. As things stand, outsourcing superintelligence alignment to weaker systems is a default outcome. Assuming that these can be made trustworthy, two issues present themselves: the first is that it is very difficult to evaluate cognitive labour which you cannot yourself perform, and the second is the AI’s imbalanced skill at tackling crisp vs. frameless problems. Consequently, we do not expect these systems to track the philosophical nuance of pre-paradigmatic research properly, causing useful progress to become bottlenecked on people who “actually get the problem”. The skill of finding and legibilizing the right questions will be decisive for research in the future.

Brains in a Chateau in Bohemia

“Most intellectual environments are quite result-oriented, and I think this works against tackling problems as difficult as superintelligence alignment. AFFINE gave me a great environment to think deeply about my fundamental models of the world, without asking for immediate output.”

-Haru Kim (participant)

There were thirty participants at our seminar, most with a significant STEM background and varying familiarity with the alignment problem. In terms of raw wits and curiosity, we found them immediately impressive, but steering them to where we wanted them to go proved difficult. It turns out that you cannot just put people in an ostensibly educational environment and tell them that they will not be judged on output or the number of concepts they can get under their belts. They will not believe you, and they will try to optimize for whatever it is that you secretly want, because not judging on output is simply not how the world works. Alas, this is contrary to the point of AFFINE.

If there were to be a reasonable metric of our success, it would be the degree to which we empowered the participants to think freely and with little thought to output-oriented incentives. We wanted them to load up the whole problem and hold it in their attention, to “not just do something but stand there”. We can’t measure that, though we believe that we ultimately got there. What we can point to is the sheer number of whiteboards that participants filled and re-filled in the common areas instead of writing (or reading) papers. The trick, in the end, was simple: tell them the goal instead of trying to engineer some bespoke incentive gradient from scratch. “You are here for ambitious, holistic theory work. Not for anything empirical, not for anything you can make significant headway on in a month. All we want you to do is to look in the right type of direction.”

Egregore

“AFFINE was one of the most intellectually generative environments I’ve ever been in [...]. It’s an incredibly well optimized environment for coming up with new interesting ideas, and an amazing set of people to spend a month with.”

-Samuel Ratnam (participant)

The classic mentor-mentee dynamic doesn’t scale well. There simply are not that many truly good mentors to go around, but there’s an even bigger problem: it fundamentally does not encourage the sort of heroic agency that we want from future visionaries. AFFINE, therefore, settled on a different model. We still had mentors, some permanent and some temporary, giving talks, workshops, and personal guidance, but we also made an app that collected relevant resources and allowed participants to publicly mark themselves as able and willing to explain a given topic. Teaching is a forcing function for frame-refactoring, and we believe it is one of the most critical activities for what we are trying to achieve. Most expertise was acquired through peer interactions, as we allowed the participants to train each other and explore much more freely than they might under the watch of a single mentor serving as a guide. We built a hivemind designed to learn, disseminate, and boggle all by itself, where everyone has an incentive to push the collective understanding forward by sharing the right tools and asking the right questions. If we learned one thing, it’s that we should leave this process even more room to unfold: fewer scheduled activities and more free space for standing around a whiteboard transmitting. These whiteboards, we believe, are where the magic happens.

whiteboards examining the origin of life

Bridge-Building

“AFFINE was an amazing experience. I don’t think I’ve ever been in a place with such a high concentration of people with interesting takes on Alignment before. And this is coming from someone who spent a month at Lighthaven and works out of LISA!”

-Sean Herrington (participant)

A number of our fellows did not want to be researchers. They were doing, or wanted to do, governance or public communication work so as to give researchers time to do anything useful before the apocalyptic deadline. Of course we support this, but at the start we considered it a bit of a bug with regard to our program. AFFINE was a theory seminar, after all. Since then, however, we’ve come to the conclusion that having these non-technical participants was a benefit. Not because AFFINE shouldn’t be about alignment theory, but because a theory seminar is a great place for governance people to be. A place like AFFINE connects them with scholars and helps them build robust models of existential risk which are rooted in a deep understanding of the pre-paradigmatic nature of the field and are thus not fragile to some combination of minor technical breakthroughs. We have heard back from this group that attending AFFINE was much more useful to them than governance-centric events they have attended before or since, and we want to make a contingent of participants like them a deliberate aspect of our future endeavours.

Numbers

“The AFFINE seminar was one of the greatest months of my life. It’s incredible how much you can grow when you are surrounded by smart people who share your mission.”

-Elias Schlie (participant)

The numbers don’t matter. You can’t really feel them, and you reward-hack yourself if you stare at them for long enough, which is a shame because our numbers are really good, actually. On a scale of one to ten, participants would recommend AFFINE with a strength of 9.1 on average. One month down the line, they are still in contact with 7.3 people from the seminar on average and would consider reaching out to more than ten of them for small favors like reviewing a post. They were able to follow through on most of the plans and commitments they made during the seminar, whose impact on their model of the alignment problem they rated as 8.6/10. They rated its long-term impact on their productivity as 8/10 and on their mental state as 8.1/10.

Future Plans

We got a lot of useful training data during this seminar, which is the polite way of saying that we messed up a bunch. Any future AFFINE will have a much clearer mission statement from the get-go and a curriculum which more deliberately starts out by introducing the hard problems and obvious model insufficiencies. This will lead into an open-ended structure with scaffolding for transmission/distillation and mentor access, interwoven with workshops on process skills such as builder-breaker. We want to make the app more expansive, improve the matchmaking for transmitting/receiving so that it ties to personal models rather than articles, and make the fixed content more organized. We want point-of-contact mentors who give individual fellows open-ended guidance. Most importantly, we want to reduce scheduling and give the participants more time for what we consider to be the single most valuable activity: standing around a whiteboard and thinking.

We definitely want to keep doing AFFINE. We are planning another seminar as early as the end of this year, a fellowship to support longer-term deep thinking, and possibly a research retreat.

If you are (or were) a researcher with models for the object-level work, or if you have relevant experience in e.g. operations or event organising and are interested in what we are doing, please reach out by e-mail to ouro@affi.ne or through LessWrong DMs. It is very possible that we will want to involve you.

Thank you

group picture

To everyone who made AFFINE I possible:

Ops: Adriana Arauzo, Phil Chen, František Drahota, Turner Halle, Pauliina Laine, Emily Medén, Jiri Nadvornik, Grace Roberts, Andrew Szabados

Mentors: Mateusz Bagiński, Lucius Bushnaq, Abram Demski, Gurkenglas, Jonas Hallgren, Kaarel Hänni, Felix Harder, Jobst Heitzig, Steven Kaas, Johannes C Mayer, Richard Ngo, Ihor Kendiukhov, Vanessa Kosoy, Vojta Kovařík, Jan Kulveit, Linda Linsefors, Ouro, Julia Persson, Justin Shovelain

Wellbeing: Tilman Masur, Sofie Meyer, Kitt Morjanova, Ryan Thomas

Misc: Camille Berger, Joe Collman, Katalina Hernández, Peter Hozák, Eduard Kapelko, Roman Malov, niplav, Elisa Paka, Plex, Attila Ujvari & the Hostačov staff

EquiStamp for sponsoring the app development

[1] This is a confabulation of events and not any one participant’s experience of any one day.
