Insider info from the Inkhaven Writer’s Residency @ Lighthaven: we’re being given swanky enamel Prestige Pins with the Inkhaven logo on them.
But not just anyone gets a Prestige Pin.
The pins were created to encourage us to spread our creative wings and try different things. In order to earn a pin, you must have published an Inkhaven post that falls into EACH of seven categories: Fiction, Empirical, Informational, Persuasive, Humour, Advice, and Personal.
The program was announced a couple of days ago. I had already written posts in six of the seven categories and was planning to write a humour-ish post anyway, so last night I requested a pin and this morning the team signed off on it. I am now the owner of an Inkhaven Prestige Pin.
Woo!
This makes me think, though. Was this an experiment, and have I been had?
I think part of the ethos of the Inkhaven Residency¹ is to cultivate and encourage agency and self-motivation through a high-pressure environment. Having published 32,958 words at Inkhaven at the time of writing, I’m pretty astonished by how much I’ve produced here. I never would have done this if I hadn’t been shoved into a weird compound in Berkeley with 53 other weirdos.

I am quite pleased with this. I feel like I had more of a “you can just do things” drive a few years ago when I was in university, and I really think this month might be getting me out of that creative rut.
But if the purpose of Inkhaven is (in part) to cultivate our internal locus of agency, isn’t it odd that we were suddenly given a very clearly external motivator— an enamel flame for us moths to fly to? It sits strangely with me.
After all, many of us came to Inkhaven with very clear writing objectives: some people here write exclusively about AI safety, AI policy, or AI technical stuff. Some of us are travel and lifestyle bloggers, and some of us are etymologists. Curation of one’s voice is another motif of the experience, and by forcing ourselves to write fiction or satire when that’s very much not our niche… is that productive?
Maybe. Certainly many talents and niches aren’t fully realised until they are thrust upon us. The enamel pin is a target for us archers to hit, but the real learning happens as we take aim.
But at the same time it’s a little funny that this high-stakes program of self-actualisation introduced the Prestige Pin two-thirds of the way through. What an inversion of our personal creative expression to hand us a bullet-point list and give us a “while supplies last!” marketing pitch.
It certainly worked on me.²
On Saturday the organisers are holding an open exhibition for the Residency— the Inkhaven Fair— here at Lighthaven. Come check it out. You might get to watch all the Prestige Pin awardees be publicly humiliated for being sheep rather than Real Bloggers.
[1] - I have spent weeks wondering what exactly the ethos of this program is— a program on which Lightcone Infrastructure reportedly loses tens of thousands of dollars each time they run it. I have thoughts on this but I won’t fully write them up until May.
[2] - I’d again like it on record that I had already met most of the requirements for the pin. But maybe my only “humour” post wouldn’t have been written in quite the same style if there wasn’t a Prestige Pin on the line...
If LessWrong had certain prestige pins, like Pokémon badges, I would write on LW more often. The idea of a trinket that expresses social status to a niche group of people in the know is so sticky to my brain.
If LW did this, I think they could sell the badges for $30-50 each, and you only ‘unlock’ those purchases in the store when you—for example—have had 10 posts hit the front page, made 100 useful wiki edits, or reached 1,000 karma.
Oh, and then they could do special badges for Petrov Day participants! And April Fools! And my god, a Shoggoth enamel pin would be one of my most prized possessions (only available to purchase during Fooming Shoggoth concerts).
I’m going to go research enamel pin creation now.
P.S. Can we see a photo of your pin?!
Be in awe of my Prestige.
I do like where your head’s at, though. As a new LessWrong user who loves nothing more than in-group signalling, I’m a little sad at the total lack of LW/Lightcone merch available.
I bow at your majesty. It is a lovely pin, sir.
I have a Redbubble store where I sell some Rationalist merch. All the Rationalist stuff (except for a few items I was too lazy to change the setting on) is sold at cost, and I make no money from it. However, it’s honestly not very good, and I mainly put the rationalist stuff up there because I wanted a “notice confusion” phone case.
I think it’s ~silly that Lightcone doesn’t have a merch store, because I’d buy the shit out of LessWrong merch. Like, I also imagine Shoggoth blind boxes, like PopMart’s. I would easily spend $200 to buy an entire carton (if it’s guaranteed that the carton contains all the variants).
I think “Lightcone Brand Paperclips” is another cute, untapped market.
Shoggoth Pin Mockup: And according to Custom Ink it’s only $909 USD to get 150 of these shipped to me in Australia! If I was rich, I would do this.
I just attended a practice run of D. Scott Phoenix’s “The Endgame” milsim/wargame/LARP event in Berkeley, CA. It was interesting. If you take our simulation tonight as gospel, xAI is going to win the AI compute/fab race IFF China blockades Taiwan.
~40 people split into teams representing different actors in the current AI space (Anthropic, OpenAI, xAI, but also venture capital, the US, China, The Public, etc.) and engaged with each other over several rounds, brokering deals and taking actions to see what happens in a simulation of the near future.
China and the US entered a hybrid war, the US nationalised AI labs and American silicon fabs, Anthropic and DeepMind merged, xAI became the foremost silicon provider in the country, and some other stuff happened.
Oh yeah, and Claude GigaMythos leaked, political travel plans went public, and a few US politicians got assassinated. But in the end, the US and Chinese governments and their over-reliance on AI for governance and policy led to an AI-enforced world peace!
It was a cool experience, if a little flawed. So much potential. Unfortunately it seems like Scott is going to present this game this week at a conference and then never run it again.
I think if you gave me 14 days and a few playtests I could design a far better version of this game that flowed better, was more engaging, and achieved the goal better (helping powerful people in AI model what the future might look like based on incentives). I understand the need to keep the game tight and lightweight, but the game really needs better modelling for mass media, social media, relationships, resources, and “what can I do on a given turn?”. There are also some changes I would make to the structure and the factions available.
(I’m not critiquing out of disrespect. It was just my hope that Scott might hire me to work on and improve his game. Alas, it looks like The Endgame concludes in the next few days.)
I might write an LW effortpost about this, actually. I love game design and I would one day like someone to pay me to design games/run events for them. I am particularly inspired by other simulation games like those of the UK-based Megagame Makers, who have been doing this kind of thing for years.
Thank you Scott for the invite! :D
I am a layman with no education or training on AI or AI Safety. I have been following the Claude Mythos/Glasswing arc as well as the general explosive improvement in AI coding ability. I believe LLMs are still exploding with competence, particularly in coding.
I think the reasons for coding being the #1 area of LLM performance improvement are:
1) There’s a clearer “correct answer” for training a bot to write a quicksort implementation than, say, a short story
2) There’s more enterprise money in training coding bots than, say, short story bots
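The “clearer correct answer” point can be made concrete with a toy sketch (my own illustration, not anything from the post): a quicksort implementation can be graded automatically against a reference sort, which is exactly the kind of crisp, verifiable signal a training pipeline can exploit. No equivalent checker exists for a short story.

```python
import random

def quicksort(xs):
    # Recursive quicksort: partition around a pivot, sort the halves.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)

def reward(candidate_output, xs):
    # A trivially verifiable reward: compare against Python's built-in sort.
    return 1.0 if candidate_output == sorted(xs) else 0.0

xs = [random.randint(0, 99) for _ in range(20)]
assert reward(quicksort(xs), xs) == 1.0
```

A `reward("a short story about grief", prompt)` function with the same crispness simply doesn’t exist, which is the asymmetry point 1 is gesturing at.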
All that said, I find it hard to believe that AGI, if and when it comes, will be LLM-shaped.
As a user of LLMs for 3.5 years (and the GPT sandbox before ChatGPT came out), I feel as though there are certain areas— writing and rhetoric in particular— where the models are approaching some sort of ceiling. I’m not impressed with Claude 4.7 in that capacity, nor was I with 4.6. They feel about as good as 4.5. I can say the same of ChatGPT, Grok, and to some extent Gemini.
But this is just vibes! And maybe a small amount of motivated reasoning as a self-proclaimed blogger-fictionist. We don’t really have a good way to index writing ability, especially not fiction. All we have is the opinions of people with taste, and the taste-makers still seem to say AI-written fiction is crap. I tend to agree.
What I’m trying to get at is that when people talk about Mythos as AGI, I’m like, “yeah, it’s a super smart coder and it’s going to change the world of software and cybersecurity forever, but also it’s just an LLM.” Maybe the AGI of the future will have an LLM component, but I can’t help but cringe when people say whatever new LLM could be AGI.
I don’t know. Again, I am a layman. But of all the people who follow AI and LLMs, I’m probably in the bottom quartile when it comes to thinking about them from a software-dev standpoint.
My take on the writing quality stagnating while hard, verifiable metrics go up is the delineation between reinforcement learning and increasing the size of the base model. Unfortunately the specific details of model training are private, which makes claims like this feel a bit hollow, but it seems likely that e.g. Opus 4.6 is “the same model” as Opus 4.5, just with many more steps of RL applied to make it better at specific, discrete, verifiable tasks.

This way of thinking predicts that notable scale-ups in the size of the base model, which is presumably what Mythos is, would bring notably better skills in the soft/unmeasurable areas such as writing, contextual understanding, humor, etc. This is supported by the anecdotal reports at the end of the Mythos model card, though it’s hard to truly say without having the model, of course. It’s certainly POSSIBLE that LLMs will never be better at writing fiction and related soft fields than they are today, but I doubt this and think the wall you see is a result of the above RL trend rather than a true ceiling in capabilities being reached anytime soon.
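The hypothesis here — RL sharpening verifiable skills while leaving unverifiable ones flat — can be caricatured with a toy simulation. Everything in this sketch (the skill values, the update rule) is my own illustrative assumption, not anything from real training pipelines:

```python
import random

random.seed(0)  # reproducible toy run

# Two latent "skills"; only one has an automatic grader.
skills = {"coding": 0.3, "fiction": 0.3}

def rl_step(skills, lr=0.02):
    # Verifiable task: success can be checked (like unit tests on code),
    # so a reward exists and the skill receives a training signal.
    success = random.random() < skills["coding"]
    if success:
        skills["coding"] += lr * (1.0 - skills["coding"])
    # Fiction quality has no automatic grader: no reward, no update.

for _ in range(2000):
    rl_step(skills)

print(skills)  # coding climbs toward 1.0; fiction stays at 0.3
```

Under this cartoon, benchmark-measured skills keep improving post-training while the ungraded ones only move when the base model itself changes.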
I guess my main gripe with people who have your attitude re “LLMs can never be AGI” is: what, concretely, is missing to you? What are the differences between Mythos and what would unequivocally be AGI to you, and why do you believe some mysterious secondary component is needed to acquire that capability? To me the most obvious answer is memory and the ability to actively learn (“continual learning”), but one could even argue that is unneeded, since the main character in Memento is certainly a general intelligence, is he not?
It’s interesting to me just how many people in the AI research/safety crowd still think AI is worse at writing than the best humans. I think it’s true, but it’s an interesting contrast to the AI bros on Twitter who are like “I just made this in 10 minutes with [TOOL]. [INDUSTRY] is dead.”