Hi, I’m Cormac. I am currently trying to find my way towards the most impact I can have in AI existential risk.
“That doesn’t work; less dense in NYC generally means significantly wealthier, and NYC already skews much wealthier than average.”
I attempted to account for that! I went to both generally poorer and less dense neighborhoods using this neighborhood optimizer I built: https://sladebyrd.com/canvassing-planner
Fun fact: this (perhaps unsurprisingly) led to me canvassing in the neighborhood considered the murder capital of NYC. (Interestingly, that was my single most productive canvassing session.)
There is of course the issue that within a given neighborhood I am still selecting the wealthier residents. I tried adding in a bias term to account for that (i.e. I assume I am getting responses from people who are $X above the neighborhood median), but I do agree it's hard.
I include both the collected and the Bayesian-estimated demographic data for my responses at the end of the report.
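The offset adjustment described above can be sketched as follows. This is a minimal illustration, not the actual estimator: the $15,000 offset and the function name are hypothetical stand-ins for the $X bias term.

```python
# Minimal sketch of the selection-bias adjustment described above.
# Assumption: respondents skew wealthier than their neighborhood, so each
# respondent's income is estimated as the neighborhood median plus a fixed
# offset $X. The offset value and names here are hypothetical.

SELECTION_OFFSET = 15_000  # hypothetical $X: how far above the median respondents skew

def estimate_respondent_income(neighborhood_median: float,
                               offset: float = SELECTION_OFFSET) -> float:
    """Estimate a respondent's income from their neighborhood's median income."""
    return neighborhood_median + offset

# e.g. a respondent in a neighborhood with a $42,000 median income
print(estimate_respondent_income(42_000))  # 57000
```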
Okay, I finished the first pass at this: https://sladebyrd.com/ai-safety-db/posts
Any thoughts?
“I’d encourage you to steal from it. perhaps clone it in ../ and tell claude code to look at it as needed. note: a major todo for me is getting it to get comments, which it doesn’t do now.”
Working on this right now
The original search, before I went with the references-based approach, got a couple of posts, but clearly not enough. I couldn't figure out a good systematic way to get posts, but I will definitely spend some more time thinking about this and add a tag for LW/AI Alignment posts.
No it did not! I’ll take a look
Heretic uses ablation, which requires editing all of the weight matrices. My quick assessment is that the Heretic codebase as it currently exists couldn't handle K2.5 out of the box, because K2.5 does some weird things that Heretic isn't designed to handle by default. I do think it would be possible to get Heretic working on K2.5 with real effort. The largest Heretic'd models on HF are a fifth the size of K2.5, and it looks like they still get around 30/100 refusals (not surprising, since ablation is simply harder with MoE models) compared to my 0%, although my guess is they have less KL divergence than my approach, which is more of a throw-the-kitchen-sink-at-the-problem vibe. Heretic uses an automatic optimization process to find the best coefficients for each abliteration; Claude thinks the RunPod costs for this could easily go over $100.
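For readers unfamiliar with the mechanism: the core step in this style of ablation is projecting a "refusal direction" out of each weight matrix that writes to the residual stream. A hedged sketch (the direction and matrix here are random stand-ins; real tools like Heretic derive the direction from activation differences and tune per-layer coefficients):

```python
import numpy as np

def ablate(W: np.ndarray, refusal_dir: np.ndarray, coeff: float = 1.0) -> np.ndarray:
    """Project out `refusal_dir` from W's output.

    W has shape (d_model, d_in) and writes to the residual stream;
    refusal_dir has shape (d_model,). Returns W with the component of its
    output along refusal_dir removed (scaled by coeff).
    """
    r = refusal_dir / np.linalg.norm(refusal_dir)  # unit refusal direction
    return W - coeff * np.outer(r, r) @ W          # W' = W - c * r r^T W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # stand-in weight matrix
r = rng.standard_normal(8)        # stand-in refusal direction
W_ablated = ablate(W, r)

# After ablation, W's output has (near-)zero component along r:
print(np.allclose((r / np.linalg.norm(r)) @ W_ablated, 0))  # True
```

Doing this across every matrix in a huge MoE model, and searching for the per-layer coefficients automatically, is where the compute cost comes from.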
Thanks for the suggestion! Most of the posts on https://transformer-circuits.pub/ have an arXiv version and got picked up, but it seems I missed two of them. It looks like I don't have anything from https://distill.pub/, which I will work on.
“Our fair city is poised to allow 6-story buildings citywide by an 8-1 vote. In context that is a huge change. Under the old rules only 350 units (!) total were expected over 15 years and 85%+ of the existing housing wouldn’t have been legal to build. Here’s a primer on the changes. They had to compromise a bit on setbacks and lot size to get it over the finish line, but it still seems great.” I wrote about the specifics of this, with some cool data-visualization maps, if anyone wants to hear more.
https://ascendantnewyork.substack.com/p/what-nyc-can-learn-from-cambridge
what did you use to generate the images in this post?
I am become Matt Levine, destination for content relevant to my interests.
You don’t even need to go to London for mundane utility; there’s an “AI Mart” in LIC.
Have you considered doing random spot checks? Feels like even 3x per year gets 80% of the value.
Match Group acquired OkCupid in 2011, and ever since then it has been destroying OkCupid's functionality and moving it closer and closer to a generic swipe dating app.
Here’s some major mundane utility, huge if true: Automatically analyze headlines for their implications for stock prices, get there first and earn a 500% return
I read the article about this and their assumptions are insane. Normally stock news happens after the market closes. The way next-day returns are calculated is just (end-of-day price for the day)/(end-of-day price for the previous day). It is importantly not (end-of-day price)/(opening price). This is why stocks very frequently open up many percentage points on news that happened overnight.
So, all they did was say: based on the news, we get long if ChatGPT says good and get short if ChatGPT says bad, and we calculate our returns as whatever the returns for the next day are. But remember, returns for the next day are measured against the close of the previous day. So they are assuming that if news comes out at 6pm, they can buy or sell the relevant stock at exactly the 4pm closing price. This is a fucking insane assumption. Markets are very thin in the extended 4pm-8pm and pre-open 4am-9:30am hours, and responding to news during non-core trading hours (any time outside 9:30-4:00) is already an extremely standard thing hedge funds do. So yes, this would work if there were some magical market maker willing to make markets at unlimited size in either direction, without fading in response to the news or to someone trading against them, during the normally very thin non-core trading hours.
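The gap between the two return definitions is easy to see with made-up numbers. Close-to-close returns credit the strategy with the overnight gap, even though the news arrives after the 4pm close:

```python
# Toy illustration with hypothetical prices (not data from the paper).
prev_close = 100.0   # 4pm close, before the 6pm news
next_open  = 110.0   # stock gaps up 10% overnight on the news
next_close = 112.0   # next day's 4pm close

close_to_close = next_close / prev_close - 1  # what the paper credits the strategy with
open_to_close  = next_close / next_open - 1   # what you could plausibly capture

print(f"{close_to_close:.1%}")  # 12.0%
print(f"{open_to_close:.1%}")   # 1.8%
```

The claimed edge is mostly the 10% overnight gap that no real trader fills you at the stale 4pm price.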
Would be nice to be able to have a summary after finishing for each question of what I estimated, what the correct answer was, and points scored.
title: bundling not bunding
For anyone specifically looking for a phone game, I have recently found Slice & Dice: https://tann.itch.io/slice-dice
to be very compelling: consistently interesting choices, and good to jump in and out of while riding public transit in 20-30 minute spurts.
Talking to strangers is more enjoyable than people think. Update accordingly. Notice that if you already know about how much people enjoy talking to strangers, but not how much they expect to enjoy it, then this new information becomes bad news, but I presume it is instead good news. Does this mean it is ‘that easy’? No, but it’s also not that hard.
I read the 2014 paper that this NYT article refers to. They ran the experiment on Metra, which is a very white commuter line from the Chicago suburbs into downtown Chicago. Sample size under 200, mean age 49 (SD 13). So it's unclear how generalizable this study is to other contexts.
It is especially interesting that the NYT article chose the 2014 paper when the same author published a 2021 paper with basically the same setup on the London Underground (and with a ~4x larger sample size), which I would expect to have a significantly less homogenous makeup (and therefore possibly be more generalizable) than Metra. This is probably because, while the 2021 paper did find that people somewhat underestimated how much they would enjoy talking, the control group underestimated their enjoyment of control activities by even more. So the gap between actual and expected enjoyment was larger in the group told to do whatever they normally do on their commute than in the group told to talk to whoever sat next to them.
I will say that, setting aside the inconclusive expected-vs-actual enjoyment comparison, in the London Underground paper people expected to most enjoy talking to strangers AND did most enjoy talking to strangers. So it is somewhat interesting that people generally choose not to talk to others despite both expecting and realizing the most enjoyment from doing so. Maybe there are tail-risk considerations here, where a better average commute isn't worth a 1-in-10k chance of talking to someone who decides to stalk you, etc.
It seems to me like first and second teacher are getting mixed up.
“The second teacher wants something useful, a thought out and justified view on population ethics that doesn’t get too lost in the weeds.”
“There are times and places where you want the second teacher (or something in between them) rather than the first one. This does not seem like it is one of those places.”
Don’t we want the second teacher in this place?
I talked to a good friend of mine who is a (I originally had a pretty long list of bona fides here but decided to minimize her exposure surface area). I generally consider her very knowledgeable, with a very Chinese-government-insider point of view, though generally with a China-positive slant.
Her claim was that the tepid Shanghai covid response was due to conflicting local vs. national government priorities: the Shanghai government wanted to coexist with covid, given its high ratio of ICU beds to population and relatively high vaccination rates. Shanghai only moved toward stronger, closer-to-covid-zero measures after the national government exerted pressure.
Re: food, her claim is that China has a national food stockpile sufficient to feed the country for 2-3 years, and that while there are short-term supply chain issues, she isn't worried about mass starvation on any short time scale. She thinks there is a reasonable probability of significant food issues worldwide, but seems very unconcerned about the local food situation. I was somewhat skeptical of the 2-3 year stockpile claim, but my very quick googling makes it seem like China does have a huge food stockpile.
Thanks for responding, and for feeling strongly enough to make an account to do so! I appreciate the feedback. A core belief of mine is that building the smartest AI on the planet is extremely high stakes, and I (at least) hope for, and hold high standards for, what it looks like to steward that into existence. This bias is certainly baked into this essay, and I think it's reasonable that if you (or anyone else) don't share that frame, the essay is less strong, since it's less important to really, actually get everything right.
I think there are a couple of things going on here, and I appreciate the feedback on them. There are certainly beliefs of mine that I don't fully justify within the course of this article. I've written a little bit (and thought a lot) about the previous version of the RSP, and I certainly agree that
does not clearly support that belief of mine. I generally try to make it clear when something isn’t fully supported in the essay by saying something like “I believe” but clearly I didn’t do a great job here.
Another thing I struggled with here was how much to write for someone who already has some context vs. really describing everything exactly and fully. I also struggled with this in the ordering of the various points since frequently they are interrelated.
I will note that I did make it very clear that it wasn't a public release, and I estimated the total number of people I think have access. How bad is it if someone who stops before making it 25% of the way through an essay has an incorrect assessment of reality? I certainly agree it's not optimal, but it seems kind of impossible to fully solve. Perhaps it's sufficiently important in this case to footnote it immediately; that seems low cost and worth it. I'll do that.
I clearly didn't do a good enough job here of describing the mechanism. This is not at all a claim about the model itself, but about their choosing not to include the classifier-based prompt blocking that is normally part of public releases. I do say this in the very next line after the one you quoted: “Because of the very limited and targeted nature of this release, we are not blocking exchanges based on classifier triggers”, so this feels a little uncharitable. But also, if I want to write for non-technical people outside this field, I can't expect them to know what classifier-based prompt blocking is. I do try to define it later in the essay, but once again, ordering is hard.
The point I clearly mostly failed to make here is that it can't be blocked by the RSP. The v3 RSP is specifically designed not to block new releases (unlike previous versions).
I will say that overall I am a little sad this critique doesn't engage with any of the technical arguments, which (at least in my eyes) are where I am most unhappy with the state of this release and with what can be inferred about how future releases will look. I also understand why that's the case: technical arguments are much harder to assess, and it's hard to know how much they matter if you aren't in the field.
Once again just trying to do my best, and I appreciate the places you’ve spent the time to give feedback on how you read and understood this piece!