Hi, I’m Cormac. I am currently trying to find my way towards the most impact I can have in AI existential risk.
I Had Claude Read Every AI Safety Paper Since 2020, Here’s the DB
101 Humans of New York on the Risks of AI
Side by Side Comparison of RSP Versions
White-Box Attacks on the Best Open-Weight Model: CCP Bias vs. Safety Training in Kimi K2.5
Anthropic is Really Pushing the Frontier, What Should We Think?
What Are My Values?
Match Group acquired OKCupid in 2011, and ever since then it has been destroying OKCupid’s functionality, moving it closer and closer to a generic swipe dating app.
Here’s some major mundane utility, huge if true: Automatically analyze headlines for their implications for stock prices, get there first and earn a 500% return
I read the article about this and their assumptions are insane. Stock news normally happens after the market closes. The way next-day returns are calculated is just (closing price for the day) / (closing price for the previous day). Importantly, it is not (closing price) / (opening price). This is why stocks are very frequently able to open up many percentage points on news that happened overnight.
So, all they did was say: based on the news, we go long if ChatGPT says good and go short if ChatGPT says bad, and we calculate our returns as whatever the next day’s returns are. But remember, next-day returns are measured from the previous day’s close. So they are assuming that if news comes out at 6pm, they are able to buy or sell the relevant stock at exactly the 4pm closing price. This is a fucking insane assumption. Not only are markets very thin in the extended hours (4pm-8pm) and pre-open hours (4am-9:30am), but responding to news during non-core trading hours (any time outside 9:30am-4:00pm) is already an extremely standard thing that hedge funds do. So yes, this would be true if there were some magical market maker willing to quote unlimited size in either direction during the normally very thin non-core hours, without fading in response to the news or to anyone trading against them.
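To make the flawed assumption concrete, here is a toy sketch with invented numbers: close-to-close accounting credits the strategy with the full overnight gap, while a trader reacting to 6pm news would realistically only capture the move after the next open.

```python
# Hypothetical illustration (all prices invented): why close-to-close
# returns overstate what a trader reacting to after-hours news can earn.
prev_close = 100.0   # 4pm close, before the 6pm news
next_open = 110.0    # stock gaps up overnight on the good news
next_close = 112.0   # close of the following day

# The paper's implicit assumption: you can buy at the prior close.
assumed_return = next_close / prev_close - 1

# A more realistic fill: you buy at (or after) the next open,
# once the overnight gap has already priced in the news.
realistic_return = next_close / next_open - 1

print(f"assumed (close-to-close):  {assumed_return:.1%}")
print(f"realistic (open-to-close): {realistic_return:.1%}")
```

With these numbers the close-to-close method books a 12% gain, while the achievable open-to-close move is under 2%; almost all of the headline return lives in the overnight gap the strategy could not actually trade.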
Talking to strangers is more enjoyable than people think. Update accordingly. Notice that if you already knew how much people enjoy talking to strangers, but not how much they expect to enjoy it, then this new information would be bad news; I presume it is instead good news. Does this mean it is ‘that easy’? No, but it’s also not that hard.
I read the 2014 paper that this NYT article refers to. They ran the experiment on Metra, a predominantly white commuter rail line from the Chicago suburbs into downtown Chicago. The sample size was less than 200, with a mean age of 49 (SD 13). So it’s unclear how well this study generalizes to other contexts.
It is especially interesting that the NYT article chose to use the 2014 paper when the same author published a 2021 paper with basically the exact same setup, but on the London Underground (and with a ~4x larger sample size), which I would expect to have a significantly less homogeneous makeup (and therefore possibly more generalizable results) than Metra. This is probably because, while the 2021 paper did find that people somewhat underestimated how much they would enjoy talking, the control group underestimated how much they would enjoy the control activities by even more. So the gap between actual and expected enjoyment was larger in the group told to do whatever they normally do on their commute than in the group told to talk to whoever sat next to them.
I will say that, setting aside the inconclusive difference between expected and actual enjoyment, in the London Underground paper people both expected to most enjoy talking to strangers AND did most enjoy talking to strangers. So it is somewhat interesting that people generally choose not to talk to others despite both expecting and realizing the most enjoyment from doing so. Maybe there are tail-risk considerations here, where a better average commute isn’t worth a 1/10k chance of talking to someone who decides to stalk you, etc.
“That doesn’t work; less dense in NYC generally means significantly wealthier, and NYC already skews much wealthier than average.”
I attempted to account for that! I went to both generally poorer and less dense neighborhoods using this neighborhood optimizer I built out: https://sladebyrd.com/canvassing-planner
Fun fact: this (perhaps unsurprisingly) led to me canvassing in the neighborhood considered to be the murder capital of NYC. (Which was, interestingly, my single most productive canvassing session.)
There is of course the issue that within a given neighborhood I am still selecting the wealthier residents. I tried adding a bias term to attempt to account for that (i.e., I assume I am getting responses from people who are $X above the neighborhood median), but I do agree it’s hard.
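A minimal sketch of the kind of bias term described above; the dollar offset and function name are invented placeholders, not the values actually used.

```python
# Hypothetical sketch of the selection-bias adjustment described above.
# Assumption: within a neighborhood, respondents skew some fixed dollar
# amount ($X) above that neighborhood's median income.
SELECTION_BIAS_DOLLARS = 15_000.0  # invented placeholder for $X

def estimated_respondent_income(neighborhood_median: float) -> float:
    """Shift the neighborhood median up by the assumed selection bias."""
    return neighborhood_median + SELECTION_BIAS_DOLLARS

print(estimated_respondent_income(52_000.0))  # 67000.0
```

The hard part, of course, is that a single constant offset is itself a strong assumption; the bias plausibly varies by neighborhood.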
I include the collected as well as Bayesian-estimated demographic data of my responses at the end of the report.
InkSF, an Opening on Finding the Highest Impact in AI Safety and Moving to SF
“Our fair city is poised to allow 6-story buildings citywide by an 8-1 vote. In context that is a huge change. Under the old rules only 350 units (!) total were expected over 15 years and 85%+ of the existing housing wouldn’t have been legal to build. Here’s a primer on the changes. They had to compromise a bit on setbacks and lot size to get it over the finish line, but it still seems great.” I wrote about the specifics of this if anyone wants to hear more; there are some cool data-visualization maps.
https://ascendantnewyork.substack.com/p/what-nyc-can-learn-from-cambridge
title: bundling not bunding
I talked to a good friend of mine who is a (I originally had a pretty large descriptor of bona fides here but decided to minimize her exposure surface area). I generally consider her to be very knowledgeable, with a very Chinese-government-insider point of view, though generally with a China-positive slant.
Her claim was that the tepid Shanghai covid response was due to conflicting local vs. national government priorities: the Shanghai government wanted to coexist with covid, given its high number of ICU beds per capita and relatively high vaccination rates. Shanghai only started stronger measures, closer to covid-zero policies, after the national government exerted pressure.
Re: food, her claim is that China has a national food stockpile large enough to feed the country for 2-3 years, and that while there are short-term supply chain issues, she isn’t worried about mass starvation on any short-term timescale. She seems to think there is a reasonable probability of significant food issues worldwide, but seems very unconcerned about the local food situation. I was somewhat skeptical of the 2-3 year stockpile claim, but my very quick googling makes it seem like China does have a huge food stockpile.
The original search, before I went with the references-based approach, got a couple of posts, but I think clearly not enough. I couldn’t figure out a good systematic way to get posts, but I will definitely spend some more time thinking about this and add a tag for LW/AI Alignment posts.
Thanks for the suggestion! Most of the posts on https://transformer-circuits.pub/ have an arXiv version and got picked up, but it seems like I missed two of them. It looks like I don’t have anything from https://distill.pub/, which I will work on.
Thanks for responding, and for feeling strongly enough to make an account to do so! I appreciate the feedback. I think overall a core belief of mine is that building the smartest AI on the planet is extremely high stakes, and I (at least) would hope for and hold high standards for what it looks like to steward that into existence. This bias is certainly baked into this essay, and I think it’s reasonable that if you (or anyone else) don’t share that frame, this essay is less strong, since it’s less important to really, actually get everything right.
I think there are a couple of things going on here that I appreciate the feedback on. There are certainly beliefs of mine that I don’t think I fully justify within the course of this article. I’ve written a little bit (and thought a lot) about the previous version of the RSP, and I certainly agree that “to keep risks low, it is not enough to maintain risk mitigations as capabilities increase — rather, we must accelerate our progress on risk mitigations. While we do not see any fundamental barriers to achieving this, success is far from guaranteed.” does not clearly support that belief of mine. I generally try to make it clear when something isn’t fully supported in the essay by saying something like “I believe,” but clearly I didn’t do a great job here.
Another thing I struggled with here was how much to write for someone who already has some context vs. really describing everything exactly and fully. I also struggled with this in the ordering of the various points since frequently they are interrelated.
Anthropic just released what is by far the world’s best AI model
I will note that I did make it very clear that it wasn’t a public release, and I estimated the total number of people I think have access. How bad is it if someone who stops before making it 25% of the way through an essay has an incorrect assessment of reality? I certainly agree it’s not optimal, but it seems kind of impossible to fully solve. Perhaps it’s sufficiently important in this case to footnote it immediately; that seems low cost and sufficiently worth it. I’ll do that.
“For those with access, this model is surprisingly uninhibited”
I have a feeling that if they had RL-trained the model with more inhibitions, you would probably be saying something like “despite saying that they won’t release it to the public, they are clearly preparing for it; look at the company’s incentives here.” I clearly didn’t do a good enough job here of describing the mechanism. This is not at all a claim about the model itself, but about the fact that they are choosing not to include the classifier-based prompt blocking that is normally included in public releases. I do say this in the very next line after the one you quoted: “Because of the very limited and targeted nature of this release, we are not blocking exchanges based on classifier triggers”, so this feels a little uncharitable. But I also think that if I want to write for non-technical people outside this field, I can’t expect them to know what classifier-based prompt blocking is. I do try to define this later in the essay, but once again, ordering is hard.
To be explicit, the decision not to make this model generally available does not stem from Responsible Scaling Policy requirements.
The point I clearly mostly failed to make here is that it can’t be blocked by the RSP. The v3 RSP is specifically designed not to block new releases (unlike previous versions).
I will say overall I am a little sad that this critique doesn’t engage with any of the technical arguments, which (at least in my eyes) are where I am most unhappy with the state of this release and with what can be inferred about how future releases will look. I also understand why that is the case: technical arguments are much harder to assess, and it’s hard to know how important they are if you aren’t in the field. Once again, I’m just trying to do my best, and I appreciate the places you’ve spent the time to give feedback on how you read and understood this piece!
It would be nice to have a summary after finishing that shows, for each question, what I estimated, what the correct answer was, and the points scored.
This might be a reading comprehension problem on my part, but I couldn’t find the objective explicitly stated. Is the objective to maximize the probability of successfully retrieving all 3 fragments, or to maximize the expected number of fragments retrieved?
Have you read this? https://www.gwern.net/docs/psychology/okcupid/themathematicsofbeauty.html
This blog post by the old OKCupid team (before they were bought by Match Group) seems to be pretty strongly advocating for playing up the unique ways you are attractive, and not trying to play up all the ways you are average, possibly to the detriment of your Elo.
I have certainly found this to be a top-tier resource when thinking about the dating-app meta, and I was somewhat surprised not to see it referenced.