LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon
Curated. This seemed like a (relatively) straightforward thing to check that seems straightforwardly useful. I’m interested in seeing METR run a version of this against their existing task suite.
There’s always a bit of a double-edged sword of making a good capabilities eval because, even if people don’t have direct access to the eval to iterate against, it implicitly becomes a target. (i.e. it’s hard to tell, but I get some sense of companies striving to hit the METR trendline and beat the other companies on it).
My understanding is the METR task suite is basically saturated. You could probably construct a good version of this that saturates less quickly.
I’m wondering if there’s any way to keep this artificially low while making CoT time horizons high, and if there’s some sort of index you could publish that’s, like, ratio of CoT-time-horizon to non-CoT or something. I think for it to be that real/helpful you’d also want some kind of “...and the CoT is faithful” metric that I don’t currently know of a robust solution for. (This is not a very well thought out idea, just musing)
Good luck!
Huh, curious for your model of why you predicted the other-which-way? This seemed like a classic “does better on LW” kinda post (for good or for ill). Although I wouldn’t have predicted so much disparity.
Past me definitely would have been frustrated about this right along with you, and somehow I have become a stodgy grownup villain from Peter Pan and I’m not sure how/why.
(I think it’s, like, to my detriment, I have less fun now. But, doesn’t seem like I can fix it by just trying to do more fun silly things, they just don’t resonate like they used to. I recall going to one of your laser-tag things and thinking ‘man, I really should be enjoying this but it feels meh for some reason’)
It’s plausible this is more like a muscle I need to rebuild.
I’m a little worried Anthropic has missed the window for this option, since now it might look like the White House was out to destroy them and they were just throwing in the towel.
(since they clearly don’t care about overrefusals).
(this particular claim here seems false/overstated. Like, clearly, overall, they are willing to accept overrefusals. That doesn’t mean they “don’t care about them”. Maybe they don’t, but, much more likely it just seems like a reasonable tradeoff to them.)
This is presumably not relevant anymore, but… can you not just turn off memory?
Curated. In addition to the obvious “notice the difference between ‘actually helps with x-risk’ and ‘is x-risk themed’”, I think there’s an important corollary to “you’re asking for someone to make you a sucker.”
The problem isn’t just that you might personally get exploited, it’s that you’re incentivizing an overall breeding ground for grifters. (Which could be both intentional grifters getting free/cheap labor, and people with just kinda bad taste destroying the signal/noise ratio of the ecosystem). See: The Moral Obligation Not to Get Eaten.
We do try to subsidize non-AI posts when Curating while keeping quality high.
Curated.
When I first opened this post, I skimmed the intro, and thought “this is a cute idea but seems crazy and I don’t believe it can work. Mnemonics for 19,000 genes? No way!”, and I closed the tab.
When I saw it got 200 karma I took a second look. And… well okay this still seems kinda crazy and I’d like to see someone else use the browser extension and see if it actually helps.
But, I read the “gender = protein transmembrane status” bit, and had a sort of sinking feeling I was about to be wrong and a rising feeling of excitement, that, it doesn’t actually take that many dimensions/gradations to get enough bits to specify one guy within 19,000. And the dimensions do seem like categories that leverage my human-racial-bonus-to-identifying-people.
It may just be a cute idea. But I feel like I learned a potentially generalizable tool for mnemonics. I don’t particularly need to memorize protein-coding-genes. But, this has me vaguely excited to try and memorize something. :P
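(A rough back-of-envelope for the “enough bits to specify one guy within 19,000” point, as a minimal sketch. The 19,000 figure is from the post; the "8 dimensions with 4 gradations each" split is a made-up illustration, not the post's actual scheme, and the function names are hypothetical.)

```python
import math

N_GENES = 19_000  # figure quoted from the post


def bits_needed(n_items: int) -> float:
    """Bits of information required to single out one item among n_items."""
    return math.log2(n_items)


def distinct_characters(gradations_per_dimension: list[int]) -> int:
    """Number of distinct 'people' you can describe if each dimension
    (e.g. gender, hair color, ...) has the given number of gradations."""
    return math.prod(gradations_per_dimension)


# Roughly 14.2 bits, so 15 yes/no traits would already be enough.
print(f"bits needed: {bits_needed(N_GENES):.1f}")
print(f"binary traits needed: {math.ceil(bits_needed(N_GENES))}")

# Hypothetical split: 8 dimensions with 4 gradations each (an assumption).
hypothetical_dims = [4] * 8
print(f"distinct characters: {distinct_characters(hypothetical_dims)}")
print(f"covers all genes: {distinct_characters(hypothetical_dims) >= N_GENES}")
```

Under those assumed numbers, eight four-way traits give 65,536 combinations, comfortably more than 19,000, while seven would fall just short.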
Oh, this was probably because I forgot to put in a street address. Looks fixed now.
My version of this is “don’t try to come up with a name until after you’ve found the venue, because the venue will have some kind of character that lends itself to some names better than others.”
One could go with Persepolis Vibes:
Huh. The combo of this + OpenAI’s Frontier Safety Blueprint in the same week is surprising.
They both are saying more of the things I’d have been wishing for Anthropic and OpenAI to be saying all along. I’d like to just believe “oh, they wanted to say these things all along, and were waiting till the political winds felt favorable.” But, man I feel like if that was the case, at least some of their previous actions would have been different somehow. (OpenAI in particular).
Curated. I’d thought about most of these in isolation before, but found it valuable to have them in one place while sizing up “what actually matters most about cybersecurity in the AI era?” (Tracking multiple concerns but keeping your eye on the most-important-balls seems like a good habit).
Being able to pin this down exactly is kind of an open research question. (i.e. “what is an agent?” and “what is an optimizer”). But, roughly, things are “more optimizer-like” the more they successfully converge on a target no matter what starting conditions you put them in, and no matter what obstacles you throw in their way.
Some previous posts:
LLM base models in their raw form are less “optimizer-y” because, while clearly intelligent, if you change their initial prompt they will end up doing radically different things instead of converging to the same thing. (Compare AlphaGo, which always tries to win games of Go.)
Curious if you can say more about the diff. Also, is this true across karma scores? (i.e. does the rest of the voting population seem to vaguely agree/disagree with you?)
(strong upvoted because I’d also like an answer to this question)
I don’t think this particular one is about Bay culture. Or, like, Bay Culture might be the sum-of-the-parts here, but, it’s more like I disagree fractally with you both aesthetically and logistical-preferencelly. I enjoyed the humming thing the very first time it happened to me because it’s beautiful and warm. It sounds like you don’t find it beautiful and warm, just annoying.
Have you actually run events and successfully quieted 100 people via the “talk to them all individually” way?
I think the triggeredness is a bit about “musicalness is important to me”, but also like this is disrespecting my time/effort as an organizer, and the vibe I’m trying to create when I’m running an event.
When I imagine trying to do this via talking to people individually it doesn’t just feel “it’d take longer”, it’s more like “I don’t think that would even work.” Everyone would keep talking loudly. Eventually when I’m actually ready to start I’d still need to do something loud and obnoxious to get everyone to stop and people would still keep talking and I’d have to keep doing something loud and obnoxious until they all became silent.
The event organizer is usually frazzled and busy. The humming thing takes like… 8-12 seconds? I don’t think it even really takes more time for each person than it would to talk to each person?
And, when I imagine the reason people are upset about it being that they wanted to keep a conversation going, I’m like “but, you aren’t supposed to be having that conversation anymore, that’s the point. This room is now about whatever-the-next-activity is.” (Generally it’s known that there’s an activity that’s about to start when people do the humming thing).

Fwiw the first one wasn’t rejected for “being raw/sloppy”, we just have particularly high standards for AI content because we get so much of it and we want to keep signal/noise quality high. And both the writing and idea quality need to be actively good.
I think it’s an achievable goal to learn to come up with interesting/meaningful contributions and articulate them well. You can ask AIs for meta-level advice on how to write without having them do your writing for you.