I have signed no contracts or agreements whose existence I cannot mention.
plex
Separating Prediction from Goal-Seeking
Two Skillsets You Need to Launch an Impactful AI Safety Project
Pre-registering an important prediction: fa9be9574af1db6423d660a768f62ee02a97e9760ee7dbace6dc8643c2201d9d
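For anyone unfamiliar with this kind of commitment, here is a minimal sketch of how such a hash could be produced and later verified, assuming SHA-256 over the prediction text plus a random nonce (the actual scheme behind the hash above isn't stated, so treat this purely as illustration):

```python
import hashlib
import secrets

# Hypothetical prediction text and nonce; the real ones stay private until reveal.
prediction = "EXAMPLE: model X will reach benchmark Y by date Z"
nonce = secrets.token_hex(16)

# Publish only the hash now.
commitment = hashlib.sha256(f"{prediction}|{nonce}".encode("utf-8")).hexdigest()
print(commitment)

# At reveal time, publish prediction + nonce; anyone can recompute and compare.
assert hashlib.sha256(f"{prediction}|{nonce}".encode("utf-8")).hexdigest() == commitment
```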
I think I’m pretty happy with my terms, using Agent in the DeepMind Discovering Agents sense and Subagent in the Multiagent Models of Mind sense. These feel like crisp underlying abstractions which come in various forms, rather than different kinds of things conflated together. For Shard, yep, I like that term and think it captures something also fairly crisp.
I like to see these as consequences of different control/information structures. I kind of agree with the stuff on power seeking, yet I also want to point out that if you’re in a company (a top-down organisational structure), you can ask whether an individual contributor is less useful than a manager. The IC might be less load-bearing on the overall direction at times, yet that person can often say a lot about some very specific system that matters.
Yes, subprocesses absolutely impact the direction of the overall system; in fact, they are usually spun up to do things the superagent could not do as easily.
Sometimes you can get more agency in another direction by having options removed from you, so it matters what type of agency is being removed, and sometimes it can be a good thing to be a fully managed agent! (E.g. someone forces you to eat healthily so that you get more energy on average.)
I think versions of this where your agency is actually in line with a restriction can be good, especially if you place restrictions on yourself (self-management), but if you’re constantly chafing against e.g. eating healthily, it will generally cause more problems than it’s worth.
I also think that the truer-name version of this is something like a scalar property of a message-passing relationship between two agents.
Yup, agreed; it’s not a binary, just one useful angle for looking at relationships between agents.
that it is not only top-down control structures that matter; there are other forms of organisation such as markets, democracies, networks, and communities as well.
Yes! These cases I would classify as being managed or selected for trust by a superagent.
There is for sure a throne of current goal orientation; I think it’s the GNW / current working memory. But I’m pretty sure there are a bunch of subagents making bids to occupy that throne, with a huge amount of subconscious parallel processing. Much of what I’ve learned about and in therapy closely matches the excellently written Multiagent Models of Mind sequence, which is also super good as an intro to therapy and psychological healing.
No.
Okay, that is a position there might be good arguments for, but then it seems important to say loudly and clearly, both inside GDM and outside, that you do not have a plan or roadmap for superintelligence misalignment (even if you don’t think you should have one). If nothing else, this is the kind of thing your leadership should be made aware of explicitly, so they can either adjust it or use it in their own public communications to try and reduce race dynamics.
It does match my general experience with moderate tactical projects (say, projects involving up to about 10 person-years of research effort), but not with large, complex, important projects.
Okay, would you like to bet on whether some of the largest research programs had plans going into them? I haven’t checked, but I would put at least 10:1 odds that if we pick, say, three projects like the Apollo Program, the Manhattan Project, and others of a similar scale and type, they will all have had a high-level roadmap of things to try which could plausibly address the core challenges quite early on[1], even if a lot of details ended up changing when they ran into reality.
There is also a 100+ page paper that I linked in the original post, which goes into a fair amount of detail on what the various risks and mitigations might look like.
When I ask a plain AI (no special prompting, history off) to summarize, it says:
(detailed analysis of non-superintelligence-focused bits) Is there a different document which does focus on either different approaches aimed at superintelligence, or analysis of whether these approaches are actually fit for that challenge? Or is this summary incorrect? If so, it would be much easier for you, as an author of the paper, to point out and quote the relevant sections than for me, since I would have to read it from scratch and currently do not expect to find things in those 100 pages which explicitly address the most difficult bottleneck.
(I am genuinely glad you’re engaging, but I am not reassured so far, and encourage you to look at the stack of how you’re evaluating this specific concern I’m raising and see if you’re running a truth-seeking process which would, if I had a fair point, be able to notice it.)
[1] Let’s say a collection of core technical problems to be solved, and a set of plausible solutions to try (perhaps all of which were discarded, but were a starting point for exploration).
Hmm, I usually expect that large, complex, important projects should have a roadmap: some sketch of the future that goes well, with details to fill in. The more detailed it is, the more we can check it for consistency and likelihood of working. Does this match your general experience with planning projects that try to achieve a goal?
What you say there looks like an extremely vague, high-level roadmap that sounds to me like ‘we’ll figure out our plan as we go, as data comes in’, plus automated alignment.
I would be really enthusiastic for you and your team to try unblurring that roadmap, and seeing what difficulties you find at superintelligence level on the current path.
Not necessary; there will be a wide array of learning objectives, and some will have maths or AI prerequisites, but not all. Being a strong truth-seeker and fast learner is far more important than domain knowledge.
If I were to make statements like that (which I haven’t exactly), I would be referring to superintelligence misalignment risks specifically, as that seems like by far the tightest bottleneck on surviving futures. The linked paper says:
To address misalignment, we outline two lines of defense. First, model-level mitigations such as amplified oversight and robust training can help to build an aligned model. Second, system-level security measures such as monitoring and access control can mitigate harm even if the model is misaligned.
These don’t seem like the class of approach that could be sufficient for handling superintelligence-level optimization, for reasons I’m sure you’re tracking, given that you later say:
“This means that our approach covers conversational systems, agentic systems, reasoning, learned novel concepts, and some aspects of recursive improvement, while setting aside goal drift and novel risks from superintelligence as future work.”
Do you have a plan for superintelligence misalignment risks?
Managed vs Unmanaged Agency
The source for footnote 4 is a shortform by me, which was specifically about Ayahuasca, which is much more likely than other psychedelics to have the described effects, though many go in that direction. Edit: oops, overconfident! Many people can figure out the same thing; models of reality are convergent because there is a reality to study.
Cool, I went with the most modern Glass, the Enterprise 2, for the higher RAM and other hardware specs, figuring that the software would work itself out these days.
Okay, I’m in. Bought one. Any ideas or code or tips that you think it’s worth sharing, here or by DM?
I’d guess the underlying generators of the text (abstractions, circuits, etc.) are entangled semantically in ways that mean surface-level filtering is not going to remove the transferred structure. Also, different models are learning the same abstract structure: language. So these entanglements would be expected to transfer over fairly well.
Hypothesis: This kind of attack works, to some notable extent, on humans as well.
This is going to be a fun rest of the timeline.
“I’m less worried about climate change than climate removal.”
If you write a condensed and better-named version of this, Lens Academy will use it in the flagship course. p(>0.95)
It doesn’t persist on FF. The switch to Feed does, but not the settings.
real AI safety is a nonexistent field.
Trying to fix this. If you wanna help us promote it (once it’s ready; we need to prep docs) or visit, that would be cool.
(Also, superintelligence alignment is not entirely nonexistent as a field; I think there are at least a few dozen people directly trying to tackle the main bottleneck, plus a bunch more banking on automated alignment, which is super doomy if you don’t understand alignment well enough to direct your research system, but is still attempting a thing.)
If my hypothesis is correct: the poison is the type of circuit implied by the data, and mech interp good enough to pick out that circuit in a model trained on the dataset is needed to identify the poison, because the poison requires gradient descent → SLT finding singularities / grokking to actualize, as it’s non-trivially entangled with the dataset. Possibly the Algorithmic Information Theory people have some neater tricks than just training a model and then inspecting it, but I’d guess that’s the easiest way.
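To make the “train a model, then inspect it” route concrete, here is a toy sketch of my own (purely illustrative, with made-up data and a linear probe standing in for real mech interp): train a tiny MLP on data where a hidden “poison” feature partly drives the labels, then check how linearly decodable that feature is from the hidden layer, compared against an untrained baseline. Circuit-level poison detection on real models would need much heavier tooling than this.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: a hidden "poison" feature (x[:, 0] > 0) partly drives the label alongside
# the ostensible task, standing in for structure implied by a poisoned dataset.
n, d = 4096, 32
x = torch.randn(n, d)
poison = (x[:, 0] > 0).float()
task = (x[:, 1] + x[:, 2] > 0).float()
y = torch.where(torch.rand(n) < 0.3, poison, task)

def make_model():
    # Narrow bottleneck so the hidden layer mostly keeps what training found useful.
    return nn.Sequential(nn.Linear(d, 4), nn.ReLU(), nn.Linear(4, 1))

def train(module, inputs, targets, steps=500):
    opt = torch.optim.Adam(module.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(module(inputs).squeeze(-1), targets).backward()
        opt.step()

def probe_accuracy(model):
    # "Inspect it": fit a linear probe on hidden activations for the poison feature.
    with torch.no_grad():
        hidden = model[1](model[0](x))
    probe = nn.Linear(hidden.shape[1], 1)
    train(probe, hidden, poison)
    with torch.no_grad():
        return ((probe(hidden).squeeze(-1) > 0).float() == poison).float().mean().item()

untrained = make_model()                 # baseline: same architecture, no training
trained = make_model()
train(trained, x, y)                     # step 1: train on the (possibly poisoned) dataset

# Step 2: compare how decodable the poison feature is before vs. after training.
print(f"probe accuracy, untrained model: {probe_accuracy(untrained):.2f}")
print(f"probe accuracy, trained model:   {probe_accuracy(trained):.2f}")
```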