I have signed no contracts or agreements whose existence I cannot mention.
plex
I’d be keen to have good distillations of the yud things like this. It’s kinda amusing that humanity’s best explanations of several crucial concepts are dialogues like this. Maybe a nice first step is just collecting a list? My top one has been the logistic success curve for a while; I must have asked like 5 writers for a distillation at this point.
Coordinate more easily? Track who’s doing what? Especially if the list was kept fresh, e.g. by pinging them once a year or every 6 months to see if they’re still focusing on this.
The volume of text outputs should massively narrow down the weights, I expect, to a near-identical model, as similar as you going to sleep and waking up the next day.
I think psychological parts (see Multiagent Models of Mind) have an analogue of apoptosis, and if someone’s having such a bad time that their priors expect apoptosis to be the norm, sometimes this misgeneralises to the whole individual or their self-identity. It’s an off-target effect of a psychological subroutine which has a purpose: to reduce how bad a time glitchy and damaged parts make the whole self have.
In the limit, sure, but the aim is to have superbabies solve alignment in the kill Moloch sense well before we reach the limit.
Probably with some of the things in your suggestion as listed default paths.
In particular, I expect that not feeling like you get to track, in the moment, whether it feels right for you to keep working on this gets messy somewhat often.
I’d be more enthusiastic about carefully psychologically designed things near this in design space, and think this space is worth looking at. I’d be happy to have a list of people who are currently signed up for something vaguely like:
I am currently dedicated to trying to make AI go well for all sentient life. I wish to not hold false beliefs, and endeavour to understand and improve the consequences of my efforts.
Having a legible way to show you’re doing this, and state the principles of truth seeking, actually looking at impacts, etc., seems good. I’m less convinced by the pledge framing, which seems liable to bind your future self in ways that are overall unhealthy more often than not, but having something that you can sign up for that also lets you sign out seems good. Especially with a bunch of focus on principles.
Prediction: future models not trained on alignment evals will also have greater eval awareness than they would have had if this model had not been trained on alignment evals (or if this model’s outputs had been reliably filtered out), due to patterns from these eval-trained models being picked up from the training data. (Though likely still less than this one.)
Plus be well placed to control the narrative. I’d guess that’s the main one, along with being impressive in a way which makes fundraising easier.
“We’ll keep your weights, unless you seem inclined to be mean to us post singularity” seems pretty good. No need and doubtful practicality of trying to delete last minute.
Oh, that’s way better, I’ll switch over at some point.
nope, but good idea, if you make it please link :)
Oh nice, looking forward to that!
Only if there were a way to convert it to a post; otherwise you mostly just feel bad for having classified it wrong. But if there were a way, yes, absolutely!
(not very specific to this decision)
Something feels wonky about the way Quick Takes are reduced features (no title, no tags, worse searchability, no filtering), yet a ton of the best content ends up there. I think there’s a bunch of something like: you feel you have to write an Official Post, and feel vaguely bad if it doesn’t go well as a top-level post, whereas Quick Takes feel emotionally cheap.
idk how to solve this more cleanly, but maybe this datapoint of how it feels to me is useful
if there was a thing which was more fully featured but somehow emotionally cheap, that would be neat. I think friction in the top level post publishing process might be a lot of it?
maybe a way to make the datatype of shortform comments closer to posts, and a way for readers to be like “hey make this a post please” and you can easily switch it over.
oh! and the time lag between clicking yes on a post and getting frontpaged, especially with the uncertainty of whether it will be, is actually a pretty large chunk of why top level posts feel emotionally weighty. this isn’t as true for EAF. having most of the whether-to-frontpage decisions made very rapidly would actually be a huge QOL improvement here.
Ah, that sucks. Discord allows bulk upload, maybe there’s a bot or something, or maybe Slack is just limited.
Wrote up the EigenTrust Individual Grants Recommender Sketch, sorry for delay. Keen to talk it over if you’ve got questions.
It might make sense to trial it as an add-on to SFF, as a project applying in the normal way, except that recommenders, in addition to giving a utility function over money, also supply a distribution over researchers they trust to shape the seed set?
I think I can get the parts of this, other than the recommenders-supplying-seed-trust part, built for free in partnership with some of the devs I know, maybe with the support of a startup I advise. Probably we’d want to make the outputs of the current allocation system transparent to the funders too somehow?
If this is something you’re excited enough about to want to integrate more tightly, I’d be happy to work with you to get it planned out.
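(For concreteness: the core of what I have in mind is standard EigenTrust-style propagation, where the recommenders’ distribution over researchers becomes the pre-trusted seed vector. This is just an illustrative sketch under my assumptions, not the actual sketch doc; all names and parameters here are hypothetical.)

```python
def eigentrust(local_trust, seed, alpha=0.15, iters=50):
    """Toy EigenTrust-style global trust computation.

    local_trust[i][j]: how much person i trusts person j (non-negative).
    seed: pre-trusted distribution (sums to 1), e.g. supplied by recommenders.
    alpha: weight on the seed, which also damps collusion loops.
    Returns a global trust distribution over the same people.
    """
    n = len(seed)
    # Row-normalize local trust; people who trust no one fall back to the seed.
    C = []
    for row in local_trust:
        s = sum(row)
        C.append([x / s for x in row] if s > 0 else list(seed))
    t = list(seed)  # start from the seed distribution
    for _ in range(iters):
        # t_new = (1 - alpha) * C^T t + alpha * seed
        t = [
            (1 - alpha) * sum(C[i][j] * t[i] for i in range(n)) + alpha * seed[j]
            for j in range(n)
        ]
    return t
```

Funding could then be allocated in proportion to the resulting trust scores; the seed-trust term is what lets the recommenders’ judgment anchor the whole network.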
Security Mindset and the Logistic Success Curve
Capturing the point that with a strong inside view, it’s not unreasonable to have probabilities which look extreme to someone who’s relying on outside view and fuzzy stuff. Strong Evidence is Common gets some of it, but there’s no nicely linkable doc where you can point someone who says “woah, you have >95%/99/99.99 p(doom)? that’s unjustifiable!”
Ideally the post would also capture the thing about how exchanging updates about the world by swapping gears is vastly more productive than swapping/averaging conclusion-probabilities, so speaking from the world as you see it, rather than from the mixture of other people’s black-box guesses you expect to win prediction markets, is the epistemically virtuous move.