I have signed no contracts or agreements whose existence I cannot mention.
Consider booking calls with gurkenglas; he probably has the highest ratio of (knows math × alignment theory × has high G) to (cost of his time) of anyone on the planet, since executive function issues keep his time cheap. https://calendly.com/gurkenglas/consultation
He’s especially interested in category theory and wants to make a periodic table of math.
(This goes for anyone reading this post: gurkenglas wants more people to book free calls where he can debug their math and code.)
Yup, I put a high quality interpretability pipeline that the AI systems can use on themselves as one of the most likely things to be the proximal cause of game over.
Having a child probably brings online lots of protectiveness drives. I don’t think I would enjoy feeling helpless to defend my recently born child from misaligned superintelligence, especially knowing that what little I can do to avert their death, and that of everyone else I know, is much harder now that I have to take care of a child.
Excited to be a parent post singularity when I can give them a safe and healthy environment, and have a print-out of https://www.smbc-comics.com/comic/2013-09-08 to remind myself of this.
Strong endorse on this general project. Currently working on doing this for convergent consequentialism with @Mateusz Bagiński @Leon Lang and Anna Magpie.
Yes. It’s an inferred fuzzy correlation based on past experience, the entanglement between the future and present is not necessarily very strong. More capable agents are able to see across wider domains, further, and more reliably, than weaker agents.
The thing that’s happening is not a direct window to the future opening, but cognitive work letting you map the causal structure of the future and create an approximation of their patterns in the present. You’re mapping the future so you can act differently depending on what’s there, which does let the logical shape of the future affect the present, but only to a degree compatible with your ability to predict the future.
Fixed chat links thanks to @the gears to ascension. (fun note, Claude has dramatically better takes than ChatGPT on this)
I got this mostly from talking with the author of https://ouroboros.cafe/articles/land, who referenced xenosystems fragments.
There’s a non-trivial conceptual clarification / deconfusion gained from finite factored sets on top of the summary you made there. I put decent odds on this clarification being necessary for some approaches to strongly scalable technical alignment.
This comment looks to me like you’re missing the main insight of finite factored sets. Suggest reading https://www.lesswrong.com/posts/PfcQguFpT8CDHcozj/finite-factored-sets-in-pictures-6 and some of the other posts, maybe https://www.lesswrong.com/posts/N5Jm6Nj4HkNKySA5Z/finite-factored-sets and https://www.lesswrong.com/posts/qhsELHzAHFebRJE59/a-greater-than-b-greater-than-a until it makes sense why a bunch of clearly competent people thought this was an important contribution.
One of the comments you linked has an edit showing they updated towards this position.
This is a non-trivial insight and reframe, and I’m not going to try to write a better explanation than Scott and Magdalena. But if you take the time to get it and respond with a clear understanding of the frame, I’m open to taking a shot at answering stuff.
Appendix: Five Worlds of Orthogonality
How much of a problem Pythia is depends on how strongly the Orthogonality Thesis holds.
1. All goals are equally easy to design an agent to pursue, beyond the inherent tractability of that goal.[1]
2. There can exist arbitrarily intelligent agents pursuing any kind of goal.
3. Agents do not tend to factorize into an Orthogonal value-like component and a Diagonal belief-like component; rather, there are Oblique components that do not factorize neatly.
4. All sufficiently advanced systems converge towards maximizing intelligence/power/influence/self-evidencing, shredding all their other values in the process.
5. Universalist moral internalism: what is right must be universally motivating, so all sufficiently advanced AI systems discover objective moral truth and do Good Things. (Maybe it takes them a while to converge.)
Pythia
Security Mindset and the Logistic Success Curve
…look, at some point in life we have to try to triage our efforts and give up on what can’t be salvaged. There’s often a logistic curve for success probabilities, you know? The distances are measured in multiplicative odds, not additive percentage points. You can’t take a project like this and assume that by putting in some more hard work, you can increase the absolute chance of success by 10%. More like, the odds of this project’s failure versus success start out as 1,000,000:1, and if we’re very polite and navigate around Mr. Topaz’s sense that he is higher-status than us and manage to explain a few tips to him without ever sounding like we think we know something he doesn’t, we can quintuple his chances of success and send the odds to 200,000:1. Which is to say that in the world of percentage points, the odds go from 0.0% to 0.0%. That’s one way to look at the “law of continued failure”.
If you had the kind of project where the fundamentals implied, say, a 15% chance of success, you’d then be on the right part of the logistic curve, and in that case it could make a lot of sense to hunt for ways to bump that up to a 30% or 80% chance.
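A plain restatement of the arithmetic in that quote (not from the post itself, just a quick Python sketch using the numbers Eliezer gives): quintupling the odds barely moves a project that starts at 1,000,000:1 against, but moves a 15% project a lot.

```python
def prob_from_odds_against(odds_against):
    """Success probability for odds of odds_against:1 against."""
    return 1 / (odds_against + 1)

# Mr. Topaz's project: 1,000,000:1 against, quintupled to 200,000:1 against.
before = prob_from_odds_against(1_000_000)
after = prob_from_odds_against(1_000_000 / 5)
print(f"{before:.5%} -> {after:.5%}")  # 0.00010% -> 0.00050%: still 0.0% in whole percentage points

# A project whose fundamentals imply ~15%: odds against are 0.85/0.15, about 5.7:1.
odds_against = (1 - 0.15) / 0.15
print(f"15% -> {prob_from_odds_against(odds_against / 5):.0%}")  # quintupled odds: roughly 47%
```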
Capturing the point that with a strong inside view, it’s not unreasonable to have probabilities which look extreme to someone who’s relying on outside view and fuzzy stuff. Strong Evidence is Common gets some of it, but there’s no nicely linkable doc where you can point someone who says “woah, you have >95% / 99% / 99.99% p(doom)? that’s unjustifiable!”
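On the Strong Evidence is Common half, a minimal sketch (illustrative numbers, not taken from that post): a handful of roughly independent observations, each carrying an unremarkable likelihood ratio, multiply out to posterior odds that look “extreme” when read back as a percentage.

```python
import math

def posterior_probability(prior_odds, likelihood_ratios):
    """Bayes in odds form: posterior odds = prior odds x product of likelihood ratios."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Illustrative only: even odds to start, then four observations each worth ~20:1.
p = posterior_probability(1.0, [20, 20, 20, 20])
print(f"{p:.4%}")                        # 99.9994%, i.e. 160,000:1
print(f"{math.log2(160_000):.1f} bits")  # ~17.3 bits of evidence in total
```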
Ideally the post would also capture the thing about how exchanging updates about the world by swapping gears is vastly more productive than swapping/averaging conclusion-probabilities, so speaking from the world as you see it, rather than the mixture of other people’s black-box guesses you expect to win prediction markets, is the epistemically virtuous move.
I’d be keen to have good distillations of the yud things like this. It’s kinda amusing how humanity’s best explanations of several crucial concepts are dialogues like this. Maybe a nice first step is just collecting a list? My top one has been the logistic success curve for a while; must have asked like 5 writers for a distillation at this point.
Coordinate more easily? Track who’s doing what? Especially if the list was kept fresh, e.g. by pinging them once a year or every 6 months to see if they’re still focusing on this.
The volume of text outputs should massively narrow down the weights, I expect to a near-identical model, about as similar as you are after going to sleep and waking up the next day.
I think psychological parts (see Multiagent Models of Mind) have an analogue of apoptosis, and if someone’s having such a bad time that their priors expect apoptosis is the norm, sometimes this misgeneralises to the whole individual or their self-identity. It’s an off-target effect of a psychological subroutine which has a purpose: to reduce how much the glitchy and damaged parts make the whole self have a bad time.
In the limit, sure, but the aim is to have superbabies solve alignment in the kill Moloch sense well before we reach the limit.
Probably with some of the things in your suggestion listed as default paths.
In particular, I expect that not feeling like you get to track, in the moment, whether it still feels right for you to keep working on this gets messy somewhat often.
I’d be more enthusiastic about carefully psychologically designed things near this in design space, and think this space is worth looking at. I’d be happy to have a list of people who are currently signed up for something vaguely like:
I am currently dedicated to trying to make AI go well for all sentient life. I wish to not hold false beliefs, and endeavour to understand and improve the consequences of my efforts.
Spoofing a DNS record via the router so that it redirects you to a homograph domain with a legitimate certificate should work.
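(To spell out why the homograph half works, a minimal Python illustration with a made-up domain: a name containing a Cyrillic ‘а’ renders near-identically to the ASCII one but is a different registrable domain, so an attacker can hold a perfectly valid certificate for it.)

```python
import unicodedata

ascii_name = "example.com"
homograph = "ex\u0430mple.com"  # U+0430 CYRILLIC SMALL LETTER A in place of the 'a'

print(ascii_name == homograph)         # False: visually similar, but different strings
print(unicodedata.name(homograph[2]))  # CYRILLIC SMALL LETTER A
print(homograph.encode("idna"))        # the xn-- punycode form that actually gets looked up in DNS
```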