Born too late to explore Earth; born too early to explore the galaxy; born just the right time to save humanity.
Ulisse Mini
Thanks! I will definitely read those!
Thanks! Some other people recommended the Atlas Fellowship and I've applied. Regarding (9), I think I worded it badly; I meant reach out to local politicians (I thought the terms were interchangeable).
Read it; that study guide is really good. It really motivates me to branch out, since I've definitely over-focused on depth before and not done enough applications/"generalizing".
This also reminds me of Miyamoto Musashi’s 3rd principle: Become acquainted with every art
Noted
Yeah, I read about a third of the proof of Cox's theorem until I realized that even if I followed every step I wouldn't gain any intuition from it, so I skipped the rest.
Some realizations about memory and learning I've been thinking about recently. EDIT: here are some great posts on memory which are a deconfused version of this shortform (and written by EY's wife!)
Anki (and SRS in general) is a tool for efficiently writing directed-graph edges to the brain. Thinking about encoding knowledge as a directed graph can help with making good Anki cards (see the sketch after this list).
Memory techniques are somewhat analogous to data structures as well, e.g. the link method corresponds to a doubly linked list.
“Memory techniques” should be called “Memory principles” (or even laws).
The "Code is Data" concept makes me realize memorization is more widely applicable; you could, e.g., memorize the algorithm for integration in calculus. Many "creative" processes like integration can be reduced to an algorithm.
"Truly Part of You" is not orthogonal to memory
techniques/principles; it uses the fact that a densely connected graph is less likely to become disconnected when you randomly delete edges, similar to how the link and story methods work. Just because you aren't making silly images doesn't mean you aren't using the principles.
(Untested idea for math): Journal about your thought processes after solving each problem, then generalize to form a problem-solving algorithm/checklist and memorize the algorithm.
= finding shortest paths on a weighted directed graph, where the shortest path cost must be below some threshold :)
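A minimal sketch (my own toy example, not from the shortform) of the framing above: each card writes one directed edge (cue → answer) with a recall cost, and recalling a chain of associations is a shortest-path query that only "succeeds" if the total cost stays below a threshold. The card names, costs, and threshold here are made up for illustration.

```python
import heapq

# Each card is an edge: (cue, answer, recall_cost). Lower cost = stronger memory.
cards = [
    ("derivative", "slope", 1.0),
    ("slope", "rise over run", 1.0),
    ("derivative", "limit definition", 3.0),
    ("limit definition", "rise over run", 0.5),
]

# Build an adjacency list for the directed graph.
graph = {}
for cue, answer, cost in cards:
    graph.setdefault(cue, []).append((answer, cost))

def cheapest_recall(start, goal):
    """Dijkstra: cost of the cheapest chain of associations from start to goal."""
    frontier = [(0.0, start)]
    best = {start: 0.0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None  # no chain of associations exists at all

THRESHOLD = 2.5  # recall "succeeds" only if the chain is cheap enough
cost = cheapest_recall("derivative", "rise over run")
print(cost, "recalled" if cost is not None and cost <= THRESHOLD else "forgotten")
```

Adding more (redundant) edges between the same concepts is what makes the graph robust to randomly forgetting a card, which is the dense-connectivity point above.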
I’ll write some posts when I get stuff working, I feel a Sense That More Is Possible in this area, but I don’t want to write stuff till I can at least say it works well for me.
Upvoted because I think there should be more of a discussion around this than "Obviously getting normal people involved will only make things worse" (which seems kind of arrogant / assumes there are no good unknown unknowns).
Yes, I’m not convinced either way myself but here are some arguments against:
If the USA regulates AGI, China will get it first, which seems worse as there's less alignment activity in China (as for US-China coordination: lol, lmao).
Raising awareness of AGI Alignment also raises awareness of AGI. If we communicate the “AGI” part without the “Alignment” part we could speed up timelines
If there's a massive influx of funding/interest from people who aren't well informed, it could lead to "substitution hazards" like work on aligning weak models with methods that don't scale to the superintelligent case (in climate change, people substitute "solve climate change" with "I'll reduce my own emissions", which is useless).
If we convince the public AGI is a threat, there could be widespread flailing (the bad kind), which reflects badly on alignment researchers (e.g. if DeepMind researchers are receiving threats, their system 1 might generalize to "people worried about AGI are a doomsday cult and should be disregarded").
Most of these I’ve heard from reading conversations on EleutherAI’s discord, Connor is typically the most pessimistic but some others are pessimistic too (Connor’s talk discusses substitution hazards in more detail)
TL;DR: It's hard to control the public once they're involved. Climate change startups aren't getting public funding; the public is more interested in virtue-signaling. (In the climate case the public doesn't really make things worse, but for AGI it could be different.)
EDIT: I think I’ve presented the arguments badly, re-reading them I don’t find them convincing. You should seek out someone who presents them better.
Personally my process goes something like:
Click a citation/link on LW that sends me to a sequence post
Read the post, opening any interesting citations in new tabs
Repeat until I run out of time or run out of interesting citations (the latter never happens)
People Power
To get a sense that more is possible, consider:
The AI box experiment, and its replication
Mentalists like Derren Brown (which is related to 1)
How the FBI gets hostages back with zero leverage (they aren’t allowed to pay ransoms)
(This is an excerpt from a post I'm writing which I may or may not publish. The link aggregation here might be useful in and of itself.)
It would be nice to be able to change 5 minutes to something else. I know this isn't in the spirit of "Try Harder, Luke", but 5 minutes is arbitrary; it could just as easily have been 10 minutes.
Interesting. I'm homeschooled (unschooled specifically) and that probably benefited my agency (though I could still be much more agentic). I guess parenting styles matter a lot more than surface-level "going to school".
You're super brave for sharing this; it's hard to stand up and say "Yes, I'm the stereotypical example of the problem mentioned here." Stay optimistic though; people starting lower have risen higher.
Those who take delight in their own might are merely pretenders to power. The true warrior of fate needs no adoration or fear, no tricks or overwhelming effort; he need not be stronger or smarter or innately more capable than everyone else; he need not even admit it to himself. All he needs to do is to stand there, at that moment when all hope is dead, and look upon the abyss without flinching.
I think even without point #4 you don't necessarily get an AI maximizing diamonds. Heuristically, it feels to me like you're bulldozing open problems without understanding them (e.g. ontology identification by training with multiple models of physics, getting it not to reward-hack by explicit training, etc.), all of which are vulnerable to a deceptively aligned model (just wait till you're out of training to reward-hack). Also, every time you say "train it by X so it learns Y" you're assuming alignment (e.g. "digital worlds where the sub-atomic physics is different, such that it learns to preserve the diamond-configuration despite ontological confusion").
IMO shard theory provides a great frame to think about this in, it’s a must-read for improving alignment intuitions.
If the title is meant to be a summary of the post, I think that would be analogous to someone saying "nuclear forces provide an untapped wealth of energy". It's true, but the reason the energy is untapped is that nobody has come up with a good way of tapping into it.
The difference is that people have been trying hard to harness nuclear forces for energy, while people have not been trying hard to research humans for alignment in the same way. Even accounting for the alignment field being far smaller, there hasn't been a real effort as far as I can see. Most people immediately respond with "AGI is different from humans for X, Y, Z reasons" (which are true) and then proceed to throw out the baby with the bathwater by not looking into human value formation at all.
Planes don’t fly like birds, but we sure as hell studied birds to make them.
If you come up with a strategy for how to do this then I'm much more interested; that's a big reason why I'm asking for a summary, since I think you might have tried to express something like this in the post and I'm missing it.
This is their current research direction, "The shard theory of human values", which they're currently making posts on.
I can't speak for Alex and Quintin, but I think if you were able to figure out how values like "caring about other humans" or generalizations like "caring about all sentient life" formed for you from hard-coded reward signals, that would be useful. Maybe ask on the shard theory Discord; also read their document if you haven't already, and maybe you'll come up with your own research ideas.
Alignment researchers have given up on aligning an AI with human values, it’s too hard! Human values are ill-defined, changing, and complicated things which they have no good proxy for. Humans don’t even agree on all their values!
Instead, the researchers decide to align their AI with the simpler goal of “creating as many paperclips as possible”. If the world is going to end, why not have it end in a funny way?
Sadly it wasn't so easy: the first prototype of Clippy grew addicted to watching YouTube videos of paperclip unboxings, and the second prototype hacked its camera feed, replacing it with an infinite scroll of paperclips. Clippy doesn't seem to care about paperclips in the real world.
How can the researchers make Clippy care about the real world? (and preferably real-world paperclips too)
This is basically the diamond-maximizer problem. In my opinion, the "preciseness" with which we can specify diamonds is a red herring; at the quantum level or below, what counts as a diamond could start to get fuzzy.
KL-divergence and the map/territory distinction
Crosspost from my blog
The cross-entropy $H(P, Q) = \mathbb{E}_{x \sim P}[-\log Q(x)]$ is defined as the expected surprise when drawing from $P$, which we're modeling as $Q$. Our map is $Q$ while $P$ is the territory.
Now it should be intuitively clear that $H(P, Q) \geq H(P, P) = H(P)$, because an imperfect model $Q$ will (on average) surprise us more than the perfect model $P$.
To measure the unnecessary surprise from approximating $P$ by $Q$ we define
$$D_{\mathrm{KL}}(P \| Q) = H(P, Q) - H(P)$$
This is KL-divergence! The average additional surprise from our map $Q$ approximating the territory $P$.
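As a quick numeric sanity check (my own illustration, not part of the original post), here is the identity $D_{\mathrm{KL}}(P \| Q) = H(P, Q) - H(P)$ on a small discrete example, with $P$ as the territory and $Q$ as the map:

```python
import math

P = [0.5, 0.25, 0.25]   # territory: true distribution over 3 outcomes
Q = [0.8, 0.1, 0.1]     # map: our (imperfect) model of it

H_P  = -sum(p * math.log2(p) for p in P)                # entropy: surprise under the perfect model
H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))     # cross-entropy: surprise using the map Q
D_KL = sum(p * math.log2(p / q) for p, q in zip(P, Q))  # extra surprise from using the map

print(H_P, H_PQ, D_KL)                 # H(P, Q) >= H(P)
print(math.isclose(D_KL, H_PQ - H_P))  # True
```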
Now it's time for an exercise: in the following figure, $Q$ is the Gaussian that minimizes either $D_{\mathrm{KL}}(P \| Q)$ or $D_{\mathrm{KL}}(Q \| P)$; can you tell which is which?
Left is minimizing $D_{\mathrm{KL}}(P \| Q)$ while the right is minimizing $D_{\mathrm{KL}}(Q \| P)$.
Reason as follows:
If $P$ is the territory then the left $Q$ is a better map (of $P$) than the right $Q$.
If $P$ is the map, then the territory $Q$ on the right leads to us being less surprised than the territory on the left, because $P$ on the left will be very surprised at data in the middle, despite it being likely according to the territory $Q$.
On the left we fit the map to the territory; on the right we fit the territory to the map.
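Here is a rough numerical version of the exercise (my own sketch using numpy; it does not reproduce the original figure): fit a single Gaussian $Q$ to a bimodal territory $P$ by brute-force minimizing each direction of the KL-divergence. Minimizing $D_{\mathrm{KL}}(P \| Q)$ gives a wide, mass-covering Gaussian (fitting the map to the territory), while minimizing $D_{\mathrm{KL}}(Q \| P)$ locks onto a single mode (fitting the territory to the map).

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Territory: a mixture of two well-separated Gaussians.
P = 0.5 * gaussian(x, -3, 1) + 0.5 * gaussian(x, 3, 1)

def kl(a, b):
    """Numerical D_KL(a || b) on the grid, for densities a and b."""
    mask = a > 1e-12
    b_safe = np.maximum(b[mask], 1e-300)  # avoid log(0)
    return float(np.sum(a[mask] * np.log(a[mask] / b_safe)) * dx)

# Brute-force search over single-Gaussian maps Q = N(mu, sigma).
best_fwd = best_rev = (np.inf, None, None)
for mu in np.linspace(-5, 5, 101):
    for sigma in np.linspace(0.5, 6, 56):
        Q = gaussian(x, mu, sigma)
        fwd, rev = kl(P, Q), kl(Q, P)
        if fwd < best_fwd[0]:
            best_fwd = (fwd, mu, sigma)
        if rev < best_rev[0]:
            best_rev = (rev, mu, sigma)

# Minimizing D_KL(P||Q) gives a wide Gaussian covering both modes;
# minimizing D_KL(Q||P) gives a narrow Gaussian locked onto one mode.
print("argmin D_KL(P||Q): mu=%.1f, sigma=%.1f" % best_fwd[1:])
print("argmin D_KL(Q||P): mu=%.1f, sigma=%.1f" % best_rev[1:])
```

The broad fit loses in the reverse direction because it generates data in the low-probability region between the modes, which is exactly the "surprised at data in the middle" effect described above.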
In regard to priorities between young frontline workers and the at-risk elderly: I hope they're optimizing for saving life-years, not lives (i.e. if a healthy 20-year-old has 60 years ahead of them and a healthy 70-year-old has 10, saving the 20-year-old saves 6x as many life-years).
Other than that, interesting post; I'll be keeping an eye on that new strain.