I’m a PhD student at the University of Amsterdam. I have research experience in multivariate information theory and equivariant deep learning and recently became very interested in AI alignment. https://langleon.github.io/
[Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques
Natural Abstractions: Key claims, Theorems, and Critiques
For what it’s worth, I think this comment seems clearly right to me, even if one thinks the post actually shows misalignment. I’m confused about the downvotes of this (5 net downvotes and 12 net disagree votes as of writing this).
Now to answer our big question from the previous section: I can find some satisfying the conditions exactly when all of the ’s are independent given the “perfectly redundant” information. In that case, I just set to be exactly the quantities conserved under the resampling process, i.e. the perfectly redundant information itself.
In the original post on redundant information, I didn’t find a definition for the “quantities conserved under the resampling process”. You name this F(X) in that post.
Just to be sure: is your claim that if F(X) exists that contains exactly the conserved quantities and nothing else, then you can define like this? Or is the claim even stronger and you think such can always be constructed?
Edit: Flagging that I now think this comment is confused. One can simply define as the conditional, which is a composition of the random variable and the function
When I converse with junior folks about what qualities they’re missing, they often focus on things like “not being smart enough” or “not being a genius” or “not having a PhD.” It’s interesting to notice differences between what junior folks think they’re missing & what mentors think they’re missing.
There may also be social reasons to give different answers depending on whether you are a mentor or mentee. I.e., answering “the better mentees were those who were smarter” seems like an uncomfortable thing to say, even if it’s true.
(I do not want to say that this social explanation is the only reason that answers between mentors and mentees differed. But I do think that one should take it into account in one’s models)
Then is a projection matrix, projecting into the span.
To clarify: for this, you probably need the basis to be orthonormal?
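A minimal sketch of why orthonormality matters here. It assumes the matrix in question has the form B Bᵀ, where the columns of B are the basis vectors; that form and the two example bases are my own illustration, not taken from the post. A projection matrix P must satisfy P P = P, and the check below shows this holds for an orthonormal basis but fails for a non-orthonormal basis of the same span.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def is_idempotent(p, tol=1e-9):
    """A projection matrix P must satisfy P @ P == P."""
    p2 = matmul(p, p)
    return all(abs(p2[i][j] - p[i][j]) <= tol
               for i in range(len(p)) for j in range(len(p)))


# Columns are basis vectors of the same 2D subspace of R^3 (hypothetical example).
B_orthonormal = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B_skewed = [[2.0, 1.0], [0.0, 1.0], [0.0, 0.0]]  # same span, not orthonormal

# Assumed form of the matrix: P = B B^T
P_good = matmul(B_orthonormal, transpose(B_orthonormal))
P_bad = matmul(B_skewed, transpose(B_skewed))

print(is_idempotent(P_good))  # orthonormal basis: B B^T is a projection
print(is_idempotent(P_bad))   # skewed basis: B B^T is not idempotent
```

For a basis that is not orthonormal, the general formula B (BᵀB)⁻¹ Bᵀ would still give the orthogonal projection onto the span.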
Disagreements often focus on outputs, even though those outputs were produced by underlying models.
Double Crux idea: focus on the models!
Double Crux tries to reveal the different underlying beliefs coming from different perspectives on reality
Good Faith Principle:
Assume that the other side is moral and intelligent.
Even if some actors are bad, you minimize the chance of error if you start with the prior that each new person is acting in good faith
For every belief A, there are usually beliefs B, C, D such that their believed truth supports belief A
These are “cruxes” if their being false would shake the belief in A.
Ideally, B, C, and D are functional models of how the world works and can be empirically investigated
If you know your crux(es), investigating them has a chance of changing your belief in A
In Search of more productive disagreement
Often, people obscure their cruxes by giving many supporting reasons, most of which aren’t their true crux.
This makes it hard for the “opponent” to know where to focus
If both parties search for truth instead of wanting to win, you can speed up the process a lot by telling each other your cruxes
Playing Double Crux
Lower the bar: instead of reaching a shared belief, find a shared testable claim that, if investigated, would resolve the disagreement.
Double Crux: A belief that is a crux for you and your conversation partner, i.e.:
You believe A, the partner believes not A.
You believe testable claim B, the partner believes not B.
B is a crux of your belief in A and not B is a crux of your partner’s belief in not A.
Investigating conclusively whether B is true may resolve the disagreement (if the cruxes were comprehensive enough)
The Double Crux Algorithm
Find a disagreement with another person (This might also be about different confidences in beliefs)
Operationalize the disagreement (Avoid semantic confusions, be specific)
Seek double cruxes (Seek cruxes independently and then compare)
Resonate (Do the cruxes really feel crucial? Think of what would change if you believed your crux to be false)
Repeat (Are there underlying easier-to-test cruxes for the double cruxes themselves?)
In this post, John starts with a very basic intuition: that abstractions are things you can get from many places in the world, which are therefore very redundant. Thus, for finding abstractions, you should first define redundant information: Concretely, for a system of n random variables X1, …, Xn, he defines the redundant information as that information that remains about the original after repeatedly resampling one variable at a time while keeping all the others fixed. Since there will not be any remaining information if n is finite, there is also the somewhat vague assumption that the number of variables goes to infinity in that resampling process.
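To make the resampling process concrete, here is a toy sketch under assumptions of my own (it is not John's construction): each X_i is a noisy copy of a hidden bit Z, with Z marginalized out, and one variable at a time is resampled from its conditional distribution given all the others. The noise level `FLIP`, the function names, and the "majority bit" readout are all hypothetical choices for illustration.

```python
import random

FLIP = 0.1  # assumed noise: each X_i equals the hidden bit Z with probability 0.9


def prob_one_given_others(others):
    """P(X_i = 1 | all other variables), with the hidden bit Z marginalized out."""
    like_z1 = like_z0 = 0.5  # uniform prior on Z
    for x in others:
        like_z1 *= (1 - FLIP) if x == 1 else FLIP
        like_z0 *= FLIP if x == 1 else (1 - FLIP)
    post_z1 = like_z1 / (like_z1 + like_z0)
    return post_z1 * (1 - FLIP) + (1 - post_z1) * FLIP


def resample(xs, sweeps, rng):
    """Resample one variable at a time, keeping all the others fixed."""
    xs = list(xs)
    for _ in range(sweeps):
        for i in range(len(xs)):
            others = xs[:i] + xs[i + 1:]
            xs[i] = 1 if rng.random() < prob_one_given_others(others) else 0
    return xs


def agreement_with_start(n, sweeps, trials=200, seed=0):
    """Fraction of trials whose majority bit still matches the all-ones start."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        final = resample([1] * n, sweeps, rng)
        hits += sum(final) * 2 > n
    return hits / trials


if __name__ == "__main__":
    for n in (3, 15):  # odd n avoids ties in the majority vote
        print(n, agreement_with_start(n, sweeps=50))
```

In this toy model one would expect the majority bit to be forgotten fairly quickly for small n and to persist much longer as n grows, which matches the intuition that information only survives the resampling in the limit of many variables.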
The first main theorem says that this resampling process will not break the graphical structure of the original variables, i.e., if X1, …, Xn form a Markov random field or Bayesian network with respect to a graph, then the resampled variables will as well, even when conditioning on the abstraction of them. John’s interpretation is that you will still be able to make inferences about the world in a local way even if you condition on your high-level understanding (i.e., the information preserved by the resampling process)
The second main theorem applies this to show that any abstraction F(X1, …, Xn) that contains all the information remaining from the resampling process will also contain all the abstract summaries from the telephone theorem for all the ways that X1, …, Xn (with n going to infinity) could be decomposed into infinitely many nested Markov blankets. This makes F a supposedly quite powerful abstraction.
It’s fairly unclear how exactly the resampling process should be defined. If n is finite and fixed, then John writes that no information will remain. If, however, n is infinite from the start, then we should (probably?) expect the mutual information between the original random variable and the end result to also often be infinite, which also means that we should not expect a small abstract summary F.
Leaving that aside, it is in general not clear to me how F is obtained. The second theorem just assumes F and deduces that it contains the information from the abstract summaries of all telephone theorems. The hope is that F is low-dimensional and thus manageable. But no attempt is made to show the existence of a low-dimensional F in any realistic setting.
Another remark: I don’t quite understand what it means to resample one of the variables “in the physical world”. My understanding is as follows, and if anyone can correct it, that would be helpful: We have some “prior understanding” (= prior probability) about how the world works, and by measuring aspects in the world — e.g., patches full of atoms in a gear — we gain “data” from that prior probability distribution. When forgetting the data of one of the patches, we can look at the others and then use our physical understanding to predict the values for the lost patch. We then sample from that prediction.
Is that it? If so, then this resampling process seems very observer-dependent since there is probably no actual randomness in the universe. But if it is observer-dependent, then the resulting abstractions would also be observer-dependent, which seems to undermine the hope to obtain natural abstractions.
I also have a similar concern about the pencils example: if you have a prior on variables X1, …, Xn and you know that all of them will end up being “objects of the same type”, and a joint sample of them gives you n pencils, then it makes sense to me that resampling them one by one until infinity will still give you a bunch of pencil-like objects, leading you to conclude that the underlying preserved information is a graphite core inside wood. However, where do the variables X1, …, Xn come from in the first place? Each Xi is already a high-level object and it is unclear to me what the analysis would look like if one reparameterized that space. (Thanks to Erik Jenner for mentioning that thought to me; there is a chance that I misrepresent his thinking, though.)
Goal: Find motivation through truth-seeking rather than coercion or self-deception
Ideally: the urges are aligned with the high-level goals
Turn “wanting to want” into “want”
If a person has simultaneously conflicting beliefs and desires, then one of those is wrong.
[Comment from myself: I find this, as stated, not evidently true since desires often do not have a “ground truth” due to the orthogonality thesis. However, even if there is a conflict between subsystems, the productive way forward is usually to find a common path in a values handshake. This is how I interpret conflicting desires to be “wrong”]
If you call some urges “lazy”, then you spend energy on a conflict
If you ignore your urges, then part of you is not “focused” on the activity, making it less worthwhile
Acknowledge your conflicting desires: “I have a belief that it’s good to run and I have a belief that it’s good to watch Netflix”
The different parts aren’t right or wrong; they have tunnel vision, not seeing the value of the other desire
Shoulds: When there is a default action, there is often a sense that you “should” have done something else. If you had done this “something else”, then the default action would become the “should” and the situation would be reversed.
View shoulds as “data” that is useful for making better conclusions
The IDC Algorithm (with an example in the article)
Recommendation: Do not tweak the structure of IDC before having tried it a few times
Step 0: Find an internal disagreement
Identify a “should” that’s counter to a default action
Step 1: Draw two dots on a piece of paper and name them with the subagents representing the different positions
Choose appropriate names/handles that don’t favor one side over the other
Step 2: Decide who speaks first (it might be the side with more “urgency”)
Say one thing embodied from that perspective
Maybe use Focusing to check that the words resonate
Step 3: Get the other side to acknowledge truth.
Let it find something true in the statement or derived from it
Step 4: The second side also adds “one thing”
Be open in general about the means of communication of the sides; they may also scribble something, express a feeling, or …
Step 5: Repeat
It’s okay for some sides to “blow off steam” once in a while and not follow the rules; if so, correct that after the fact from a “moderation standpoint”
You may write down “moderator interjections” with another color
Eventually, you might realize the disagreement to be about something else.
This can give clarity on the “internal generators” of conflict
If so, start a new piece of paper with two new debaters
Ideally, the different parts understand each other better, leading them to stop getting into conflict since they respect each other’s values
Focusing is a technique for bringing subconscious system 1 information into conscious awareness
Felt sense: a feeling in the body that is not yet verbalized but may subconsciously influence behavior, and which carries meaning.
The dominant factor in patient outcomes: does the patient remain uncertain, instead of having firm narratives?
A goal of therapy is increased awareness and clarity. Thus, it is not useful to spend much time in the already known.
The successful patient thinks and listens to information
If the verbal part utters something, the patient will check with the felt senses to correct the utterance
Listening can feel like “having something on the tip of your tongue”
From felt senses to handles
A felt sense is like a picture
There’s lots of tacit, non-explicit information in it
A handle is like a sketch of the picture that is true to it.
Handles “resonate” with the felt sense
The first attempt at a handle will often not resonate — then you need to iterate
In the end, you might get a “click”, “release of pressure”, or “sense of deep rightness”
The felt sense can change or disappear once “System 2 got the message”
Advice and caveats
The felt sense may also not be true — your system 1 may be biased.
Choosing a topic: if you don’t have a felt sense to focus on, produce the utterance “Everything in my life is perfect right now” and see how system 1 responds. This will usually create a topic to focus on
Get physically comfortable
Don’t “focus” in the sense of effortful attention, but “focus” in the sense of “increase clarity”
Hold space: don’t go super fast or “push”; silence in one’s mind is normal
Stay with one felt sense at a time
Always return to the felt sense, even if the coherent verbalized story feels “exciting”
Don’t limit yourself to sensations in your body — there are other felt senses
Try saying things out loud (both utterances and questions “to the felt sense”)
Try to not “fall into” overwhelming felt senses; they can sometimes make the feeling a “subject” instead of an “object” to hold and talk with
Going “meta” and asking what the body has to say about a felt sense can help with not getting sucked in
Verbalizing “I feel rage” and then “something in me is feeling rage” etc. can progressively create distance from felt senses
The Focusing Algorithm
Select something to bring into focus
Create space (get physically comfortable and drop in for a minute; put attention on the body; ask sensations to wait if there are multiple; go meta if you’re overwhelmed)
Look for a handle of the felt sense (iterate between verbalizing and listening until the felt sense agrees; ask questions to the felt sense; take time to wait for responses)
Some things seem complicated, difficult, and hard to do. You may also be uncertain about whether you can achieve them
E.g.: running a marathon; or improving motivation
The resolve cycle: set a 5-minute timer and just solve the thing.
Why does it work?
You want to be actually trying, even when there is no immediate need.
Resolve cycles are an easy way of achieving that (They make it more likely and less painful to invest effort)
One mechanism of its success: when asking “Am I ready for this?” the answer is often “No”. But when there’s a five-minute timer, the answer becomes “Yes”, no matter how hard the problem itself may seem.
Choose a thing to solve — it can be small or big.
Try solving it with a 5-minute timer. Don’t defer parts of it to the future unless they are extremely easy (i.e., there is a TAP in place ensuring that they will get done)
If it’s not solved: spend another 5 minutes brainstorming a list of five-minute actions to solve your problem
Do one of the five-minute actions to get momentum
Developing a “grimoire”:
This section lists many questions to ask yourself in a resolve cycle. These questions can help generate useful ideas.
Andrew Huberman on How to Optimize Sleep
Note: I found this article in particular a bit hard to summarize, especially the section “The argument for CoZE”. I find it hard to say what exactly it is telling me, and how it relates to the later sections.
Comfort is a lack of pain, discomfort, negative emotions, fear, and anxiety, …
Comfort often comes from experience
There’s a gray area between comfort and discomfort that can be worth exploring
Should you exploit the current hill and climb higher there, or search for a new one?
Problem: there is inherent uncertainty
Exploration is risky for individuals; there is a strong bias toward known paths
Argument for CoZE
Try Things model — cheap, non-destabilizing experiments
The text contains a list of questions that can generate a lot of the “things we might try”.
Problem: We might be uncomfortable with them.
How do we reason through them but also bring System 1 on board, which might have useful insights?
In the story, someone destroys an ugly fence, only to then be attacked by an animal behind it.
Don’t destroy a barrier before you know exactly why it’s there.
CoZE includes the lessons of Chesterton’s fence
That’s why it’s called exploration instead of expansion
This is the difference from exposure therapy
When exploring the area around the fence, remain alert, attentive, receptive:
Stay open to all outcomes: the fence shouldn’t be there, the fence is exactly where it should be, the fence should be further away or even closer to you…
Choose an experience to explore (outside of the current action space, or somewhat blocked, maybe with a yum factor)
Prepare to accept all worlds: both possibilities need to feel comfortable in your imagination
Devise an experiment to “taste” the experience
Try the experiment (Potentially with help of others)
How do the body and mind react?
How does the external world respond?
Digest the experience
Compare the experience to expectations
Should you continue trying something like this? Do not force yourself
Give your system 1 space
Idea: use knowledge of how physiology influences the mind
Stress → Mental shutdown (trouble thinking, making decisions, …)
This leads to decisions that feel correct at the moment but are obviously flawed in hindsight
Metacognitive failure: Part of what we lose is also our ability to notice the loss in abilities → Need objective “sobriety test”
The autonomic nervous system
Sympathetic nervous system (SNS): accelerator; fight/flight/freeze, excitement
Parasympathetic nervous system (PSNS): brakes; chill, open vulnerability, reflective…
Relative arousal of these systems matters
The qualities come bundled: not often do you have an “open, relaxed body posture” while also being extremely excited
The ancestral need for survival guides our SNS-dominated responses, and they are not reflective
Changing the state
Check where you are on the SNS-PSNS spectrum
Change your position on that spectrum “at will”
Most of the skill lies in shifting toward PSNS, which is also more useful for our rationality
Algorithm for moving toward PSNS:
Notice that you are SNS-dominated (TAP: physical sign, perceived hostility, someone saying “calm down”)
Open your body posture
Take low, slow, deep breaths: belly-dominated, exhale longer than inhale
Become aware of the sensations in your feet
Then expand that awareness to the whole body
Take another low, slow, deep breath. Enjoy
The againstness-resolving technique is one mechanism to resolve metacognitive blindspots (where our metacognition is not able to reflect on our cognition)
If everyone sounds wrong/stupid/malicious etc., take seriously the idea that it’s actually you.
Get external objective evidence on your blindspot (e.g., sobriety test for driving, noticing SNS-state)
Assess your blindspots with the same tools you use to assess others’ blindspots
I share your confusion.
Systemizing trades large up-front costs for reduced recurring costs
Not everything needs systemization: maybe you like your inefficient process, or the costs are too infrequent to bother.
Common routines: waking up, meals, work routines, computer, social
Familiar spaces: bedroom, bathroom, kitchen, living room, vehicle, workspace, backpack
Shoulds/Obligations: Physical health, finances, intellectual growth, close relationship, career, emotional well-being, community
Framing: instead of trying to “do everything”, focus on “freeing up attention”
Systems as “personal assistants”
Consider having several smaller systems instead of one overarching framework
Increasing Marginal Returns: attention is a prerequisite for flow, deep connection, and non-default actions
The returns are increasing since removing the first distraction is not worth much in a sea of distractions, but the last is
TAP-interactions: Instead of changing responses, improve the triggers
Qualities of good attention-saving systems: Effortless, reliable (otherwise it’s not attention-freeing), invisible (the triggers should be specific, and not always be visible)
Advice for getting started:
Pay attention to friction (e.g., by collecting instances of it for a week)
Set external reminders
Establish a routine
Shape others’ expectations regarding communication
Eliminate unneeded communication, e.g., email newsletters
Use checklists (e.g. for travel or income tax)
Outsource routines (housekeeping) or routine building (e.g., personal trainer)
Triage (can a task also not be done?)
Systems don’t work from the start, and also not forever. Solution:
Systemize the implementation/adaptation of your systems. Options:
Regular (e.g., weekly) time
Irregular, based on a running list of system-problems
Immediately when problems arise
Choose an aspect to be systemized (e.g., costs attention, is effortful, and can be improved)
Think of attention sinks and ideas to address them (maybe: aversion-factoring, understanding the cause of the distraction)
Reality check (with Murphyjitsu, effort assessment, common sense objections)
Make a plan (set up TAPs, changes in the environment, buy things)
Put the plan into action (start right now or create the next action; have a review system in place)
Claim 1: Goodhart’s Law is true
“Any measure which becomes the target ceases to be a good measure”
Any math test supposed to find the best students will cease to work at the 10th iteration — people then “study to be good at the test”
Sugar was a good proxy for healthy food in the ancestral environment, but not today
Claim 2: If you want to condition yourself to a certain behavior with some reward, then that’s possible as long as the delay between behavior and reward is small enough
Claim 3: Over time, we develop “taste”: inexplicable judgments of whether some stimulus may lead to progress toward our goals or not.
A “stimulus” can be as complex as “this specific hypothesis for how to investigate a disease”
Claim 4: Our brains condition us, often without us noticing
With this, the article just means that dopamine spikes don’t exactly occur at the low-level reward, but already at points that predictably will lead to reward.
Since the dopamine hit itself can “feel rewarding”, this is a certain type of conditioning towards the behavior that preceded it.
In other words, the brain gives a dopamine hit in the same way as the dog trainer produces the “click” before the “treat”.
We often don’t “notice” this since we don’t usually explicitly think about why something feels good.
Conclusion: Your brain conditions you all the time toward proxy goals (“dopamine hits”), and Goodhart’s law means that conditioning is sometimes wrong
E.g., if you get an “anti-dopamine hit” for seeing the number on your bathroom scale, then this may condition you toward never looking at that number ever again, instead of the high-level goal of losing weight
An abstraction of a high-dimensional random variable X is a low-dimensional summary G(X) that can be used to make predictions about X. In the case that X is sampled from some parameterized distribution P(X | theta), G(X) may take the form of a sufficient statistic, i.e., a function of X such that P(theta | X) = P(theta | G(X)). To make predictions about X, one may then determine theta from P(theta | G(X)), and predict a new data point X from theta.
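As a small illustration of the sufficient-statistic condition P(theta | X) = P(theta | G(X)), here is a sketch with assumptions of my own: X is a list of coin flips with unknown bias theta, theta lives on a small hypothetical grid with a uniform prior, and G(X) is the number of heads. The posterior computed from the full sequence should match the one computed from the count alone.

```python
from math import comb

THETAS = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical grid of biases, uniform prior


def posterior_given_x(xs):
    """P(theta | X): uses the full sequence of flips."""
    weights = []
    for t in THETAS:
        w = 1.0
        for x in xs:
            w *= t if x == 1 else 1 - t
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]


def posterior_given_summary(k, n):
    """P(theta | G(X)): uses only the number of heads k out of n flips."""
    weights = [comb(n, k) * t**k * (1 - t)**(n - k) for t in THETAS]
    total = sum(weights)
    return [w / total for w in weights]


xs = [1, 0, 1, 1, 0, 1, 1, 0]
full = posterior_given_x(xs)
summary = posterior_given_summary(sum(xs), len(xs))

# The two posteriors agree up to floating-point error.
print(max(abs(a - b) for a, b in zip(full, summary)))
```

Here an 8-dimensional X is compressed to a single number without losing anything relevant for inferring theta, which is the sense in which G(X) is a low-dimensional summary.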
In this post, John shows that if you have a very low-dimensional sufficient statistic G(X), then in many cases, X will “almost follow” the exponential family form and thus be fairly easy to deal with.
I don’t yet quite understand what John wants to use the theorem for, but I will hopefully learn this in his project update post. My current guess is that he would like to identify abstractions as sufficient statistics of exponential families and that this might be a data structure that is “easier to identify” in the world and in trained machine learning models than our initially “broad guess” for what abstractions could be.
Note that I’m only writing this down to have a prediction to update once I read John’s update post. This seems strictly more useful than not having a prediction at all, even though I don’t place a high chance on my prediction actually being fully correct/nuanced enough.
Another thought is that I feel slightly uneasy about the viewpoint that abstractions are the thing we use to make “predictions about X”. In reality, if a person is in a particular state (meaning that the person is represented by an extremely high-dimensional sample vector X), then to make predictions, I definitely only use a very low-dimensional summary based on my sense of the person’s body part positions and coarse brain state. However, these predictions are not about X: I don’t use the summary to sample vectors that are consistent with the summary; instead, I use the summary to make predictions about summaries themselves. I.e., what will be the person’s mental state and body positions a moment from now, and how will this impact abstractions of other objects in the vicinity of that person? There should then be a “commutative diagram” relating objects in reality to their low-dimensional abstractions, and real-world state transitions to predictions.
I hope to eventually learn more about how this abstraction work feeds into such questions.
Has this already been posted? I could not find the post.