I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, Twitter, Mastodon, Threads, Bluesky, GitHub, Wikipedia, Physics-StackExchange, LinkedIn
Steven Byrnes
I appreciate the brainstorming prompt but I can’t come up with anything useful here. The things you mention are related to cortex lesions, which would presumably leave the brainstem spatial attention system intact. (Brainstem damage is more rare and often lethal.) The stuff you say about neglect is fun to think about but I can’t see situations where there would be specifically-social consequences, in a way that sheds light on what’s happening.
There might be something to the fact that the temporoparietal junction (TPJ) seems to include areas related to spatial attention, and is also somehow involved in theory-of-mind tasks. I’ve been looking into that recently—in fact, that’s part of the story of how I came to write this post. I still don’t fully understand the TPJ though.
Hmm, there do exist lesion studies related to theory-of-mind, e.g. this one—I guess I should read them.
I think I would feel characteristic innate-fear-of-heights sensations (fear + tingly sensation for me, YMMV) if I were standing on an opaque bridge over a chasm, especially if the wood is cracking and about to break. Or if I were near the edge of a roof with no railings, but couldn’t actually see down.
Neither of these claims is straightforward rock-solid proof that the thing you said is wrong, because there’s a possible elaboration of what you said that starts with “looking down” as ground truth and then generalizes that ground truth via pattern-matching / learning algorithm—but I still think that elaborated story doesn’t hang together when you work through it in detail, and that my “innate ‘center of spatial attention’ constantly darting around local 3D space” story is much better.
If I’m looking up at the clouds, or at a distant mountain range, then everything is far away (the ground could be cut off from my field-of-view)—but it doesn’t trigger the sensations of fear-of-heights, right? Also, I think blind people can be scared of heights?
Another possible fear-of-heights story just occurred to me—I added to the post in a footnote, along with why I don’t believe it.
Spatial attention as a “tell” for empathetic simulation?
When I’ve talked with people from industry, they don’t seem at all interested in tracking per-employee performance (e.g. Google isn’t running RCTs on their engineers to increase their coding performance, and estimates of how long projects will take are not tracked & scored).
FWIW Joel Spolsky suggests that people managing software engineers should have detailed schedules, says that big companies keep up-to-date schedules, and built a tool to leverage historical data for better schedules. At my old R&D firm, people would frequently make schedules and budgets for projects, and would be held to account if their estimates were bad, and I got a strong impression that seasoned employees tended to get better at making accurate schedules and budgets over time. (A seasoned employee suggested a rule-of-thumb for novices: earnestly try to make an accurate schedule, then go through the draft replacing the word “days” with “weeks”, “weeks” with “months”, etc.) (Of course it’s possible for firms not to be structured such that people get fast and frequent feedback on the accuracy of their schedules, and penalties for doing a bad job, in which case they probably won’t get better over time.)
I guess what’s missing is (1) systemizing scheduling so that it’s not a bunch of heuristics in individual people’s heads (might not be possible), and (2) intervening on employee workflows etc. (e.g. A/B testing) and seeing how that impacts productivity.
Practice testing
IIUC the final “learning” was assessed via a test. So you could rephrase this as, “if you do the exact thing X, you’re liable to get better at doing X”, where here X=“take a test on topic Y”. (OK, it generalized “from simple recall to short answer inference tests” but that’s really not that different.)
I’m also a little bit surprised that keywords and mnemonics don’t work (since they are used very often by competitive mnemonists)
I invent mnemonics all the time, but normal people still need spaced-repetition or similar to memorize the mnemonic. The mnemonics are easier to remember (that’s the point) but “easier” ≠ effortless.
As another point, I think a theme that repeatedly comes up is that people are much better at learning things when there’s an emotional edge to them—for example:
It’s easier to remember things if you’ve previously brought them up in an argument with someone else.
It’s easier to remember things if you’ve previously gotten them wrong in public and felt embarrassed.
It’s easier to remember things if you’re really invested in and excited by a big project and figuring this thing out will unblock the project.
This general principle makes obvious sense from an evolutionary perspective (it’s worth remembering a lion attack, but it’s not worth remembering every moment of a long uneventful walk), and I think it’s also pretty well understood neuroscientifically (physiological arousal → more norepinephrine, dopamine, and/or acetylcholine → higher learning rates … something like that).
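A cartoon of that last arrow in code (a toy sketch of mine; “surprise” is just a made-up scalar standing in for arousal / neuromodulator level, not a model of real neurochemistry):

```python
def update(strength, surprise, base_lr=0.05):
    """One toy learning step: arousal (proxied here by a scalar
    'surprise' signal) multiplicatively scales the learning rate, so
    emotionally-charged events need far fewer repetitions to stick."""
    arousal_gain = 1.0 + 10.0 * surprise      # made-up gating function
    return strength + base_lr * arousal_gain * (1.0 - strength)

neutral = lion = 0.0
for _ in range(20):
    neutral = update(neutral, surprise=0.0)   # long uneventful walk
    lion = update(lion, surprise=1.0)         # lion attack
print(f"memory strength after 20 exposures: neutral={neutral:.2f}, lion={lion:.2f}")
# -> neutral ~ 0.64, lion ~ 1.00
```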
As another point, I’m not sure there’s any difference between “far transfer” and “deep understanding”. Thus, the interventions that you said were helpful for far transfer seem to be identical to the interventions that would lead to deep understanding / familiarity / facility with thinking about some set of ideas. See my comment here.
Yeah some of my to-do items are of the form “skim X”. Inside the “card” I might have a few words about how I originally came across X and what I’m hoping to get out of skimming it.
It just refers to the fact that there are columns that you drag items between. I don’t even really know how a “proper” kanban works.
If a new task occurs to me in the middle of something else, I’ll temporarily put it in a left (high-priority) column, just so I don’t forget it, and then later when I’m at my computer and have a moment to look at it, I might decide to drag it to a right (low-priority) column instead of doing it.
Such an unambitious, narrowly-scoped topic area?? There may be infinitely many parallel universes in which we can acausally improve life … you’re giving up almost all of the value at stake before even starting :)
A couple productivity tips for overthinkers
I always thought of S = −k_B ∑_i p_i ln(p_i) as the exact / “real” definition of entropy, and S = k_B ln(Ω) as the specialization of that “exact” formula to the case where each microstate is equally probable (a case which is rarely exactly true but often a good approximation). So I found it a bit funny that you only mention the second formula, not the first. I guess you were keeping it simple? Or do you not share that perspective?
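For reference, the specialization step is just plugging p_i = 1/Ω into the Gibbs formula:

```latex
S = -k_B \sum_i p_i \ln p_i
\;\;\xrightarrow{\;p_i \,=\, 1/\Omega\;}\;\;
S = -k_B \sum_{i=1}^{\Omega} \frac{1}{\Omega} \ln\frac{1}{\Omega}
  = k_B \ln \Omega
```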
I just looked up “many minds” and it’s a little bit like what I wrote here, but described differently in ways that I think I don’t like. (It’s possible that Wikipedia is not doing it justice, or that I’m misunderstanding it.) I think minds are what brains do, and I think brains are macroscopic systems that follow the laws of quantum mechanics just like everything else in the universe.
What property distinguished a universe where “Harry found himself in a tails branch” and a universe where “Harry found himself in a heads branch”?
Those both happen in the same universe. Those Harrys both exist. Maybe you should put aside many-worlds and just think about Parfit’s teletransportation paradox. I think you’re assuming that “thread of subjective experience” is a coherent concept that satisfies all the intuitive properties that we feel like it should have, and I think that the teletransportation paradox is a good illustration that it’s not coherent at all, or at the very least, that we should be extraordinarily cautious when making claims about the properties of this alleged thing you call a “thread of subjective experience” or “thread of consciousness”. (See also other Parfit thought experiments along the same lines.)
I don’t like the idea where we talk about what will happen to Harry, as if that has to have a unique answer. Instead I’d rather talk about Harry-moments, where there’s a Harry at a particular time doing particular things and full of memories of what happened in the past. Then there are future Harry-moments. We can go backwards in time from a Harry-moment to a unique (at any given time) past Harry-moment corresponding to it—after all, we can inspect the memories in future-Harry-moment’s head about what past-Harry was doing at that time (assuming there were no weird brain surgeries etc). But we can’t uniquely go in the forward direction: Who’s to say that multiple future-Harry-moments can’t hold true memories of the very same past-Harry-moment?
Here I am, right now, a Steve-moment. I have a lot of direct and indirect evidence of quantum interactions that have happened in the past or are happening right now, as imprinted on my memories, surroundings, and so on. And if you a priori picked some possible property of those interactions that (according to the Born rule) has 1-in-a-googol probability to occur in general, then I would be delighted to bet my life’s savings that this property is not true of my current observations and memories. Obviously that doesn’t mean that it’s literally impossible.
I wrote “flipping an unbiased coin” so that’s 50/50.
there’s some preferred future “I” out of many who is defined not only by observations he receives, but also by being a preferred continuation of subjective experience defined by an unknown mechanism
I disagree with this part—if Harry does the quantum equivalent of flipping an unbiased coin, then there’s a branch of the universe’s wavefunction in which Harry sees heads and says “gee, isn’t it interesting that I see heads and not tails, I wonder how that works, hmm why did my thread of subjective experience carry me into the heads branch?”, and there’s also a branch of the universe’s wavefunction in which Harry sees tails and says “gee, isn’t it interesting that I see tails and not heads, I wonder how that works, hmm why did my thread of subjective experience carry me into the tails branch?”. I don’t think either of these Harrys is “preferred”.
I don’t think there’s any extra “complexity penalty” associated with the previous paragraph: the previous paragraph is (I claim) just a straightforward description of what would happen if the universe and everything in it (including Harry) always follows the Schrödinger equation—see Quantum Mechanics In Your Face for details.
I think we deeply disagree about the nature of consciousness, but that’s a whole can of worms that I really don’t want to get into in this comment thread.
doesn’t strike me as “feeling more natural”
Maybe you’re just going for rhetorical flourish, but my specific suggestion with the words “feels more natural” in the context of my comment was: the axiom “I will find myself in a branch of amplitude approaching 0 with probability approaching 0” “feels more natural” than the axiom “I will find myself in a branch of amplitude c with probability |c|²”. That particular sentence was not a comparison of many-worlds with non-many-worlds, but rather a comparison of two ways to formulate many-worlds. So I think your position is that you find neither of those to “feel natural”.
Quantum Mechanics In Your Face talk by Sidney Coleman, starting slide 17 near the end. The basic idea is to try to operationalize how someone might test the Born rule—they take a bunch of quantum measurements, one after another, and they subject their data to a bunch of randomness tests and so on, and then they eventually declare “Born rule seems true” or “Born rule seems false” after analyzing the data. And you can show that the branches in which this person declares “Born rule seems false” have collective amplitude approaching zero, in the limit as their test procedure gets better and better (i.e. as they take more and more measurements).
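Here’s a toy numerical version of that limit (my own sketch, not anything from the talk): model each measurement as a qubit whose Born weight for outcome 1 is p = |α|² = 0.7, so the branches with exactly k ones collectively carry squared amplitude C(N,k)·p^k·(1−p)^(N−k). If the tester declares “Born rule seems false” whenever the observed frequency of 1s deviates from 0.7 by more than ε = 0.05, the collective weight of those branches shrinks toward zero as N grows:

```python
import math

def deviant_branch_weight(N, p=0.7, eps=0.05):
    """Total squared amplitude (Born weight) of the branches in which the
    observed frequency of 1s deviates from p by more than eps, after N
    independent measurements of a qubit with |alpha|^2 = p."""
    total = 0.0
    for k in range(N + 1):
        if abs(k / N - p) > eps:
            # log of C(N,k) * p^k * (1-p)^(N-k), computed in log-space
            # to avoid overflow/underflow at large N
            log_w = (math.lgamma(N + 1) - math.lgamma(k + 1)
                     - math.lgamma(N - k + 1)
                     + k * math.log(p) + (N - k) * math.log(1 - p))
            total += math.exp(log_w)
    return total

for N in (10, 100, 1000, 10000):
    print(N, deviant_branch_weight(N))
# Collective weight of "Born rule seems false" branches -> 0 as N grows.
```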
(Warning that I may well be misunderstanding this post.)
For any well-controlled isolated system, if it starts in a state |ψ⟩, then at a later time it will be in state U|ψ⟩, where U is a certain deterministic unitary operator. So far this is indisputable—you can do quantum state tomography, you can measure the interference effects, etc. Right?
OK, so then you say: “Well, a very big well-controlled isolated system could be a box with my friend Harry and his cat in it, and if the same principle holds, then there will be deterministic unitary evolution from |initial⟩ into U|initial⟩, and hey, I just did the math and it turns out that U|initial⟩ will have a 50/50 mix of ‘Harry sees his cat alive’ and ‘Harry sees his cat dead and is sad’.” This is beyond what’s possible to directly experimentally verify, but I think it should be a very strong presumption by extrapolating from the first paragraph. (As you say, “quantum computers prove larger and larger superpositions to be stable”.)
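Schematically (a cartoon of the step I have in mind; the labels are illustrative stand-ins, not an actual Hamiltonian calculation, and I’m suppressing every other degree of freedom in the box):

```latex
U\Big(|\text{atom ready}\rangle \otimes |\text{cat alive}\rangle \otimes |\text{Harry waiting}\rangle\Big)
= \tfrac{1}{\sqrt{2}}\Big(
    |\text{no decay}\rangle \otimes |\text{cat alive}\rangle \otimes |\text{Harry sees live cat}\rangle
  + |\text{decay}\rangle \otimes |\text{cat dead}\rangle \otimes |\text{Harry sees dead cat, sad}\rangle
\Big)
```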
OK, and then we take one more step by saying “Hey, what if I’m in the well-controlled isolated system?” (e.g. the “system” in question is the whole universe). From my perspective, it’s implausible and unjustified to do anything besides say that the same principle holds as above: if the universe (including me) starts in a state |Ψ⟩, then at a later time it will be in state U|Ψ⟩, where U is a deterministic unitary operator.
…And then there’s an indexicality issue, and you need another axiom to resolve it. For example: “as quantum amplitude of a piece of the wavefunction goes to zero, the probability that I will ‘find myself’ in that piece also goes to zero” is one such axiom, and equivalent (it turns out) to the Born rule. It’s another axiom for sure; I just like that particular formulation because it “feels more natural” or something.
I think the place anti-many-worlds people get off the boat is this last step, because there are actually two attitudes:
My attitude is: there’s a universe following orderly laws, and the universe was there long before there were any people around to observe it, and it will be there long after we’re gone, and the universe happened to spawn people and now we can try to study and understand it.
An opposing attitude is: the starting point is my first-person subjective mind, looking out into the universe and making predictions about what I’ll see. So my perspective is special—I need not be troubled by the fact that I claim that there are many-Harrys when Harry’s in the box and I’m outside it, but I also claim that there are not many-me’s when I’m in the box. That’s not inconsistent, because I’m the one generating predictions for myself, so the situation isn’t symmetric. If I see that the cat is dead, then the cat is dead, and if you outside the well-isolated box say “there’s a branch of the wavefunction where you saw that the cat’s alive”, then I’ll say “well, from my perspective, that alleged branch is not ‘real’; it does not ‘exist’”. In other words, when I observed the cat, I “collapsed my wavefunction” by erasing the part of the (alleged) wavefunction that is inconsistent with my indexical observations, and then re-normalizing the wavefunction.
I’m really unsympathetic to the second bullet-point attitude, but I don’t think I’ve ever successfully talked somebody out of it, so evidently it’s a pretty deep gap, or at any rate I for one am apparently unable to communicate past it.
maybe the pilot-wave model is directionally correct in the sense of informing us about the nature of knowledge?
FWIW last I heard, nobody has constructed a pilot-wave theory that agrees with quantum field theory (QFT) in general and the standard model of particle physics in particular. The tricky part is that in QFT there’s observable interference between states that have different numbers of particles in them, e.g. a virtual electron can appear then disappear in one branch but not appear at all in another, and those branches have easily-observable interference in collision cross-sections etc. That messes with the pilot-wave formalism, I think.
I think the standard technical term for what you’re talking about is “unsupervised machine translation”. Here’s a paper on that, for example, although it’s not using the LLM approach you propose. (I have no opinion about whether the LLM approach you propose would work or not.)
In practice minds mostly seem to converge on quite similar latents
Yeah to some extent, although it’s stacking the deck when the minds speak the same language and grew up in the same culture. If you instead go to remote tribes, you find plenty of untranslatable words—or more accurately, words that translate to some complicated phrase that you’ve probably never thought about before. (I dug up an example for §4.3 here, in reference to Lisa Feldman Barrett’s extensive chronicling of exotic emotion words from around the world.)
(That’s not necessarily relevant to alignment because we could likewise put AGIs in a training environment with lots of English-language content, and then the AGIs would presumably get English-language concepts.)
“inconsistent beliefs”
You were talking about values and preferences in the previous paragraph, then suddenly switched to “beliefs”. Was that deliberate?
I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AGI safety/alignment—since that’s my field and I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :) (Update: I’m all set now.)
Now, a system which doesn’t satisfy the coherence conditions could still maximize some other kind of utility function—e.g. utility over whole trajectories, or some kind of discounted sum of utility at each time-step, rather than utility over end states. But that’s not very interesting, in general; any old system can be interpreted as maximizing some utility function over whole trajectories (i.e. the utility function which assigns high score to whatever the system actually does, and low score to everything else).
It’s probably not intended, but I think this wording vaguely implies a false dichotomy between “a thing (approximately) coherently pursues a long-term goal” and “an uninteresting thing like a rock”. There are other options like “Bob wants to eventually get out of debt, but Bob also wants to always act with honor and integrity”. See my post Consequentialism & Corrigibility.
Relatedly, I don’t think memetics is the only reason humans don’t approximately-coherently pursue states of the world in the distant future. (You didn’t say it was, but sorta gave that vibe.) For one thing, something can be pleasant or unpleasant right now. For another thing, the value function is defined and updated in conjunction with a flawed and incomplete world-model, as in your Pointers Problem post.
Yeah, you can have something which is “a brilliant out-of-the-box solution to a tricky problem” from the AI’s perspective, but is “reward-hacking / Goodharting the value function” from the programmer’s perspective. You say tomato, I say to-mah-to.
It’s tricky because there’s economic pressure to make AIs that will find and execute brilliant out-of-the-box solutions. We want our AIs to think outside some of the boxes (e.g. yes, you can repurpose a spare server rack frame for makeshift cable guides), but to definitely stay inside other boxes (e.g. no, you can’t take over the world). Unfortunately, the whole idea of “thinking outside the box” is that we’re not aware of all the boxes that we’re thinking inside of.
All three of those examples are of the form “hey here’s a lot of samples from a distribution, please output another sample from the same distribution”, which is not the kind of problem where anyone would ever expect adversarial dynamics / weird edge-cases, right?
(…Unless you do conditional sampling of a learned distribution, where you constrain the samples to be in a specific a-priori-extremely-unlikely subspace, in which case sampling becomes isomorphic to optimization in theory, because you can sample from the distribution of (reward, trajectory) pairs conditional on high reward. Toy sketch at the end of this comment.)
Or maybe you were making a different point in this particular paragraph?
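To make that parenthetical concrete, here’s a toy sketch (mine; the 1-D “trajectory” space and reward function are made up): conditioning samples on an a-priori-extremely-unlikely high-reward region, implemented as rejection sampling, behaves like an optimizer.

```python
import random

def reward(x):
    """Made-up reward over a 1-D 'trajectory' space; optimum at x = 3.7."""
    return -(x - 3.7) ** 2

def sample_trajectory():
    return random.uniform(-10, 10)   # stand-in for the learned distribution

# Plain sampling: draws from the prior; no optimization pressure at all.
plain = [sample_trajectory() for _ in range(10)]

# Conditional sampling: keep only samples whose reward lands in an
# a-priori-extremely-unlikely high range. As the threshold approaches
# the max reward, this rejection sampler turns into an optimizer.
threshold = -0.001
conditioned = []
while len(conditioned) < 10:
    x = sample_trajectory()
    if reward(x) > threshold:
        conditioned.append(x)

print(sorted(round(x, 2) for x in conditioned))   # all ~ 3.7, the optimum
```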