I’m an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed , Twitter , Mastodon , Threads , Bluesky , GitHub , Wikipedia , Physics-StackExchange , LinkedIn
Steven Byrnes
Spatial attention as a “tell” for empathetic simulation?
From my conversations with people in industry, they don’t seem at all interested in tracking per-employee performance (e.g. Google isn’t running RCTs on its engineers to improve their coding performance, and estimates of how long projects will take are not tracked & scored).
FWIW Joel Spolsky suggests that people managing software engineers should have detailed schedules, and says big companies have up-to-date schedules, and built a tool to leverage historical data for better schedules. At my old R&D firm, people would frequently make schedules and budgets for projects, and would be held to account if their estimates were bad, and I got a strong impression that seasoned employees tended to get better at making accurate schedules and budgets over time. (A seasoned employee suggested to me a rule-of-thumb for novices, that I should earnestly try to make an accurate schedule, then go through the draft replacing the word “days” with “weeks”, and “weeks” with “months”, etc.) (Of course it’s possible for firms to not be structured such that people get fast and frequent feedback on the accuracy of their schedules and penalties for doing a bad job, in which case they probably won’t get better over time.)
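(IIRC, Spolsky’s tool does something like “evidence-based scheduling”: Monte Carlo simulation over your own historical estimate-vs-actual ratios. A minimal sketch, with made-up numbers and function names of my own invention:

```python
import random

def simulate_ship_dates(estimates, historical_velocities, n_trials=10_000):
    """Monte Carlo over historical estimate-accuracy ("velocity") samples.

    estimates: list of task estimates (e.g. in days)
    historical_velocities: past (estimated / actual) ratios for this person;
        a velocity of 0.5 means the task took twice as long as estimated.
    Returns a sorted list of simulated total durations.
    """
    totals = []
    for _ in range(n_trials):
        # For each task, assume its accuracy will resemble a random
        # past task's accuracy.
        total = sum(est / random.choice(historical_velocities)
                    for est in estimates)
        totals.append(total)
    return sorted(totals)

# Hypothetical inputs: three tasks, and a spotty estimation track record.
durations = simulate_ship_dates([2, 5, 3], [1.0, 0.8, 0.5, 0.4])
p50 = durations[len(durations) // 2]        # median simulated duration
p90 = durations[int(len(durations) * 0.9)]  # 90th-percentile duration
```

Then you can report a distribution—“90% chance of shipping within p90 days”—instead of a single date, and the “replace days with weeks” correction falls out of the historical velocities automatically.)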
I guess what’s missing is (1) systemizing scheduling so that it’s not a bunch of heuristics in individual people’s heads (might not be possible), (2) intervening on employee workflows etc. (e.g. A/B testing) and seeing how that impacts productivity.
Practice testing
IIUC the final “learning” was assessed via a test. So you could rephrase this as, “if you do the exact thing X, you’re liable to get better at doing X”, where here X=“take a test on topic Y”. (OK, it generalized “from simple recall to short answer inference tests” but that’s really not that different.)
I’m also a little bit surprised that keywords and mnemonics don’t work (since they are used very often by competitive mnemonists)
I invent mnemonics all the time, but normal people still need spaced-repetition or similar to memorize the mnemonic. The mnemonics are easier to remember (that’s the point) but “easier” ≠ effortless.
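(For concreteness, a simplified SM-2-style scheduler—the algorithm Anki is loosely based on—looks roughly like this. This is my own stripped-down sketch; real SM-2 also fixes the first two intervals at 1 and 6 days:

```python
def sm2_next_interval(interval, ease, quality):
    """One step of a simplified SM-2-style spaced-repetition schedule.

    interval: days until the card was due this time
    ease: current ease factor (SM-2 starts this at 2.5, floor at 1.3)
    quality: self-graded recall, 0 (blackout) .. 5 (perfect)
    Returns (next_interval_days, new_ease).
    """
    if quality < 3:
        # Lapse: shrink the ease factor and start over tomorrow.
        return 1, max(1.3, ease - 0.2)
    # Successful recall: nudge ease by how easy it felt, then space the
    # next review further out.
    new_ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return round(interval * new_ease), new_ease
```

The point being: even an “easy” mnemonic still sits inside this review loop; the mnemonic just keeps `quality` high so the intervals grow quickly.)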
As another point, I think a theme that repeatedly comes up is that people are much better at learning things when there’s an emotional edge to them—for example:
It’s easier to remember things if you’ve previously brought them up in an argument with someone else.
It’s easier to remember things if you’ve previously gotten them wrong in public and felt embarrassed.
It’s easier to remember things if you’re really invested in and excited by a big project and figuring this thing out will unblock the project.
This general principle makes obvious sense from an evolutionary perspective (it’s worth remembering a lion attack, but it’s not worth remembering every moment of a long uneventful walk), and I think it’s also pretty well understood neuroscientifically (physiological arousal → more norepinephrine, dopamine, and/or acetylcholine → higher learning rates … something like that).
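As a toy model of that last arrow (my invented numbers, nothing more): a delta-rule value update where the learning rate is scaled by arousal:

```python
def update_value(value, outcome, arousal, base_lr=0.1):
    """Toy delta-rule update: prediction error scaled by a learning rate
    that grows with physiological arousal (a stand-in for the
    norepinephrine / dopamine / acetylcholine signals mentioned above).

    arousal: 0.0 (uneventful walk) .. 1.0 (lion attack)
    """
    lr = base_lr * (1 + 9 * arousal)  # up to 10x faster learning when aroused
    return value + lr * (outcome - value)

# Same surprising outcome, very different amounts of learning:
calm = update_value(0.0, 1.0, arousal=0.05)   # barely moves
alarmed = update_value(0.0, 1.0, arousal=1.0) # jumps most of the way
```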
As another point, I’m not sure there’s any difference between “far transfer” and “deep understanding”. Thus, the interventions that you said were helpful for far transfer seem to be identical to the interventions that would lead to deep understanding / familiarity / facility with thinking about some set of ideas. See my comment here.
Yeah some of my to-do items are of the form “skim X”. Inside the “card” I might have a few words about how I originally came across X and what I’m hoping to get out of skimming it.
It just refers to the fact that there are columns that you drag items between. I don’t even really know how a “proper” kanban works.
If a new task occurs to me in the middle of something else, I’ll temporarily put it in a left (high-priority) column, just so I don’t forget it, and then later when I’m at my computer and have a moment to look at it, I might decide to drag it to a right (low-priority) column instead of doing it.
Such an unambitious, narrowly-scoped topic area?? There may be infinitely many parallel universes in which we can acausally improve life … you’re giving up almost all of the value at stake before even starting :)
A couple productivity tips for overthinkers
I always thought of $S = -k_B \sum_i p_i \ln p_i$ as the exact / “real” definition of entropy, and $S = k_B \ln W$ as the specialization of that “exact” formula to the case where each of the $W$ microstates is equally probable (a case which is rarely exactly true but often a good approximation). So I found it a bit funny that you only mention the second formula, not the first. I guess you were keeping it simple? Or do you not share that perspective?
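(In code, just to make the “specialization” point concrete: the general Gibbs formula reduces to $\ln W$ when the distribution over microstates is uniform.

```python
import math

def gibbs_entropy(probs):
    """S / k_B = -sum_i p_i ln p_i over microstate probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# With W equally probable microstates, the general formula reduces to
# the Boltzmann form S / k_B = ln W:
W = 8
uniform = [1 / W] * W
assert abs(gibbs_entropy(uniform) - math.log(W)) < 1e-12
```

Any non-uniform distribution over the same $W$ states gives strictly lower entropy, which is why the uniform case is the special one.)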
I just looked up “many minds” and it’s a little bit like what I wrote here, but described differently in ways that I think I don’t like. (It’s possible that Wikipedia is not doing it justice, or that I’m misunderstanding it.) I think minds are what brains do, and I think brains are macroscopic systems that follow the laws of quantum mechanics just like everything else in the universe.
What property distinguished a universe where “Harry found himself in a tails branch” and a universe where “Harry found himself in a heads branch”?
Those both happen in the same universe. Those Harrys both exist. Maybe you should put aside many-worlds and just think about Parfit’s teletransportation paradox. I think you’re assuming that “thread of subjective experience” is a coherent concept that satisfies all the intuitive properties that we feel like it should have, and I think that the teletransportation paradox is a good illustration that it’s not coherent at all, or at the very least, we should be extraordinarily cautious when making claims about the properties of this alleged thing you call a “thread of subjective experience” or “thread of consciousness”. (See also other Parfit thought experiments along the same lines.)
I don’t like the idea where we talk about what will happen to Harry, as if that has to have a unique answer. Instead I’d rather talk about Harry-moments, where there’s a Harry at a particular time doing particular things and full of memories of what happened in the past. Then there are future Harry-moments. We can go backwards in time from a Harry-moment to a unique (at any given time) past Harry-moment corresponding to it—after all, we can inspect the memories in future-Harry-moment’s head about what past-Harry was doing at that time (assuming there were no weird brain surgeries etc). But we can’t uniquely go in the forward direction: Who’s to say that multiple future-Harry-moments can’t hold true memories of the very same past-Harry-moment?
Here I am, right now, a Steve-moment. I have a lot of direct and indirect evidence of quantum interactions that have happened in the past or are happening right now, as imprinted on my memories, surroundings, and so on. And if you a priori picked some possible property of those interactions that (according to the Born rule) has 1-in-a-googol probability to occur in general, then I would be delighted to bet my life’s savings that this property is not true of my current observations and memories. Obviously that doesn’t mean that it’s literally impossible.
I wrote “flipping an unbiased coin” so that’s 50/50.
there’s some preferred future “I” out of many who is defined not only by observations he receives, but also by being a preferred continuation of subjective experience defined by an unknown mechanism
I disagree with this part—if Harry does the quantum equivalent of flipping an unbiased coin, then there’s a branch of the universe’s wavefunction in which Harry sees heads and says “gee, isn’t it interesting that I see heads and not tails, I wonder how that works, hmm why did my thread of subjective experience carry me into the heads branch?”, and there’s also a branch of the universe’s wavefunction in which Harry sees tails and says “gee, isn’t it interesting that I see tails and not heads, I wonder how that works, hmm why did my thread of subjective experience carry me into the tails branch?”. I don’t think either of these Harrys is “preferred”.
I don’t think there’s any extra “complexity penalty” associated with the previous paragraph: the previous paragraph is (I claim) just a straightforward description of what would happen if the universe and everything in it (including Harry) always follows the Schrodinger equation—see Quantum Mechanics In Your Face for details.
I think we deeply disagree about the nature of consciousness, but that’s a whole can of worms that I really don’t want to get into in this comment thread.
doesn’t strike me as “feeling more natural”
Maybe you’re just going for rhetorical flourish, but my specific suggestion with the words “feels more natural” in the context of my comment was: the axiom “I will find myself in a branch of amplitude approaching 0 with probability approaching 0” “feels more natural” than the axiom “I will find myself in a branch of amplitude $c$ with probability $|c|^2$”. That particular sentence was not a comparison of many-worlds with non-many-worlds, but rather a comparison of two ways to formulate many-worlds. So I think your position is that you find neither of those to “feel natural”.
Quantum Mechanics In Your Face talk by Sidney Coleman, starting slide 17 near the end. The basic idea is to try to operationalize how someone might test the Born rule—they take a bunch of quantum measurements, one after another, and they subject their data to a bunch of randomness tests and so on, and then they eventually declare “Born rule seems true” or “Born rule seems false” after analyzing the data. And you can show that the branches in which this person declares “Born rule seems false” have collective amplitude approaching zero, in the limit as their test procedure gets better and better (i.e. as they take more and more measurements).
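Here’s a toy version of that limit (my own illustration, not Coleman’s exact setup): for $n$ measurements of an unbiased quantum coin, add up the Born weight of the branches in which the observed heads-frequency is far enough from 1/2 that the experimenter would be inclined to declare “Born rule seems false”. That collective weight shrinks as $n$ grows—it’s just the law of large numbers applied to branch weights:

```python
from math import comb

def weight_of_deviant_branches(n, eps=0.1):
    """Total Born weight (amplitude-squared) of the branches of n unbiased
    quantum coin flips in which the observed heads-frequency deviates from
    1/2 by more than eps -- i.e. the branches whose occupants would declare
    "Born rule seems false"."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - 0.5) > eps:
            total += comb(n, k) * 0.5 ** n
    return total

# The deviant branches' collective weight shrinks as the test gets longer:
small, large = weight_of_deviant_branches(20), weight_of_deviant_branches(200)
```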
(Warning that I may well be misunderstanding this post.)
For any well-controlled isolated system, if it starts in a state $|\psi\rangle$, then at a later time it will be in state $U|\psi\rangle$, where $U$ is a certain deterministic unitary operator. So far this is indisputable—you can do quantum state tomography, you can measure the interference effects, etc. Right?
OK, so then you say: “Well, a very big well-controlled isolated system could be a box with my friend Harry and his cat in it, and if the same principle holds, then there will be deterministic unitary evolution from $|\psi\rangle$ into $U|\psi\rangle$, and hey, I just did the math and it turns out that $U|\psi\rangle$ will have a 50/50 mix of ‘Harry sees his cat alive’ and ‘Harry sees his cat dead and is sad’.” This is beyond what’s possible to directly experimentally verify, but I think it should be a very strong presumption by extrapolating from the first paragraph. (As you say, “quantum computers prove larger and larger superpositions to be stable”.)
OK, and then we take one more step by saying “Hey what if I’m in the well-controlled isolated system?” (e.g. the “system” in question is the whole universe). From my perspective, it’s implausible and unjustified to do anything besides say that the same principle holds as above: if the universe (including me) starts in a state $|\psi\rangle$, then at a later time it will be in state $U|\psi\rangle$, where $U$ is a deterministic unitary operator.
…And then there’s an indexicality issue, and you need another axiom to resolve it. For example: “as quantum amplitude of a piece of the wavefunction goes to zero, the probability that I will ‘find myself’ in that piece also goes to zero” is one such axiom, and equivalent (it turns out) to the Born rule. It’s another axiom for sure; I just like that particular formulation because it “feels more natural” or something.
I think the place anti-many-worlds-people get off the boat is this last step, because there are actually two attitudes:
My attitude is: there’s a universe following orderly laws, and the universe was there long before there were any people around to observe it, and it will be there long after we’re gone, and the universe happened to spawn people and now we can try to study and understand it.
An opposing attitude is: the starting point is my first-person subjective mind, looking out into the universe and making predictions about what I’ll see. So my perspective is special—I need not be troubled by the fact that I claim that there are many-Harrys when Harry’s in the box and I’m outside it, but I also claim that there are not many-me’s when I’m in the box. That’s not inconsistent, because I’m the one generating predictions for myself, so the situation isn’t symmetric. If I see that the cat is dead, then the cat is dead, and if you outside the well-isolated box say “there’s a branch of the wavefunction where you saw that the cat’s alive”, then I’ll say “well, from my perspective, that alleged branch is not ‘real’; it does not ‘exist’”. In other words, when I observed the cat, I “collapsed my wavefunction” by erasing the part of the (alleged) wavefunction that is inconsistent with my indexical observations, and then re-normalizing the wavefunction.
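(For concreteness, the “collapse my wavefunction” move in that second attitude is literally just this operation—keep the branches consistent with my indexical observation, then renormalize. The amplitudes here are hypothetical:

```python
import math

def collapse(branches, observation):
    """Keep only the branches consistent with my indexical observation,
    then re-normalize so the kept amplitudes-squared sum to 1.

    branches: dict mapping outcome label -> amplitude
    """
    kept = {label: amp for label, amp in branches.items() if label == observation}
    norm = math.sqrt(sum(abs(amp) ** 2 for amp in kept.values()))
    return {label: amp / norm for label, amp in kept.items()}

# Outside observer's description vs. the in-box observer's description:
pre = {"cat alive": 3 / 5, "cat dead": 4 / 5}  # hypothetical amplitudes
post = collapse(pre, "cat dead")               # only the observed branch survives
```

The whole disagreement is about whether this operation describes the universe or merely describes one observer’s bookkeeping.)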
I’m really unsympathetic to the second bullet-point attitude, but I don’t think I’ve ever successfully talked somebody out of it, so evidently it’s a pretty deep gap, or at any rate I for one am apparently unable to communicate past it.
maybe the pilot-wave model is directionally correct in the sense of informing us about the nature of knowledge?
FWIW last I heard, nobody has constructed a pilot-wave theory that agrees with quantum field theory (QFT) in general and the standard model of particle physics in particular. The tricky part is that in QFT there’s observable interference between states that have different numbers of particles in them, e.g. a virtual electron can appear then disappear in one branch but not appear at all in another, and those branches have easily-observable interference in collision cross-sections etc. That messes with the pilot-wave formalism, I think.
I think the standard technical term for what you’re talking about is “unsupervised machine translation”. Here’s a paper on that, for example, although it’s not using the LLM approach you propose. (I have no opinion about whether the LLM approach you propose would work or not.)
In practice minds mostly seem to converge on quite similar latents
Yeah to some extent, although it’s stacking the deck when the minds speak the same language and grew up in the same culture. If you instead go to remote tribes, you find plenty of untranslatable words—or more accurately, words that translate to some complicated phrase that you’ve probably never thought about before. (I dug up an example for §4.3 here, in reference to Lisa Feldman Barrett’s extensive chronicling of exotic emotion words from around the world.)
(That’s not necessarily relevant to alignment because we could likewise put AGIs in a training environment with lots of English-language content, and then the AGIs would presumably get English-language concepts.)
“inconsistent beliefs”
You were talking about values and preferences in the previous paragraph, then suddenly switched to “beliefs”. Was that deliberate?
I’m in the market for a new productivity coach / accountability buddy, to chat with periodically (I’ve been doing one ≈20-minute meeting every 2 weeks) about work habits, and set goals, and so on. I’m open to either paying fair market rate, or to a reciprocal arrangement where we trade advice and promises etc. I slightly prefer someone not directly involved in AGI safety/alignment—since that’s my field and I don’t want us to get nerd-sniped into object-level discussions—but whatever, that’s not a hard requirement. You can reply here, or DM or email me. :)

Update: I’m all set now.
Now, a system which doesn’t satisfy the coherence conditions could still maximize some other kind of utility function—e.g. utility over whole trajectories, or some kind of discounted sum of utility at each time-step, rather than utility over end states. But that’s not very interesting, in general; any old system can be interpreted as maximizing some utility function over whole trajectories (i.e. the utility function which assigns high score to whatever the system actually does, and low score to everything else).
It’s probably not intended, but I think this wording vaguely implies a false dichotomy between “a thing (approximately) coherently pursues a long-term goal” and “an uninteresting thing like a rock”. There are other options like “Bob wants to eventually get out of debt, but Bob also wants to always act with honor and integrity”. See my post Consequentialism & Corrigibility.
Relatedly, I don’t think memetics is the only reason humans don’t approximately-coherently pursue states of the world in the distant future. (You didn’t say it was, but sorta gave that vibe.) For one thing, something can be pleasant or unpleasant right now. For another thing, the value function is defined and updated in conjunction with a flawed and incomplete world-model, as in your Pointers Problem post.
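(The “any old system maximizes some utility function over whole trajectories” construction from the quoted passage is easy to make concrete:

```python
def trivial_trajectory_utility(actual_trajectory):
    """Build the degenerate utility function from the quoted argument: it
    scores the trajectory the system actually produced at 1 and every
    other trajectory at 0 -- so *any* behavior "maximizes" it."""
    def utility(trajectory):
        return 1.0 if trajectory == actual_trajectory else 0.0
    return utility

# Even a rock "maximizes utility" under this construction:
rock_history = ("sit", "sit", "sit")
u = trivial_trajectory_utility(rock_history)
assert u(rock_history) == 1.0
assert u(("roll", "sit", "sit")) == 0.0
```

—which is exactly why trajectory-utilities carry no predictive content on their own; the interesting question is which *compactly describable* utility functions a system approximately maximizes.)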
I’m interested in Metacelsus’s answer.
My take is: I really haven’t been following the lab leak stuff. The point of my comment was to bring this hypothesis to the attention of people who have, and hopefully get some takes from them. As I understand it:
We know for sure that miners went into a cave, the same cave where btw one of the closest known wild relatives of COVID was later sampled
We know for sure that the miners got sick with COVID-like symptoms, some for 4+ months
We know for sure that samples (including posthumous samples) from those sick miners were sent to WIV, and that the researchers still had access to those samples into 2020
I think that’s more than enough to at least raise the Mojiang Miner Passage theory to consideration. Figuring out whether the theory is actually true or not would require a lot more beyond that, e.g. arguments about the exact genetic code of the furin cleavage site and all this other stuff which is way outside my area of expertise. :)
[genetic sequence analysis] is stupid because none of the people involved had the technical understanding required to even interpret papers on the topic.
The two judges were:
Will van Treuren, a pharmaceutical entrepreneur with a PhD from Stanford and a background in bacteriology and immunology.
Eric Stansifer, an applied mathematician with a PhD from MIT and experience in mathematical virology.
Do you think the judges lack technical understanding to interpret papers on genetic sequence analysis, or do you not count the judges as “involved”, or both, or something else?
Way back in 2020 there was an article A Proposed Origin For SARS-COV-2 and the COVID-19 Pandemic, which I read after George Church tweeted it (!) (without comment or explanation). Their proposal (they call it “Mojiang Miner Passage” theory) in brief was that it WAS a lab leak but NOT gain-of-function. Rather, in April 2012, six workers in a Mojiang mine “fell ill from a mystery illness while removing bat faeces. Three of the six subsequently died.” Their symptoms were a perfect match to COVID, and two were very sick for more than four months.
The proposal is that the virus spent those four months adapting to life in human lungs, including (presumably) evolving the furin cleavage site. And then (this is also well-documented) samples from these miners were sent to WIV. The proposed theory is that those samples sat in a freezer at WIV for a few years while WIV was constructing some new lab facilities, and then in 2019 researchers pulled out those samples for study and infected themselves.
I like that theory! I’ve liked it ever since 2020! It seems to explain many of the contradictions brought up by both sides of this debate—it’s compatible with Saar’s claim that the furin cleavage site is very different from what’s in nature and seems specifically adapted to humans, but it’s also compatible with Peter’s claim that the furin cleavage site looks weird and evolved. It’s compatible with Saar’s claim that WIV is suspiciously close to the source of the outbreak, but it’s also compatible with Peter’s claim that WIV might not have been set up to do serious GoF experiments. It’s compatible with the data comparing COVID to other previously-known viruses (supposedly). Etc.
Old as this theory is, the authors are still pushing it and they claim that it’s consistent with all the evidence that’s come out since then (see author’s blog). But I’m sure not remotely an expert, and would be interested if anyone has opinions about this. I’m still confused why it’s never been much discussed.
If I’m looking up at the clouds, or at a distant mountain range, then everything is far away (the ground could be cut off from my field-of-view)—but it doesn’t trigger the sensations of fear-of-heights, right? Also, I think blind people can be scared of heights?
Another possible fear-of-heights story just occurred to me—I added to the post in a footnote, along with why I don’t believe it.