This model does allow for that. :) We can use this model whenever our two agents agree predictively about some parts of the world X; it’s totally fine if our two agents learned their models from different sources and/or make different predictions about other parts of the world.
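To spell out the condition (notation mine, added for illustration, not quoted from the post): “agree predictively about X” just means both agents’ models assign the same distribution to X, even while they may disagree about everything else:

$$P_1[X] = P_2[X], \qquad \text{while possibly } P_1[Y] \neq P_2[Y] \text{ for other parts of the world } Y.$$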
johnswentworth
Why Care About Natural Latents?
You are a scholar and a gentleman.
Way back in the halcyon days of 2005, a company called Cenqua had an April Fools’ Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I’m wondering (1) if anybody can find an archived version of the page (the original seems to be gone), and (2) if there’s now a clear market leader for that particular product niche, but for real.
The word for a drug that causes loss of memory is “amnestic”, not “amnesic”. The word “amnesic” is a variant spelling of “amnesiac”, which is the person who takes the drug.
Thanks! Fixed now.
Oh yeah, I guess that could be a learning effect. When reading it I assumed the lack of need for repeating the numbers was just because the drug was wearing off.
Another class of applications which we discussed at the retreat: person 1 takes the amnestic, person 2 shares private information with them, and then person 1 gives their reaction to the private information. Can be used e.g. for complex negotiations: maybe it is in our mutual best interest to make some deal, but in order for me to know that I’d need some information which you don’t want to share with me, so I take the drug, you share the information, and I record some verified record of myself saying “dear future self, you should in fact take this deal”.
… which is cool in theory but I would guess not of high immediate value in practice, which is why the post didn’t focus on it.
I would love to hear suggestions for other things I could try. If you have any, let me know in a comment!
Some Experiments I’d Like Someone To Try With An Amnestic
Do you know what the drug was which did this?
Nitpick: you’re talking about the discovery of the structure of DNA; it was already known at that time to be the molecule which mediates inheritance IIRC.
I buy this argument.
I don’t buy mathematical equivalence as an argument against, in this case, since the whole point of the path integral formulation is that it’s mathematically equivalent but far simpler conceptually and computationally.
Man, that top one was a mess. Fixed now, thank you!
Here are some candidates from Claude and Gemini (Claude Opus seemed considerably better than Gemini Pro for this task). Unfortunately they are quite unreliable: I’ve already removed many examples from this list which I knew to have multiple independent discoverers (e.g. CRISPR and general relativity). If you’re familiar with the history of any of these enough to say that they clearly were/weren’t very counterfactual, please leave a comment.
Noether’s Theorem
Mendel’s Laws of Inheritance
Gödel’s First Incompleteness Theorem (Claude mentions von Neumann as an independent discoverer of the Second Incompleteness Theorem)
Feynman’s path integral formulation of quantum mechanics
Onnes’ discovery of superconductivity
Pauling’s discovery of the alpha helix structure in proteins
McClintock’s work on transposons
Observation of the cosmic microwave background
Lorenz’s work on deterministic chaos
Prusiner’s discovery of prions
Yamanaka factors for inducing pluripotency
Langmuir’s adsorption isotherm (I have no idea what this is)
[Question] Examples of Highly Counterfactual Discoveries?
I somehow missed that John Wentworth and David Lorell are also in the middle of a sequence on this same topic here.
Yeah, uh… hopefully nobody’s holding their breath waiting for the rest of that sequence. That was the original motivator, but we only wrote the one post and don’t have any more in development yet.
Point is: please do write a good stat mech sequence; David and I are not really “on that ball” at the moment.
(Didn’t read most of the dialogue, sorry if this was covered.)
But the way transformers work is they greedily think about the very next token, and predict that one, even if by conditioning on it you shot yourself in the foot for the task at hand.
That depends on how we sample from the LLM. If, at each “timestep”, we take the most-probable token, then yes that’s right.
But an LLM gives a distribution over tokens at each timestep, i.e. $P[x_t \mid x_{<t}]$. If we sample from that distribution, rather than take the most-probable at each timestep, then that’s equivalent to sampling non-greedily from the learned distribution over text. It’s the chain rule:

$$P[x_1, \dots, x_n] = \prod_{t=1}^{n} P[x_t \mid x_{<t}]$$
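To make the greedy-vs-sampled distinction concrete, here’s a minimal decoding sketch (my own illustration, not anyone’s actual decoding code; `model` is a hypothetical stand-in for any autoregressive LLM that maps a token prefix to next-token logits). The only difference between greedy decoding and exact sampling from the learned distribution over text is one line:

```python
import torch

def sample_sequence(model, prompt_ids, max_new_tokens, greedy=False):
    """Decode token-by-token from an autoregressive model.

    greedy=True  -> take the argmax of P[x_t | x_{<t}] at each step.
    greedy=False -> sample each conditional, which by the chain rule
                    yields an exact sample from the joint P[x_1, ..., x_n].
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(torch.tensor([ids]))[0, -1]  # next-token logits
        probs = torch.softmax(logits, dim=-1)
        if greedy:
            next_id = int(torch.argmax(probs))           # most-probable token
        else:
            next_id = int(torch.multinomial(probs, 1))   # draw from P[x_t | x_{<t}]
        ids.append(next_id)
    return ids
```

With `greedy=False`, each step draws from the conditional, so by the chain rule above the completed sequence is an exact draw from the joint distribution over text; no lookahead is required.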
Personally, I consider model-based RL to be not RL at all. I claim that either one needs to consider model-based RL to be not RL at all, or one needs to accept such a broad definition of RL that the term is basically-useless (which I think is what porby is saying in response to this comment, i.e. “the category of RL is broad enough that belonging to it does not constrain expectation much in the relevant way”).