This seems like a somewhat trivial construction that relies on the fact that S_1 and S_2 are both small compared to the data. This, to me, seems like saying “If you can drive from San Francisco to Seattle using a Tesla car key (by driving a Tesla), and also do it using a Honda car key (by driving a Honda), then you can drive there almost as fast (modulo the mass of the second key slowing you down) by using both keys and driving there in the Tesla.” Am I missing something?
J Bostock
Seems like the human brain is missing a regularization loss term between its Q network and V network.
My guess would be that human mind sub-modules have commandeered the predictive coding “handshake” procedure which signals “close enough, basically no discrepancies here”. A planning submodule gets given a subgoal from the higher up module, and works until it hits “close enough”. Possibly some of the “close enough” can come from treating ```await subsubmodule(subsubgoal)``` macros as having provisionally returned ```close_enough```, which then gets updated to something else once the subsubmodule returns.
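To make the “provisionally returned” idea concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: `subsubmodule`, `planning_submodule`, `run_demo`, the status strings, and the use of `asyncio` are all made up, and this is not a claim about how brains actually implement anything.

```python
import asyncio
from typing import List

CLOSE_ENOUGH = "close_enough"  # hypothetical "no discrepancies here" signal


async def subsubmodule(subsubgoal: str) -> str:
    """Hypothetical slow sub-planner that eventually reports a discrepancy."""
    await asyncio.sleep(0.01)
    return f"discrepancy:{subsubgoal}"


async def planning_submodule(subsubgoal: str) -> List[str]:
    """Return the sequence of statuses the planner held for the sub-call."""
    history: List[str] = []
    # Kick off the sub-call without waiting for it to finish.
    task = asyncio.create_task(subsubmodule(subsubgoal))
    # Provisionally treat the pending call as having returned close_enough.
    status = CLOSE_ENOUGH
    history.append(status)
    # Once the sub-call actually returns, overwrite the provisional value.
    status = await task
    history.append(status)
    return history


def run_demo() -> List[str]:
    return asyncio.run(planning_submodule("boil water"))
```

The planner holds `close_enough` for exactly as long as the sub-call is pending, which is the window in which a higher-up module could be told that everything is fine.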
Unfortunately, I also don’t know how to construct this kind of thing!
Perhaps the model is updating its prior on “I am in an alignment eval” relative to “I am in a ridiculous roleplay scenario”.
There’s two questions here, then: is reading good for you in general, and are the positive effects attenuated if the motivation is wrong. I think the answer to the second one is very likely “no” as long as you are, in fact, actually reading to a similar depth (compare: if you’re unmotivated to run so you half-ass it, you won’t get the same benefits). I wasn’t aware you were actually questioning the first one, and there isn’t much hard RCT evidence so if your priors are that reading isn’t very useful then, uh, don’t bother I guess.
I don’t really think there are right or wrong reasons to read books, just like there aren’t right or wrong reasons to exercise. The benefits will accrue either way. Consider book clubs as analogous to running clubs in producing social pressure to keep reading.
The strongest arguments in favour of deontology/virtue ethics are about morality as distinct from axiology, which involve practical questions about how computationally bounded agents should behave in a universe which resembles our own and contains other agents. I think thought experiments like this mostly fail to engage with the actual reasons one might espouse deontology or virtue ethics as a way that an agent should operate.
The problem with these thought experiments, once you get beyond the most basic of trolley problems, is that they don’t ever actually happen in the real world! You essentially never, as an agent, get total knowledge of the exact scenario some number of people are in (except for some specific, bounded uncertainty over some parameters) while also having no way of communicating with anyone inside the scenario, and no longer-running tradition or doctrine or set of expectations from other agents as to what you “should” do.
I think Scott Alexander’s Seagull Principle applies here. You can give me as many trolley problem suitcase swapping problems with deontology as you like, and I’m still going to go back to following rules like “Don’t lie” and “Don’t give up people’s secrets” rather than utilitarianly calculating the expected value of every potential lie, safe in the knowledge that I will never be forced to decide whether to swap a person in a suitcase with a suitcase full of sand and then pull a lever to divert a trolley half an hour later.
My guess as to your method was “people pick the King of Hearts around 50% of the time”. Touché!
There is another lens through which to look at this situation: the current coding behavior of LLMs (generate like 20k words of thinking, write code, loop) is so off-distribution compared to the training data that the abstraction of “persona” breaks down completely. Telling the LLM (through RL) to “simulate a thing generating 10k words of thinking text as part of a text-only process of code writing” tells it, in no uncertain terms, that it is not simulating a particular friendly techy hippie-ish assistant kind of person, or any other kind of person, because no human in its training data regularly produces this kind of text.
I gained a similar phase-shift insight into the nature of words after writing around 40,000-50,000 words of text in the space of about five weeks. Sadly, a large portion of the insight is about the inadequacy of words as a means of communication, so I cannot very well express it!
I would greatly appreciate this (and I expect others would too) at least for your favourites. I won’t remember to check every single lab’s alignment blog. Also, the OpenAI alignment blog has nowhere for feedback or discussion.
Story from my past: at university, I once partook of a game called the “assassins’ guild”. It was a kind of Battle Royale. Fifty-odd participants would each be circularly assigned two “targets” from the other forty-nine, and instructed to “kill” them (for example, by writing “knife” on a stick and poking them with it, or by shooting them with a nerf gun). You’d be told their halls of residence, so you could find them there, if needed.
Your targets were revealed at 09:00 on the first day. I found my target’s Facebook page, found a post announcing her going to uni, and saw she was studying a subject which shared a module with my subject. From there I was able to pull up the timetable of her subject, guess which lectures were mandatory, and notice that she had a mandatory one right now. I “killed” her at 10:00 as she left the lecture hall.
I didn’t even get the first kill! Someone else pulled off the same trick even quicker than me!
This is with a smart uni student’s level of skill: the only thing it took was effort. If AI is good enough to do this, then privacy removal will be very easy to perform at scale.
That’s one way of putting it, yeah: the band who want to explore a new sound, the hunter who gets sick of eating deer every day, and the LLM with an entropy term in its reward function are all of the same ilk.
Can you say more about why you think this?
You Are Not Immune To Mode Collapse
Baba Is You is one I’ve written about.
I had a conversation about this recently, and I raised the point that fieldbuilding suffers from the same issues. When it really comes down to the wire, someone has to carry out the hard task of (being capable of) understanding and verifying the entire alignment stack end-to-end.
This puts governments, AI company CEOs, and Dustin Moskovitz in an awkward position: the person to understand the stack might not be able to make it legible to them (compare to the US Army guys who had to take the atmospheric ignition calculations on the word of the physicists).
Maaaybe you can break it down and entrust the local validity of each step to a different genius: “Terry Tao says the maths checks out, and our compiler engineers all agree on the architecture” or something like that, but I wouldn’t bet the world on it.
This is probably net good, but without any organizational changes I don’t think it means much at all. Sam can and does just make a promise, and then “change his mind” as soon as it’s convenient. There’s an angle where this comment pushes OpenAI in a particular direction (and OpenAI has more inertia than Sam alone) and another angle where it makes him look bad when he goes back on this, but I don’t think you can or should read much intent into his words at all other than “I think the next interaction I have will go well if I say that I’m going to cooperate with democracies”.
Legitimacy-importance arbitrage in academia might be an issue.
The standard story for how academia became Like That is something like this.
1. Grantmakers give out money based on citations
2. Therefore academics goodhart on citations
3. Therefore they all pressure each other to cite irrelevant pieces of work during e.g. peer review
Steps 1 and 2 are mostly right, but step 3 isn’t quite right. A lot of the time, academics cite irrelevant work to make their own publications look more important. If your work is in some obscure corner of chlorate chemistry (sorry to chlorate chemists), you can make it look better by citing some other piece of work which is only tangentially related to yours (say, how a brominated compound has anti-cancer properties (in mice)), even if that work is dubious. If your chlorate paper is entirely legitimate but also not that relevant, you’re lending credibility to the brominated mouse-cancer guys. There’s an arbitrage opportunity where one group makes dubious but cool claims, and another makes solid but boring claims, and the two of them cite each other to give a false impression that the field is both cool and solid.
I was thinking of “small” in terms of K-complexity, but I had still misunderstood it. Is the rough intuition behind this result the following:
Suppose has two components, which generates , and which generates the data given . We can “combine” and to form a program which first uses to generate and then uses to generate given . Then either and have some shared structure, in which case so is shorter than or and have no shared structure, in which case so is shorter either than or . Or some combination of the two.