This post is pointing at a good tool for identifying bias and motivated reasoning, but I don’t think that the use of “reversal test” here aligns with how the term was coined in the original Bostrom / Ord paper (https://nickbostrom.com/ethics/statusquo.pdf). That use of the term makes the point that if you oppose some upward change in a scalar value, and you have no reason to think that that value is already precisely optimized, then you should want to change that value in the opposite direction.
In this case it seems fine to add the image, but I feel disconcerted that mods have the ability to edit my posts.
I guess it makes sense that the LessWrong team would have the technical ability to do that. But editing a user’s post, without their specifically asking, feels like a pretty big breach of… not exactly trust, but something like that. It means I don’t have fundamental control over what is written under my name.
That is to say, I personally request that you never edit my posts without asking (which you did, in this case) and waiting for my response. Furthermore, I think that should be a universal policy on LessWrong, though maybe this is just an idiosyncratic neurosis of mine.
I knew that I could, and didn’t, because it didn’t seem worth it. (I was thinking that I would still have to upload it to a third-party photo repository and link to it. Is it easier than that now?)
CI uses a less advanced (and cheaper) cryoprotectant but cryoprotects ONLY THE HEAD, allowing the rest of the body to be straight frozen with massive damage. That’s especially odd since many CI members are insistent about being whole-body patients rather than neuros.
I did not know this. Thanks.
Here is a simple 1.5 question survey.
What’s the difference?
Suppose I’m talking with a group of loose acquaintances, and one of them says (in full seriousness), “I’m not homophobic. It’s not that I’m afraid of gays, I just think that they shouldn’t exist.”
It seems to me that it is appropriate for me to say, “Hey man, that’s not ok to say.” It might be that a number of other people in the conversation would back me up (or it might be that they defend the first guy), but there wasn’t common knowledge of that fact beforehand.
In some sense, this is a bid to establish a new norm, by pushing the private opinions of a number of people into common knowledge. It also seems to me to be a virtuous thing to do in many situations.
(Noting that my response to the guy is not: “Hey, you can’t do that, because I get to decide what people do around here.” It’s “You can’t do that, because it’s bad” and depending on the group to respond to that claim in one way or another.)
New (image) post: My strategic picture of the work that needs to be done
In which case, TraderJoe and Rixie, good job at being appropriately confused!
I very much agree.
This seems like it might be testable. If you force impulsive folks to wait and think, do they generate more ideas for how to proceed?
I believe there are 4 members of Leverage (including Geoff), and something like 7 members of Paradigm (including Geoff). Paradigm and Leverage are somewhat more distinct now than they were over previous years, but both are still headed by Geoff, unlike the other groups, which are meaningfully spun off.
I think that’s what the book referenced here is about.
New post: Capability testing as a pseudo fire alarm
[epistemic status: a thought I had]
It seems like it would be useful to have very fine-grained measures of how smart / capable a general reasoner is, because this would allow an AGI project to carefully avoid creating a system smart enough to pose an existential risk.
I’m imagining slowly feeding a system more training data (or, alternatively, iteratively training a system with slightly more compute), and regularly checking its capability. When the system reaches “chimpanzee level” (whatever that means), you stop training it (or giving it more compute resources).
This might even be a kind of fire alarm. If you have a known, predetermined battery of tests, then when some lab develops a system that scores “at the chimp level” on that battery, that might be a signal to everyone that it’s time to pool our resources and figure out safety. (Of course, this event might alternatively precipitate a race, as everyone tries to get to human-level first.)
Probably the best way to do this would be to scale both training data and compute / architecture. Start with a given architecture, then train it, slowly increasing the amount or quality of the training data, with regular tests (done on “spurs”; the agent should never have episodic memory of the tests). When increasing training data plateaus, iteratively improve the architecture in some way, either by giving the system more compute resources, or maybe making small adjustments. Again train the new version of the system, with regular tests. If you ever start to get very steep improvement, slow down and run tests more frequently.
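Something like the following loop is what I have in mind. This is only a rough sketch under my own assumptions: the function names (`train_step`, `run_battery`, `improve_architecture`) and the thresholds are hypothetical placeholders, not a real training API.

```python
# Rough sketch of the "slowly scale, test often" loop described above.
# All callables and threshold values are hypothetical placeholders.

def scale_carefully(system, train_step, run_battery, improve_architecture,
                    chimp_level=0.7, steep_gain=0.05, plateau_gain=0.005):
    """Incrementally scale `system`, testing a memory-less 'spur' copy each
    round, and stop before it crosses the predetermined 'chimp level'."""
    data_step = 1.0                        # relative size of each data increment
    last = run_battery(system)             # battery is run on a spur: no episodic memory of tests
    while True:
        train_step(system, data_step)      # feed a little more training data
        score = run_battery(system)
        gain = score - last

        if score >= chimp_level:           # the "fire alarm": stop scaling here
            return score
        if gain > steep_gain:              # improvement is steep: slow down, test more often
            data_step *= 0.5
        elif gain < plateau_gain:          # data gains plateaued: small compute / architecture tweak
            improve_architecture(system)

        last = score
```

The design choice doing the work here is that capability is only ever probed on spur copies, and scaling halts on the first round that crosses the threshold, rather than after overshooting it.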
Naively, it seems like a setup like this would prevent an AI team from overshooting and making a system that is much more capable than they think (which gives rise to all kinds of problems, like treacherous turns), regardless of how close “chimp” is to “human” on some absolute intelligence scale.
This, of course, depends on having metrics that work. It seems very likely that early general AIs will be minds with a very unusual “shape”, such that they have sophisticated reasoning abilities along dimensions that are not natural categories to humans, while being weak on many of the dimensions we’re testing for.
Although, this seems a bit surprising, since presumably we would be training it to hit those metrics? Somehow it would get better, faster, at a kind of reasoning we were not intending to train, than at the kind of reasoning that we were optimizing for?
Also, can we just design metrics that get around this problem, by measuring exactly what we care about, i.e. danger? Something like a metric for consequentialist reasoning?
This idea depends on the system not deceiving you, or intentionally underperforming on your metrics.
Naively, this doesn’t seem like that much of an issue, if you are doing tests regularly enough. I’m presuming (perhaps wrongly) that a chimpanzee is not smart enough to infer that it is an agent in a simulation that is overseen by human agents, and that it should try to deceive those humans. That is, we stop our progression before the agent is at the point where deception is a consideration.
Which means we really need to be confident about how smart something needs to be before it tries to deceive us?
Really? Its first attempts at tricking humans will be pretty transparent. Just as the attempts of animals / babies to trick us are pretty transparent.
At least one “danger point” is when the system is capable enough to realize the instrumental value of self-improving by seizing more resources.
How smart is this?
My guess is: really smart. Animals come pre-loaded with all kinds of instincts that cause them to seek out food, water, etc. These AI systems would not have an instinct to seek more training data / computation. Most humans don’t reason their way into finding ways to improve their own reasoning. If there were a chimp loose on the internet (whatever that means), would it figure out how to make itself smarter?
If the agent has experienced (and has memories of) rounds of getting smarter as the humans give it more resources, and can identify that these improvements allow it to get more of what it wants, it might instrumentally reason that it should figure out how to get more compute / training data. But it seems easy to have a setup such that no system has episodic memories of previous improvement rounds.
[Note: This makes a lot less sense for an agent in the active inference paradigm.]
Could I salvage it somehow? Maybe by making some kind of principled distinction between learning in the sense of “getting better at reasoning” (procedural), and learning in the sense of “acquiring information about the environment” (episodic).
[Real short post. Random. Complete speculation.]
Childhood lead exposure reduces one’s IQ, and also causes one to be more impulsive and aggressive.
I always assumed that the impulsiveness was due, basically, to your executive function machinery working less well. So you have less self control.
But maybe the reason for the IQ-impulsiveness connection is that if you have a lower IQ, all of your subagents / subprocesses are less smart. Because they’re worse at planning and modeling the world, the only ways they know to get their needs met are very direct, very simple action-plans / strategies. It’s not so much that you’re better at controlling your anger, as that the part of you that would be angry is less so, because it has other ways of getting its needs met.
This paper seems at least a little relevant.
Abstract: The brain’s reliance on glucose as a primary fuel source is well established, but psychological models of cognitive processing that take energy supply into account remain uncommon. One exception is research on self-control depletion, where debate continues over a limited-resource model. This model argues that a transient reduction in self-control after the exertion of prior self-control is caused by the depletion of brain glucose, and that self-control processes are special, perhaps unique, in this regard. This model has been argued to be physiologically implausible in several recent reviews. This paper attempts to correct some inaccuracies that have occurred during debate over the physiological plausibility of this model. We contend that not only is such limitation of cognition by constraints on glucose supply plausible, it is well established in the neuroscience literature across several cognitive domains. Conversely, we argue that there is no evidence that self-control is special in regard to its metabolic cost. Mental processes require physical energy, and the body is limited in its ability to supply the brain with sufficient energy to fuel mental processes. This article reviews current findings in brain metabolism and seeks to resolve the current conflict in the field regarding the physiological plausibility of the self-control glucose-depletion hypothesis.
I’ve been told, by people much smarter than me and more connected to even smarter people, that the very elite in terms of IQ have a sense of learned helplessness about the world.
According to this story, the smartest people in the world look around and see stupidity all around them: the world is populated by, and controlled by, people who regularly make senseless decisions and can’t even tell that they’re senseless. And it is obvious that trying to get people to understand is hopeless: aside from the fact that most of them basically can’t understand, you are small, and the world is huge.
So these people go and do math, and make a good life for themselves, and don’t worry about the world.
[I don’t know if this story is true.]
Yeah. I think you’re on to something here. My current read is that “mental energy” is at least 3 things.
Can you elaborate on what “knowledge saturation” feels like for you?