the asteroid would likely burn up, but perhaps you have a solution for that
Yes, there’s a well-known solution: just make the asteroid fast enough, and it will burn less in the atmosphere.
My understanding of this framework is probably too raw to go anywhere sane (a natural latent is a convolution basis useful for analyzing natural inputs, and it’s powerful because function composition is powerful), but it could fit nicely with « Agency is what neurons in the biological movement area detect ».
That’s a great analogy. To me, the strength of the OP is to pinpoint that LLMs already exhibit the kind of general ability we would expect from AGI, and its weakness is to forget that LLMs do not exhibit some specific abilities most thought easy, such as the agency that even clownfish exhibit.
In a way this sounds like the universe once again telling us we should rethink what intelligence is. Chess is hard and doing the dishes is easy? Nope. Language is hard and agency is central? Nope.
Thanks for the prompt! If we ask Claude 3 to be happy about x, don’t you think that counts as nudging it toward implementing a conscious being?
Perhaps you could identify your important beliefs
That part made me think. If I see bright minds falling into this trap, does blindness go with the importance of the belief for that person? I would say yes, I think. As if that’s where we tend to make more « mistakes that can behave as ratchets of the mind ». Thanks for the insight!
that also perhaps are controversial
Same exercise: if I see bright minds falling into this trap, does blindness go with controversial beliefs? Definitely! Almost by definition, actually.
each year write down the most likely story you can think of that would make it be wrong
I don’t feel I get this part as well as the previous ones. Suppose I hold the lab-leak view, then notice it’s both controversial (« these morons can’t update right ») and much more important to me (« they don’t get how important it is for the safety of everyone »). What should I write?
Yup. Thanks for trying, but these beliefs seem to form a local minimum, like a trap for rational minds, even very bright ones. Do you think you understand how an aspiring rationalist could 1) recover and get out of this trap, and 2) avoid falling into it in the first place?
To be clear, my problem is not with the possibility of a lab leak itself, it’s with the evaluation that the present evidence is anything but post-hoc rationalization fueled by unhealthy levels of tunnel vision. If bright minds can fall for that on this topic specifically, how do I know I’m not making the same mistake on something else?
(Spoiler warning)
(Also, I didn’t check the previous survey nor the comments there, so expect some level of redundancy.)
The score itself (8/18) is not that informative, but checking the « accepted » answers is quite interesting. Here are my « errors » and how happy I am to keep making them:
You should be on the outlook for people who are getting bullied, and help defend them against the bullies.
I agree some rationalist leaders are toxic characters who will almost inevitably bully their students and collaborators, and I’m happy to keep strongly disagreeing with that. [actually most rationalists taking the survey agree with the statement, see tailcalled below]
Modern food is more healthy because we have better hygiene than people did in the past.
I strongly agree; LW accepts a weaker position. Ok, maybe I don’t have the data for some period in the past where food was not that bad. (Slight update on both the content and the need to double-check the data myself.)
It should be easy to own guns.
Seriously, is that even a question? Aren’t rationalists supposed to look at the data at some point?
Statistics show that black people still are far from catching up to white people in society.
I strongly agree; LW accepts a weaker position. But what I had in mind was the situation in Canada, so maybe I just don’t have the relevant data for the USA. Yes, that’s my kind of humour.
It is bad to buy and destroy expensive products in the name of “art”.
I thought I disagreed, but ok, I admit I can imagine good examples, like a statue made by melting down guns used to murder children at school.
You should believe your friends if they tell you they’ve seen ghosts.
That’s an interesting one. I would not believe that ghosts caused the perception, but I would not deny the perception itself, nor believe that establishing the truth about ghosts is what should occupy my mind in that situation.
Charity organizations should offer their employees dream vacations in the tropics to make the employment more attractive and enjoyable, thereby attracting more people to the charity.
I thought everyone got the memo that this hurt the whole movement. Am I wrong, or is the « correct » answer just outdated, from pre-SBF times?
There is no God. Supernatural claims are never true
Those are the only two questions where I expected from the start to differ, for complicated reasons that I just found better expressed this week by Aella. (In short, it’s more useful to keep a sane dose of agnosticism.)
https://knowingless.com/2018/05/02/so-says-crazybrain/
(I should also credit Scott Aaronson for best showing that QM allows supernatural claims such as « there’s free will ».)
Curiosity is for boring nerds.
Meow.
I am an old person. They may not let you do that in chemistry any more.
Absolutely! In my first chemistry lab, a long time ago, our teacher warned us that she had just lost a colleague to cancer at the age of forty, and she swore that if we didn’t take the safety protocols very seriously, she would be our fucking nightmare.
I never heard her swear after that.
Not bad! But I stand by « random before (..) » as a better picture, in the following sense: a neuron doesn’t connect once to an address ending in 3; it connects several thousand times to addresses ending in 3. Some connections are on the door, some on the windows, some on the roof, one has been seen trying to connect to the dog, etc. Then it’s pruned, and the result looks not that far from a crystal. Or a convnet.
(there are also long-lasting silent synapses and a bit of neurogenesis, but those are details for another time)
Hmmm, I disagree with the randomness.
I don’t think you do. Let me rephrase: the weights are picked at random, under a distribution biased by molecular cues, then pruned through activity-dependent mechanisms (see the sketch below).
In other words, our disagreement seems to count as an instance of Bertrand’s paradox.
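For concreteness, here is a minimal sketch of that picture (everything in it is made up for illustration: the « cue » values, the toy activity measure, the pruning threshold; it is not a model of any real circuit): weights are drawn at random around means set by molecular cues, then most synapses are removed by an activity-dependent pruning step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "molecular cues": a coarse bias on connection strength
# per target region (the numbers are made up for illustration).
cue_bias = np.array([0.8, 0.2, 0.5, 0.1])

# Random wiring: each synapse picks a target region at random, and its
# weight is drawn around the cue-biased mean for that target.
n_synapses = 1000
targets = rng.integers(0, len(cue_bias), size=n_synapses)
weights = rng.normal(loc=cue_bias[targets], scale=0.3)

# Activity-dependent pruning (toy criterion): keep only the synapses
# whose simulated activity falls in the top 20%, prune the rest.
activity = rng.random(n_synapses) * np.abs(weights)
keep = activity > np.quantile(activity, 0.8)
pruned_weights = weights[keep]

print(f"{keep.sum()} synapses survive out of {n_synapses}")
```

The surviving weights end up far from uniform noise even though only a cue distribution and a pruning rule were specified, which is the sense in which I mean « random before, selected after ».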
The story went that “Perceptrons proved that the XOR problem is unsolvable by a single perceptron, a result that caused researchers to abandon neural networks”. (…) When I first heard the story, I immediately saw why XOR was unsolvable by one perceptron, then took a few minutes to design a two-layered perceptron network that solved the XOR problem. I then noted that the NAND problem is solvable by a single perceptron, after which I immediately knew that perceptron networks are universal since the NAND gate is.
Exactly the same experience and thoughts in my own freshman years (the nineties), including the « but wasn’t that already known? » moment.
Rosenblatt’s solution was mainly just randomization because he mistakenly believed that the retina was randomly wired to the visual cortex, and he believed in emulating nature. Rosenblatt was working with the standard knowledge of neuroscience in his time. He could not have known that neural connections were anything but random – the first of the Hubel and Wiesel papers was published only in 1959.
I’d push back against this. Wiring is actually largely random before the critical periods that prune most synapses, after which what remains is selected to fit the visual properties of the training environment. One way of mimicking that is to pick delta weights at random and update iff the error diminishes (annealing).
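Here is a minimal sketch of that last idea on a toy least-squares problem (the data, step size, and iteration count are all made up; this is closer to random hill-climbing than to textbook simulated annealing): propose random delta weights and keep them only when the error diminishes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: recover weights w such that X @ w approximates y.
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

def error(w):
    return np.mean((X @ w - y) ** 2)

w = rng.normal(size=3)                      # random initial wiring
for _ in range(5000):
    delta = rng.normal(scale=0.05, size=3)  # random delta weights
    if error(w + delta) < error(w):         # update iff the error diminishes
        w = w + delta

print("recovered weights:", np.round(w, 2))
```

No gradient is ever computed; selection among random perturbations is enough to recover the target weights on this toy problem.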
But that’s one nitpick among lots of food for thought; thanks for the good read!
He can be rough and on rare occasion has said things that could be considered personally disrespectful, but I didn’t think that people were that delicate.
You may wish to update on this. I’ve only exchanged a few words with one of the names, but that was enough to make it clear he doesn’t bother being respectful. That may work in some non-delicate research environments I don’t want to know about, but most bright academics I know like to have fun at work, and would leave any non-delicate work environment (unless they make it their personal duty to clean the place up).
What do you think orthogonality thesis is?
I think it’s the deformation of a fundamental theorem (« there exists a universal Turing machine, i.e. one that can run any program ») into a practical belief (« an intelligence can pick its values at random »), with a motte-and-bailey game on the meaning of can, where the motte is the fundamental theorem and the bailey is the orthogonality thesis.
(thanks for the link to your own take, i.e. you think it’s the bailey that is the deformation)
Consider the sense in which humans are not aligned with each other. We can’t formulate what “our goals” are. The question of what it even means to secure alignment is fraught with philosophical difficulties.
It’s part of the appeal, isn’t it?
If the oversight AI responsible for such decisions about a slightly stronger AI is not even existentially dangerous, it’s likely to do a bad job of solving this problem.
I don’t get the logic here. Typo?
So I’m not claiming sudden changes, only intractability of what we are trying to do
That’s a fair point, but the intractability of a problem usually goes with the tractability of a slightly relaxed problem. In other words, it can be both fundamentally impossible to please everyone and fundamentally easy to control paperclip maximizers.
And also an aligned AI doesn’t make the world safe until there is a new equilibrium of power, which is a point they don’t address, but is still a major source of existential risk. For example, imagine giving multiple literal humans the power of being superintelligent AIs, with no issues of misalignment between them and their power. This is not a safe world until it settles, at which point humanity might not be there anymore. This is something that should be planned in more detail than what we get by not considering it at all.
Well said.
All significant risks are anthropogenic.
You think all significant risks are known?
Also, it seems clear how to intentionally construct a paperclip maximizer: you search for actions whose expected futures have more paperclips, then perform those actions. So a paperclip maximizer is at least not logically incoherent.
Indeed, the inconsistency appears only with superintelligent paperclip maximizers. I can be petty with my wife; I don’t expect a much better me would be.
Existentially dangerous paperclip maximizers don’t misunderstand human goals.
Of course they do. If they didn’t and picked their goal at random, they wouldn’t make paperclips in the first place.
There’s this post from 2013 whose title became a standard refrain on this point
I wouldn’t say that’s the point I was making.
This has been hashed out more than a decade ago and no longer comes up as a point of discussion on what is reasonable to expect. Except in situations where someone new to the arguments imagines that people on LessWrong expect such unbalanced AIs that selectively and unfairly understand some things but not others.
That’s a good description of my current beliefs, thanks!
Would you bet that a significant proportion of LW expects strong AIs to selectively and unfairly understand (and defend, and hide) their own goals, while selectively and unfairly failing to understand (and failing to defend, and defeating) the goals of both the developers and any previous (or upcoming) versions?
If it doesn’t have a motive to do that [ask the AI itself to monitor its own proper functioning, including alignment and non-deceptiveness], it might do a bad job of doing that. Not because it doesn’t have the capability to do a better job, but because it lacks the motive to do a better job, not having alignment and non-deceptiveness as its goals.
You realize that this basically defeats the orthogonality thesis, right?
I agree it might do a bad job. I disagree that an AI doing a bad job on this would come anywhere close to hiding its intent.
One way AI alignment might go well or turn out to be easy is if humans can straightforwardly succeed in building AIs that do monitor such things competently, that will nudge AIs towards not having any critical alignment problems. It’s unclear if this is how things work, but they might. It’s still a bad idea to try with existentially dangerous AIs at the current level of understanding, because it also might fail, and then there are no second chances.
In my view that’s a very honorable point to make. However, I don’t know how to weigh it against its mirror version: we might also not have a second chance to build an AI that would save us from x-risks. What’s your general method for this kind of puzzle?
Consider two AIs, an oversight AI and a new improved AI. If the oversight AI is already existentially dangerous, but we are still only starting work on aligning an AI, then we are already in trouble.
Can we more or less rule out this scenario, based on the observation that all the main players nowadays work on aligning their AIs?
If the oversight AI is not existentially dangerous, then it might indeed fail to understand human values or goals, or fail to notice that the new improved AI doesn’t care about them and is instead motivated by something else.
That’s completely alien to me. I can’t see how a numerical computer could hide its motivation without having been trained specifically for that. We primates have been specifically trained to play deceptive/collaborative games. To think that a random pick of values would push an AI to adopt this kind of behavior sounds a lot like anthropomorphism. To add that it would do so suddenly, with no warning or sign in previous versions and competitors, I have no good word for that. But I guess Pope & Belrose already did a better job explaining this.
Perhaps the position you disagree with is that a dangerous general AI will misunderstand human goals. That position seems rather silly, and I’m not aware of reasonable arguments for it. It’s clearly correct to disagree with it, you are making a valid observation in pointing this out.
Thanks! To be honest I was indeed surprised that was controversial.
But then who are the people that endorse this silly position and would benefit from noticing the error? Who are you disagreeing with, and what do you think they believe, such that you disagree with it?
Well, anyone who still believes in paperclip maximizers. Do you feel like it’s an unlikely belief among rationalists? What would be the best post on LW to debunk this notion?
The AI itself doesn’t fail, it pursues its own goals. Not pursuing human goals is not AI’s failure in achieving or understanding what it wants, because human goals is not what it wants. Its designers may have intended for human goals to be what it wants, but they failed. And then the AI doesn’t fail in pursuing its own goals that are different from human goals. The AI doesn’t fail in understanding what human goals are, it just doesn’t care to pursue them, because they are not its goals. That is the threat model, not AI failing to understand human goals.
That’s indeed better, but yes, I also find this better scenario unsound. Why wouldn’t the designers ask the AI itself to monitor its own proper functioning, including alignment and non-deceptiveness? Then either it fails by accident (and we’re back to the idiotic intelligence), or we need an extra assumption, like: the AGI will tell us what problem is coming, it will warn us what slightly inconvenient measures could prevent it, and we will still let it happen for petty political reasons. Oh well. I think I’ve just convinced myself the doomers are right.
All that’s required is that we aren’t able to coordinate well enough as a species to actually stop it.
Indeed, I would be much more optimistic if we were better at dealing with much simpler challenges, like putting a price on pollution and welcoming refugees with humanity.
Thanks 👍
(noice seems to mean « nice », I assume you meant « noise »)
Assuming “their” refers to the agent and not humans,
It refers to humans, but I agree it doesn’t change the disagreement, i.e. a super AI stupid enough to not see a potential misalignment coming is as problematic as the notion of a super AI incapable of understanding human goals.
(Epistemic status: first thoughts after a first reading)
Most of it is very standard cognitive neuroscience, although with more emphasis on some things (the subdivision of synaptic boutons into silent/modifiable/stable, the notion of complex and simple cells in the visual system) than on others (the critical periods, brain rhythms, iso/allocortices, brain symmetry and circuits, etc.). There are one or two bits that are wrong, but those are nitpicks or my mistake.
The idea of synapses detecting a frequency code is not exactly novel (it is the usual working hypothesis for some synapses in the cerebellum, although the exact code is not known, I think), but the idea that it’s a general principle that works because the synapse recognizes its own noise is either novel or not well known even within cognitive science (it might be a common idea among specialists of synaptic transmission, or original). I feel it’s promising, like when Hebb had the idea of his law.
In this plan, how should the AI define what’s in the interest of the person being persuaded? For example, say you have a North Korean soldier who can be persuaded to defect to the West (at the risk of getting the shitty jobs most migrants have) or who can be persuaded to remain loyal to his bosses (at the risk of raising his children in the shitty country most North Koreans have), what set of rules would you suggest?