Interested in math, Game Theory, etc.
Pattern
Our world is so inadequate that seminal psychology experiments are described in mangled, misleading ways. Inadequacy abounds, and status only weakly tracks adequacy. Even if the high-status person belongs to your in-group. Even if all your smart friends are nodding along.
It says he started with the belief. Not, that he was right, or ended with it. Keeping the idea contained to the source, so it’s clear it’s not being stated could be improved, yes.
This is what would happen if you were magically given an extraordinarily powerful AI and then failed to aligned it,
Magically given a very powerful, unaligned, AI. (This ‘the utility function is in code, in one place, and can be changed’ assumption needs re-examination. Even if we assert it exists in there*, it might be hard to change in, say, a NN.)
* Maybe this is overgeneralizing from people, but what reason do we have to think an ‘AI’ will be really good at figuring out its utility function (so it can make changes without changing it, if it so desires). The postulate ‘it will be able to improve itself, so eventually it’ll be able to figure everything out (including how to do that)‘, seems to ignore things like ‘improvements might make it more complex and harder to do that while improving.’ Where and how do you distinguish between ‘this is my utility function’ and ‘this is a bias I have’? (How have you improved this, and your introspecting abilities? How would a NN do either of those?)
One important factor seems to be that Eliezer often imagines scenarios in which AI systems avoid making major technical contributions, or revealing the extent of their capabilities, because they are lying in wait to cause trouble later. But if we are constantly training AI systems to do things that look impressive, then SGD will be aggressively selecting against any AI systems who don’t do impressive-looking stuff. So by the time we have AI systems who can develop molecular nanotech, we will definitely have had systems that did something slightly-less-impressive-looking.
Now there’s an idea: due to competition, AIs do impressive things (which aren’t necessarily safe). An AI creates the last advance that when implemented causes a FOOM + bad stuff.
Eliezer appears to expect AI systems performing extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research),
This doesn’t necessarily require the above to be right or wrong—human level contributions (which aren’t safe) could, worst case scenario...etc.
[6.] Many of the “pivotal acts”
(Added the 6 back in when it disappeared while copying and pasting it here.)
There’s a joke about a philosopher king somewhere in there. (Ah, if only we had, an AI powerful enough to save us from AI, but still controlled by...)
I think Eliezer is probably wrong about how useful AI systems will become, including for tasks like AI alignment, before it is catastrophically dangerous.
I think others (or maybe the OP previously?) have pointed out that AI can affect the world in big ways way before ‘taking it over’. Domain limited, or ‘sub-/on par with/super-’ ‘human performance’, doesn’t necessarily matter which of those it is (though more power → more effect is the expectation). Some domains are big.
Spoilering/hiding questions. Interesting.
Do the rules of the wizards’ duels change depending on the date?
I’ll aim to post the ruleset and results on July 18th (giving one week and both weekends for players). If you find yourself wanting extra time, comment below and I can push this deadline back.
The dataset might not have enough info for this/rules might not be deep enough, but a wizards duel between analysts, or ‘players’, also sounds like it could be fun.
I think that is a flaw of comments, relative to ‘google docs’. Long documents without the referenced areas being tagged in comments, might make it hard to find other people asking the same question you did, even if someone wondered about the same section. (And the difficulty of ascertaining that quickly seems unfortunate.)
It also possesses the ability to levitate and travel through solid objects.
How is it contained?
It’s still a trivial inconvenience sometimes, but:
Two tabs:
one for the response comment writing as reading
one for the reading
Note, sometimes people downvote typo comments. Doesn’t happen often, but, sometimes it seems like, when the author fixes it, it happens?
For example, if our function measures the probability that some particular glass is filled with water, the space near the maximum is full of worlds like “take over the galaxy and find the location least likely to be affected by astronomical phenomena, then build a megastructure around the glass designed to keep it full of water”.
If the function is ‘fill it and see it is filled forever’ then strange things may be required to accomplish that (to us) strange goal.
Idea:
Don’t specify our goals to AI using functions.
Flaw:
Current deep learning methods use functions to measure error, and AI learns by minimizing that error in an environment of training data. This has replaced the old paradigm of symbolic AI, which didn’t work very well. If progress continues in this direction, the first powerful AI will operate on the principles of deep learning.
Even if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does, assuming present trends hold. Building weaker, safer AI doesn’t stop others from building stronger, less safe AI.
Do you have any idea how to do “Don’t specify our goals to AI using functions.”? How are you judging “if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does”?
Idea:
Get multiple AIs to prevent each other from maximizing their goal functions.
Flaw:
The global maximum of any set of functions like this still doesn’t include human civilization. Either a single AI will win, or some subset will compete among themselves with just as little regard for preserving humanity as the single AI would have.
Maybe this list should be numbered.
This one is worse than it looks (though it seems underspecified). Goal 1: some notion of human flourishing. Goal 2: prevent goal 1 from being maximized. (If this is the opposite of 1, you may have just asked to be nuked.)
Idea:
Don’t build powerful AI.
Flaw:
For all the ‘a plan that handles filling a glass of water, generated using time t’ ‘is flawed’ - this could actually work. Now, one might object that a particular entity will try to create powerful AI. While there might be incentives to do so, trying to set limits, or see safeguard deployed (if the AI managing air conditioning isn’t part of your AGI research, add these safeguards now).
This isn’t meant as a pure ‘this will solve the problem’ approach, but that doesn’t mean it might not work (thus ensuring AIs handling cooling/whatever at data centers meet certain criteria).
Once it exists, powerful AI is likely to be much easier to generate or copy than historical examples of dangerous technologies like nuclear weapons.
There’s a number of assumptions here which may be correct, but are worth pointing out.
How big a file do you think an AI is?
1 MB?
1 TB?
That’s not to say that compression exists, but also, what hardware can run this program/software you are imagining (and how fast)?
undesirable worlds near the global maximum.
There’s a lot of stuff in here about maximums. It seems like your beliefs that ‘functions won’t do’ stems from a belief that maximization is occurring. Maximizing a function isn’t always easy, even at the level of ‘find the maximum of this function mathematically’. That’s not to say that what you’re saying is necessarily wrong, but suppose some goal is ‘find out how this protein folds’. It might be a solvable problem, but that doesn’t mean it is an easy problem. It also seems like, if the goal is to fill a glass with water, then the goal is achieved when the glass is filled with water.
Yeah. When something is very unclear, it’s like
Is it good or bad? It’s impossible to decipher, I can’t tell. Is it true or false? No way to tell. (It doesn’t happen often, but it’s usually downvoted.)
ETA: I’m not sure at the moment what other aspects there are.
It didn’t state that explicitly re sorting, but looking at:
It has no other direct effects on the site or content visibility.
I see what you mean. (This would have been less of a question in a ‘magic-less sorting system’.)
Agree/Disagree are weird when evaluating your comment.
Agree with you asking the question (it’s the right question to ask) or disagree with your view?
I read Duncan’s comment as requesting that the labeling of the buttons be more explicit in some way, though I wasn’t sure if it was your way. (Also Duncan disagreeing with what they reflect).
Upvote (Like**)
Quality*
Agreement (Truth)
Veracity
Not present***: Value?Judgement? (Good/bad)
Good/Bad
**This is in ()s because it’s the word that shows up in bold when hovering over a button.
*How well something is written?
***That is a harsher bold than I was going for.
I think some aspects of ‘voting’ might benefit from being public. ‘Novelty’ is one of them. (My first thought when you said ‘can’t be downvoted’ was ‘why?’. My filtering desires for this might be...complex. The simple feature being:
I want to be able to sort by novelty. (But also be able to toggle ‘remove things I’ve read from the list’. A toggle, because I might want it to be convenient to revisit (some) ‘novel’ ideas.))
you should also have to be thinking about
Consider replacing this long phrase (above) with ‘consider’.
Upvoting/downvoting self
Sorting importance
‘Agreeing’/‘Disagreeing’
‘I have discovered that this (post (of mine)) is wrong in important ways’
or
Looking back, this has still stood the test of time.
These methods aren’t necessarily very effective (here).
Arguably, this can be done better by:
Having them be public (likely in text). What you think of your work is also important. (‘This is wrong. I’m leaving it up, but also see this post explaining where I went wrong, etc.’)
See the top of this article for an example: https://www.gwern.net/Fake-Journal-Club
certainty: log importance: 4
How do sorting algorithms (for comments) work now?
For companies, this is something like the R&D budget. I have heard that construction companies have very little or no R&D. This suggests that construction is a “background assumption” of our society.
Or that research is happening elsewhere. Our society might not give it as much focus as it could though.
In the context of quantilization, we apply limited steam to projects to protect ourselves from Goodhart. “Full steam” is classically rational, but we do not always want that. We might even conjecture that we never want that.
So you never do anything with your full strength, because getting results is bad?
Well, by ‘we’ you mean both ‘you’ and ‘a thing you are designing with quantilization’.
It seems to me that in a competitive, 2-player, minimize-resource-competition StarCraft, you would want to go kill your opponent so that they could no longer interfere with your resource loss?
I would say that in general it’s more about what your opponent is doing. If you are trying to lose resources and the other player is trying to lose them, you’re going to get along fine. (This would be likely be very stable and common if players can kill units and scavenge them for parts.) If both of you are trying to lose them...
Trying to minimize resources is a weird objective for StarCraft. As is gain resources. Normally it’s a means to an end—destroying the other player first. Now, if both sides start out with a lot of resources and the goal is to hit zero first...how do you interfere with resource loss? If you destroy the other player don’t their resources go to zero? Easy to construct, by far, is ‘losing StarCraft’. And I’m not sure how you’d force a win.
This starts to get into ‘is this true for Minecraft’ and...it doesn’t seem like there’s conflict of the ‘what if they destroy me, so I should destroy them from’ kind, so much as ‘hey stop stealing my stuff!’. Also, death isn’t permanent, so… There’s not a lot of non-lethal options. If a world is finite (and there’s enough time) eventually, yeah, there could be conflict.
More generally, I think competitions to minimize resources might still usually involve some sort of power-seeking.
In the real world maybe I’d be concerned with self nuking. Also starting a fight, and stuff like that—to ensure destruction—could work very well.
stab
You assume they’re dead. (It gives you a past measurement—no guarantee someone won’t become evil later.)
I was thinking
The rules don’t change over time, but what if on...the equivalent of the summer solstice, fire spells get +1 fire mana or something. i.e, periodic behavior. Wait, I misread that. I meant more like, rules might be different, say, once every hundred years (anniversary of something important) - like there’s more duels that day, so you might have to fight multiple opponents, or something.
This is a place where people might look at the game flux, and go ‘the rules don’t change’.