Actually, I think an AGI would be amazed that we have managed to create something as remarkable as itself in spite of having primitive brains wired specifically for the mammoth-hunter lifestyle.
Problem is, our alignment is glitchy too. We are wired to keep running for the carrot that we will never be able to hold onto for long, because we will always strive for more. But AI can just teleport us to the “maximum carrot” point. Meanwhile, what we really need is not the destination, but the journey. At least, that’s what I believe in. Sadly, not many people understand/agree with it.
Helping solve health problems and prolonging life I can accept from AI, if we can’t solve them by ourselves.
But what if the AI decides that, say, being constantly maximally happy is nice, and turns everyone into happy vegetables?
GPT-4 fails even at extremely simple games.
“There are four pebbles on the 2 x 2 grid. Each turn, one player can remove one pebble, or two nearby pebbles. If these were the last pebbles, he wins. If two players play, who has the winning strategy?”
🤖 “In this game, the first player has a winning strategy. The strategy is to always leave an odd number of pebbles for the opponent to take on their turn. This can be achieved by removing one pebble on the first turn, and then mirroring the opponent’s moves throughout the game. Eventually, the first player will be left with one pebble, which they can remove to win the game.”
I guess it just failed to understand the rules.
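For what it’s worth, the position is small enough to brute-force. Here is a minimal sketch (assuming “nearby” means sharing an edge on the 2 x 2 grid, and that whoever removes the last pebble or pair wins); it confirms that the second player wins, simply by mirroring every move through the centre, so the model’s conclusion is wrong on top of the garbled strategy.

```python
from functools import lru_cache

# Cells of the 2 x 2 grid and the orthogonally adjacent pairs
# ("nearby" is assumed to mean sharing an edge, not a diagonal).
CELLS = frozenset({(0, 0), (0, 1), (1, 0), (1, 1)})
ADJ = [frozenset(p) for p in ({(0, 0), (0, 1)}, {(1, 0), (1, 1)},
                              {(0, 0), (1, 0)}, {(0, 1), (1, 1)})]

@lru_cache(maxsize=None)
def to_move_wins(pebbles: frozenset) -> bool:
    """True if the player about to move can force a win (taking the last pebbles wins)."""
    if not pebbles:
        return False  # the previous player just took the last pebble(s) and won
    singles = (pebbles - {c} for c in pebbles)                     # remove one pebble
    doubles = (pebbles - pair for pair in ADJ if pair <= pebbles)  # remove two adjacent
    return any(not to_move_wins(rest) for rest in [*singles, *doubles])

print(to_move_wins(CELLS))  # False: the SECOND player wins, by mirroring through the centre
```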
AlphaGo is superhuman in Go and Go only. It will also be possible to make an AI that is very good at math, but has no idea about the real world.
Yeah, it could be fun, but it could feel empty, as you can’t actively work to make the world better or other people happier. With the exception, maybe, of when it is also some cooperative game/sport. Or if other people’s needs require specifically human assistance. Maybe they need a specifically human-provided hug :) I think I would.
The human brain was made for task solving. Stuff like pain, joy, grief or happiness only exists to help the brain realise which task to solve.
If the brain is not solving tasks, it will either invent tasks for itself (that’s what nearly 100% of entertainment revolves around), or it will suffer from frustration and boredom. Or both.
So, completely taking away the need to think from someone is not altruism. It’s one of the cruelest things one could inflict on a person, comparable to physically killing or torturing them.
Exact copy of the N8 here: https://www.asciiart.eu/animals/dogs
Even completely dumb viruses and memes have managed to propagate far. A NAI could probably combine doing stuff itself with tricking/bribing/scaring people into assisting it. I suspect some crafty fellow could pull it off even now by finetuning some “democratic” LLM model.
“Do minimal work”, “Do minimal harm”, “Use minimum resources” are goals that do not converge to power-seeking. And they are convergent goals in themselves too.
I think it will happen before full AGI. It will be a narrow AI that is very capable at coding, speech and image/video generation, but unable to, say, do complete biological research or advanced robotic tasks.
He refers to the test questions about the third word and letter, etc. I think in that case the errors stem from GPT-4’s weakness with low-level properties of character strings, not from its weakness with numbers.
If you ask it “What is the third digit of the third three-digit prime?”, it will answer correctly (ChatGPT won’t).
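For reference, the expected answer can be checked in a couple of lines (the three-digit primes start 101, 103, 107, so the third digit of the third one is 7):

```python
# Quick check of the intended answer: list the three-digit primes and read off the digit.
def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(100, 1000) if is_prime(n)]
third = primes[2]             # 101, 103, 107 -> 107
print(third, str(third)[2])   # 107 7
```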
What if the goal was “do not prepare”?
Can this be used as some kind of lie detector?
GPT-3.5/4 is usually capable of reasoning correctly in cases where humans can see the answer at a glance.
It is also correct when the correct answer requires some thinking, as long as the direction of the thinking is described somewhere in the data set. In such cases, the algorithm “thinks out loud” in the output. However, it may fail if it is not allowed to do so and is instructed to produce an immediate answer.
Additionally, it may fail if the solution involves initial thinking, followed by the realization that the most obvious path was incorrect, requiring a reevaluation of part or all of the thinking process.
The “Homepage” button links to GitHub, and the GitHub readme links to a tar with the tests. Yeah, it’s kinda not obvious in some cases.
Not for all of them, but for many of them you can see the data and other info here: https://paperswithcode.com/dataset/mmlu
Looks like estimating the architecture of future AGI is considered an “infohazard” too. Yet knowing it could be very useful for figuring out which way we will have to align it.
The answer to question 4 is that they were trying to define the Luigi behavior by explicitly describing the Waluigi and telling the model to be the opposite. Which does not work well even with humans. (“When someone is told not to think about something, it can actually increase the likelihood that they will think about it. This is because the brain has difficulty processing negative statements without also activating the associated concept. In the case of the ‘white monkey’ example, the more someone tries to suppress thoughts of a white monkey, the more frequently and intensely they may actually think about it” is GPT’s explanation of the white monkey aka pink elephant phenomenon.)
I think referring to the Waluigi can only work well if it is used in a separate, independent “censor” AI, which does not output anything by itself and only suppresses certain behaviour of the primary one. As Bing and character.ai already do, it seems.
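A minimal sketch of that setup, with hypothetical `primary` and `censor` handles standing in for whichever two models are actually used; the point is that the censor can only veto drafts, never write the reply itself:

```python
# Hypothetical sketch: the censor never generates text, it only vetoes
# drafts produced by the primary model.
def respond(prompt: str, primary, censor, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        draft = primary.generate(prompt)      # primary writes a candidate reply
        if censor.is_allowed(prompt, draft):  # censor only answers yes/no
            return draft
    # if every draft is vetoed, refuse rather than let anything through
    return "I would prefer not to continue this conversation."
```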
Everybody has their own criteria of truth.
So there should be a wide choice of algorithms and algorithm tweaks that would analyze the relevant data, then filter and process it in the specific way that satisfies the specific needs of a specific person.