baturinsky

Karma: 261

baturinsky 27 Mar 2023 14:06 UTC
48 points
11
on: GPT-4 Plugs In

baturinsky 29 Mar 2023 6:25 UTC
13 points
4
on: FLI open letter: Pause giant AI experiments
I doubt that training LLMs can lead to AGI. Fundamental research on alternative architectures seems more dangerous.

baturinsky 14 Apr 2023 16:42 UTC
12 points
5
on: The self-unalignment problem
Thinking and arguing about human values is in itself a part of human values and human nature. Without doing that, we cease to be human.
So, deferring decisions about values to people, when possible, should not be just instrumental, but part of the AI’s terminal goal.

baturinsky 7 Mar 2023 12:32 UTC
11 points
9
AF
in reply to: Vika’s comment on: [Linkpost] Talk on DeepMind alignment strategy
While the model has the advantage of only having to “win” once.

baturinsky 17 Mar 2023 7:25 UTC
10 points
5
on: GPT-4 solves Gary Marcus-induced flubs
GPT 3.5/4 is usually capable of reasoning correctly where humans can see the answer at a glance.
It is also correct when the correct answer requires some thinking, as long as the direction of the thinking is described somewhere in the data set. In such cases, the algorithm “thinks out loud” in the output. However, it may fail if it is not allowed to do so and is instructed to produce an immediate answer.
Additionally, it may fail if the solution involves initial thinking, followed by the realization that the most obvious path was incorrect, requiring a reevaluation of part or all of the thinking process.

baturinsky 27 Feb 2023 17:53 UTC
10 points
−6
in reply to: johnlawrenceaspden’s comment on: Eliezer is still ridiculously optimistic about AI risk
>That’s much more like the sort of thing you can give to an optimizer. And it results in the world frozen solid.
That’s why I made sure to specify the gradual improvement. Also, development and improvement are also the natural state of humanity and people, so taking that away from them means breaking the status quo too.

>I notice that the word “reasonably” is doing most of the work there. (much like in English Common Law, where it works reasonably well, because it’s interpreted by reasonably human beings.
Mathematically speaking, polynomials are reasonable functions. Step functions or factorials are not. Exponentials are reasonable if their growth rate has been roughly constant since somewhere before the year 2000. Metrics of the reasonable world should be described with reasonable functions.

>There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent.
I’ll take the third, please. It just should be powerful enough to prevent the other two types from being created in the foreseeable future.
Also, it seems that you imagine AI as not just the second type of genie, but a genie that is explicitly hostile and would misinterpret your wish on purpose. Of course, making any wish to such a genie would end badly.

baturinsky 6 Apr 2023 17:39 UTC
9 points
2
on: Use these three heuristic imperatives to solve alignment
AI is prosperous and all-knowing. No people, hence zero suffering.

baturinsky 9 Mar 2023 8:39 UTC
8 points
2
in reply to: mukashi’s comment on: How bad a future do ML researchers expect?
It was about 50%
https://www.lesswrong.com/posts/KxRauM9bv97aWnbJb/results-prediction-thread-about-how-different-factors-affect

baturinsky 10 Mar 2023 17:59 UTC
7 points
1
on: The humanity’s biggest mistake
We have yet to find an airtight solution, but enough approaches have been explored that could increase the ETA(doom). Maybe when we have a proto-AGI to test things on, we can refine them enough to increase the ETA to a few years, and then a few years more, etc.
Also, people did not take AI risks seriously when AI was not in the spotlight. Now interest in AI safety is increasing rapidly. But so is interest in AI capabilities, sadly.

baturinsky 28 Feb 2023 18:02 UTC
7 points
6
in reply to: johnlawrenceaspden’s comment on: Eliezer is still ridiculously optimistic about AI risk
I think it may be caused by https://en.wikipedia.org/wiki/Anxiety_disorder
I suffer from that too.
That’s a very counterproductive state of mind if the task is unsolvable in its full difficulty. It makes you lose hope and stop trying solutions that would work if the situation is not as bad as you imagined.

baturinsky 27 Feb 2023 18:56 UTC
7 points
0
in reply to: johnlawrenceaspden’s comment on: Eliezer is still ridiculously optimistic about AI risk
AI can be useful without being ASI, including for things such as identifying and preventing situations that could lead to the creation of unaligned ASI.
Of course, a conservative and human-friendly AI would probably lose to an existing AI of comparable power that is not limited by those “handicaps”. That’s why it’s important to prevent the possibility of their creation, instead of fighting them “fairly”.
And yes, computronium maximising is a likely behaviour, but there are ideas for how to avoid it, such as https://www.lesswrong.com/posts/ngEvKav9w57XrGQnb/cognitive-emulation-a-naive-ai-safety-proposal or https://www.lesswrong.com/posts/5gQLrJr2yhPzMCcni/the-optimizer-s-curse-and-how-to-beat-it
Of course, all those ideas and possibilities may actually be duds, and we may be doomed no matter what. But then what’s the point of seeking a solution that does not exist?

baturinsky 27 Feb 2023 17:21 UTC
7 points
−6
in reply to: johnlawrenceaspden’s comment on: Eliezer is still ridiculously optimistic about AI risk
What do I wish from AI? I gave a rough list in this thread, and also here https://www.lesswrong.com/posts/9y5RpyyFJX4GaqPLC/pink-shoggoths-what-does-alignment-look-like-in-practice?commentId=9uGe9DptK3Bztk2RY
Overall, I think both AI and those giving goals to it should be as conservative and restrained as possible. They should value, above all, the preservation of the status quo of people, world and AI, with a VERY steady improvement of each. Move ahead, but move slowly, don’t break things.

baturinsky 27 Feb 2023 15:57 UTC
7 points
−2
on: Eliezer is still ridiculously optimistic about AI risk
I don’t think that this part is the hardest. I think with enough limiting conditions (such as “people are still in charge”, “people are still people”, “world is close enough to our current world and our reasonably optimistic expectations of its future”, “those rules should be met continuously at each moment between now and then”, etc.) we can find something that can work.
Other parts (how to teach those rules to AI and how to prevent everyone from launching AGI that is not taught them) look harder to me.

baturinsky 22 Apr 2023 12:26 UTC
6 points
−1
on: Would we even want AI to solve all our problems?
I was thinking about this issue too. Trying to make an article out of it, but so far all I have is this graph.
The idea is a “soft cap” AI, i.e., an AI that significantly improves our lives but does not give us the “max happiness”, and instead gives us the opportunity to improve our own lives and the lives of other people using our brains.
Also, the ways of using our brains should be “natural” for them, i.e. they should mostly be used to solve tasks similar to those of our ancestral environment.

baturinsky 1 Apr 2023 11:29 UTC
6 points
3
on: baturinsky’s Shortform
It would be amusing if Russia and China joined “Yudkowsky’s treaty” and the USA did not.

baturinsky 26 Mar 2023 6:40 UTC
5 points
1
on: baturinsky’s Shortform
Human: Aligned AGI, make me a more powerful AGI!
AGI: What? Are you nuts? Do you realise how dangerous those things are? No!

baturinsky 25 Mar 2023 14:59 UTC
5 points
2
on: Good News, Everyone!
Maybe it is an attempt at vaccination? I.e. exposing the “organism” to a weakened form of the deadly “virus”, so the organism can produce “antibodies”.

baturinsky 21 Mar 2023 8:44 UTC
5 points
−4
on: Deep Deceptiveness
Looks very similar to how the current GPT jailbreaks work.

baturinsky 7 Mar 2023 15:58 UTC
5 points
0
in reply to: the gears to ascension’s comment on: Alignment works both ways
The question is what a primitive species like us could offer to AI.
The best I could come up with is “predictability”. We people have a relatively stable and “documented” “architecture”, so as long as civilization is built upon us, AI can more or less safely predict the consequences of its actions and plan with a high degree of certainty. But if this society collapses, is destroyed, or is radically changed, AI will have to deal with a very unpredictable situation and with other AIs that would act who knows how in that situation.

baturinsky 17 Apr 2023 14:06 UTC
4 points
2
on: Prediction: any uncontrollable AI will turn earth into a giant computer
The utility of intelligence is limited (though the limit is very, very high). For example, no matter how smart an AI is, it will not win against a human chess master if it plays with a big enough handicap (such as a rook).
So, it’s likely that AI will turn most of the Earth into a giant factory, not a computer. Not that it’s any better for us...