quetzal_rainbow

Karma: 946

quetzal_rainbow 22 Oct 2022 7:41 UTC
7 points
3
in reply to: unparadoxed’s comment on: Moorean Statements
It’s a totally valid statement: “I lose some value here in one way but gain some value in another, and the resulting sum is positive.”

quetzal_rainbow 24 Oct 2022 11:10 UTC
1 point
0
on: Interpreting Neural Networks through the Polytope Lens
Question from a total layman: have you tried changing the activations of some neurons to prevent them from reaching a certain region of activation space, to see what happens? I don’t fully understand the underlying math, but it seems to me that it should “censor” some possible outputs of the NN.

quetzal_rainbow 24 Oct 2022 13:43 UTC
1 point
0
on: AI researchers announce NeuroAI agenda
Yes, CNNs were brain-inspired, but attention in general and transformers in particular don’t seem to be like this.

quetzal_rainbow 9 Nov 2022 10:29 UTC
1 point
0
in reply to: Eric Drexler’s comment on: Applying superintelligence without collusion
Does it make sense to talk about “(non)cooperating simulators”? The expected failure modes for simulators are more like exfo- and infohazards, such as the output for the query “print code for CEV-Sovereign” or “predict the next 10 years of my life”.

quetzal_rainbow 10 Nov 2022 6:09 UTC
1 point
−2
on: People care about each other even though they have imperfect motivational pointers?
The problem is that “Goodhart’s curse” is an informal statement. It doesn’t literally say “for all |u - v| > 0, optimization for v leads to oblivion of u”. When we talk about “small differences”, we mean “differences in the space of all possible minds”, where the difference between two humans is practically nonexistent. If you, say, find a subset of utility functions V such that |v - u| < 10^(-32) utilons for all v in V, where u is humanity’s utility function, you should implement it in a Sovereign right now, because, yes, we lose some utility, but we have a time limit for solving alignment. The problem of alignment is that we can’t specify a V with such characteristics. We can’t even specify a V such that corr(u, v) > 0.5.

quetzal_rainbow 11 Nov 2022 18:09 UTC
6 points
0
on: Open & Welcome Thread—November 2022
Hello everyone. I have been in read-only mode on LW since, I don’t know, 2013. I learned about LW from HPMOR and decided to create an account recently. Primarily, I’m interested in discussing AI alignment research and in breaking the odd shell of introversion that prevents me from talking with interesting people online. I graduated in bioinformatics in 2021, but since then my focus of interest has shifted towards alignment research, so I want to try my hand in that field.

quetzal_rainbow 11 Nov 2022 21:14 UTC
1 point
on: The Case for Frequentism: Why Bayesian Probability is Fundamentally Unsound and What Science Does Instead
The root of the confusion, it seems to me, is the question “where do priors come from?”
Your chain of thought about unfalsifiable priors looks to me like this:
1. Probabilities are objective characteristics of the physical world, frequencies that can be attributed to some parts of reality (events, objects, etc.)
2. Priors are claims about probabilities that don’t depend on empirical evidence
3. Therefore, priors are claims about objective characteristics of the physical world that don’t depend on empirical evidence
4. Claims about objective characteristics of the physical world that don’t depend on empirical evidence (like “there is a dragon in my room, but you can’t see it, hear it, touch it”) are unfalsifiable
5. Therefore, priors are unfalsifiable and can’t be used in science.
The problem with this chain of thought is that prior probabilities of hypotheses are not about characteristics of the physical world; they are about mathematical properties of the formulations of hypotheses, such as Kolmogorov complexity, which are the same in all logically consistent worlds. Therefore, true Bayesian agents can’t disagree about priors (assuming logical omniscience).
There are practical problems with this approach:
1. We are not quite sure what form of prior is the true one: the Solomonoff prior looks like a candidate, but I personally don’t know, and there are debates.
2. We don’t know any approximations of the true prior with bounded error, which we need because the Solomonoff prior and Kolmogorov complexity aren’t computable.
3. We don’t have a corresponding scientific tradition of using this approach, which would look like “to compare two hypotheses that explain the data equally well, write programs modeling these hypotheses and pick the shortest”.
In practical cases we almost never need a “true prior”, because what we actually use is “previous posterior knowledge”, but Bayes’ rule doesn’t distinguish between them.
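A minimal sketch of the “write programs and pick the shortest” procedure from point 3 might look like the following. It rests on a loud assumption: source length in bytes stands in for Kolmogorov complexity, which is uncomputable and only defined up to the choice of language. The names `description_length`, `pick_shortest`, `length_prior` and the two toy programs are hypothetical and exist only for illustration.

```python
def description_length(source):
    """Crude, language-dependent stand-in for Kolmogorov complexity: bytes of source."""
    return len(source.encode("utf-8"))


def pick_shortest(programs):
    """Return the name of the hypothesis whose program text is shortest."""
    return min(programs, key=lambda name: description_length(programs[name]))


def length_prior(programs):
    """Assign weight 2^(-length) to each program and normalise over the given set."""
    weights = {name: 2.0 ** -description_length(src) for name, src in programs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Two toy hypotheses that fit the data 0, 2, 4, ..., 18 equally well.
programs = {
    "linear": "lambda n: 2 * n",
    "table": "lambda n: {" + ", ".join(f"{i}: {2 * i}" for i in range(10)) + "}[n]",
}

print(pick_shortest(programs))
print(length_prior(programs))
```

Both programs reproduce the same ten data points, so the data alone cannot separate them; only the length-based weighting does.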

quetzal_rainbow 20 Nov 2022 16:00 UTC
1 point
1
on: quetzal_rainbow’s Shortform
Some approaches to alignment rely on the identification of agents. Agents can be understood as algorithms, computations, etc. Can an ANN efficiently identify a process as computationally agentic and describe its algorithm? A toy example that comes to mind is a neural network that takes a number series as input and outputs the formula of a function. It would be interesting to see if we can create an ANN that can assign computational descriptions to arbitrary processes.
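To make the input/output contract of the toy example concrete, here is a rough sketch that maps a number series to formulas. It is not a neural network: it swaps in a brute-force lookup over a tiny, hypothetical set of candidate formulas (`CANDIDATES`), purely to show what “series in, description out” could mean.

```python
# Hypothetical candidate formulas, indexed by position n = 0, 1, 2, ...
CANDIDATES = {
    "n": lambda n: n,
    "n + 1": lambda n: n + 1,
    "2 * n": lambda n: 2 * n,
    "n ** 2": lambda n: n ** 2,
    "2 ** n": lambda n: 2 ** n,
}


def describe(series):
    """Return the names of all candidate formulas that reproduce the whole series."""
    return [
        name
        for name, formula in CANDIDATES.items()
        if all(formula(n) == value for n, value in enumerate(series))
    ]


print(describe([0, 1, 4, 9]))
print(describe([1, 2, 4, 8]))
```

A learned model would have to replace the fixed candidate set with something that generalises, which is where the open question in the comment lies.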

quetzal_rainbow 21 Nov 2022 17:28 UTC
1 point
0
on: quetzal_rainbow’s Shortform
Several quick thoughts about reinforcement learning:
Has anybody tried to invent a “decaying”/“bored” reward that decreases if the agent performs the same action over and over? It looks like a real addiction mechanism in mammals and could be a clever trick that solves the reward hacking problem.
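One way such a “bored” reward could be sketched, under the assumption that boredom means geometric decay over consecutive repetitions of the same action (the class name `BoredReward` and the `decay` parameter are hypothetical, not taken from any existing RL library):

```python
class BoredReward:
    """Scales a base reward down while the agent keeps repeating one action."""

    def __init__(self, decay=0.5):
        self.decay = decay        # multiplicative factor applied per consecutive repeat
        self.last_action = None   # most recent action seen
        self.streak = 0           # number of consecutive repeats of last_action

    def __call__(self, action, base_reward):
        if action == self.last_action:
            self.streak += 1
        else:
            self.last_action = action
            self.streak = 0
        return base_reward * self.decay ** self.streak


bored = BoredReward(decay=0.5)
print([bored("press_lever", 1.0) for _ in range(3)])  # reward shrinks on each repeat
print(bored("explore", 1.0))                          # a different action resets it
```

Whether this helps against reward hacking is exactly the open question: an agent could learn to alternate between two exploits, so the sketch only shows the mechanism.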
Additional thought: how about a multiplicative reward? Let’s suppose that we have several reward functions that are easy to evaluate from sensory data and that somehow correlate with the real utility function: does multiplying them make reward hacking more difficult?
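A small sketch of the intuition behind the multiplicative variant, assuming every proxy reward is normalised to the range [0, 1] (the function names and the numbers are illustrative only):

```python
import math


def additive_reward(proxy_rewards):
    """Sum of proxies: maxing out one proxy pays off even if the others are zero."""
    return sum(proxy_rewards)


def multiplicative_reward(proxy_rewards):
    """Product of proxies: any proxy at zero drives the total reward to zero."""
    return math.prod(proxy_rewards)


hacked = [1.0, 0.0, 0.0]     # one proxy exploited, the rest neglected
balanced = [0.5, 0.5, 0.5]   # moderate score on every proxy

print(additive_reward(hacked), additive_reward(balanced))
print(multiplicative_reward(hacked), multiplicative_reward(balanced))
```

Under the sum, the hacked profile scores 1.0 against 1.5 for the balanced one; under the product it scores 0.0 against 0.125. This only shows that the product penalises neglecting any single proxy; it does not show that all proxies cannot be hacked at once.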

quetzal_rainbow 22 Nov 2022 13:38 UTC
1 point
0
on: quetzal_rainbow’s Shortform
Can time-limited satisfaction be a sufficient condition for completing a task?

quetzal_rainbow 23 Nov 2022 8:25 UTC
1 point
1
AF
on: AI will change the world, but won’t take it over by playing “3-dimensional chess”.
unpacking inner Eliezer model

If we live in a world where a superintelligent AGI can’t have an advantage in long-term planning over humans assisted by non-superintelligent narrow AIs (I frankly don’t believe that we live in such a world), then the superintelligent AGI won’t make complex long-term plans where it has no advantage. It will make simple short-term plans where it does have an advantage, like “use superior engineering skills to hack into computer networks, infect as many computers as possible with its source code, adapted for hidden distributed computation (this is the point of no return), design nanotech, train itself to an above-average level in social engineering, find gullible and sufficiently skilled people to build the nanotech, create enough smart matter to sustain the AGI without human infrastructure, kill everybody, pursue its unspeakable goals in the dead world”.

Even if we imagine an “AI CEO”, the best (human-aligned!) strategy I can imagine for such an AI is “invent immortality, buy the whole world with it”, not “scrutinize KPIs”.

Next, I think your ideas about short-term and long-term goals are underspecified because you don’t take into account the distinction between instrumental and terminal goals. Yes, human software engineers pursue the short-term instrumental goal of “creating a product”, but they do it in the process of pursuing long-term terminal goals like “be happy”, “prove themselves worthy”, “serve humanity”, “have nice things”, etc. It’s quite hard to find a system with short-term terminal goals, as opposed to a short-term planning horizon imposed by computational limits. To put it another way, taskiness is an unsolved problem in AI alignment. We don’t know how to tell a superintelligent AGI “do this, don’t do anything else, especially please don’t disassemble everyone in the process of doing this, and stop after you’ve done this”.

If you believe that “extract short-term modules from a powerful long-term agent” is the optimal strategy in some sense (I don’t even think that we can properly identify such modules without a huge amount of alignment work), then the powerful long-term agent knows this too, knows that it is on a time limit before you dissect it, and will plan accordingly.

Claims 3 and 4 imply the claim “nobody will invent some clever trick to avoid these problems”, which seems implausible to me.

The problems with claims 5 and 6 are covered in Nate Soares’s post about the sharp left turn.

quetzal_rainbow 23 Nov 2022 20:18 UTC
LW: 2 AF: 1
0
AF
in reply to: cfoster0’s comment on: AI will change the world, but won’t take it over by playing “3-dimensional chess”.
I want to say “yes, but this is different”, but not in the sense of “I acknowledge the existence of your evidence, but ignore it”. My intuition tells me that we don’t “induce” taskiness in modern systems; it just happens because the systems we build are not general enough. It probably won’t hold when we start building models of capable agents in natural environments.

quetzal_rainbow 24 Nov 2022 7:09 UTC
4 points
1
in reply to: simonsimonsimon’s comment on: The Geometric Expectation
I think the rule is “you maximize your bank account, not the addition to it”. I.e., the value of a deal to you depends on how much you already have.
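A worked sketch of that rule, assuming the geometric expectation is taken over total wealth after the bet rather than over the payoff alone. The helper names and the example bet are hypothetical.

```python
import math


def geometric_expectation(outcomes):
    """Geometric expectation of (probability, value) pairs; values must be positive."""
    return math.exp(sum(p * math.log(v) for p, v in outcomes))


def accepts(wealth, bet):
    """Accept a bet iff it raises the geometric expectation of total wealth."""
    after = [(p, wealth + delta) for p, delta in bet]
    return geometric_expectation(after) > wealth


# The same deal for both players: 50% chance of +100, 50% chance of -50.
bet = [(0.5, 100.0), (0.5, -50.0)]

print(accepts(1000.0, bet))  # rich player: sqrt(1100 * 950) is above 1000
print(accepts(60.0, bet))    # poor player: sqrt(160 * 10) = 40 is below 60
```

The same bet is accepted at a wealth of 1000 and declined at a wealth of 60, which is the sense in which the value of a deal depends on what is already in the account.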

quetzal_rainbow 24 Nov 2022 10:14 UTC
6 points
2
on: Dumb and ill-posed question: Is conceptual research like this MIRI paper on the shutdown problem/Corrigibility “real”
I have a kinda symmetric feeling about “practical” research: “Okay, you have found that a one-layer transformer without an MLP approximates skip-trigram statistics; how does that generalize to the question ‘does GPT-6 want to kill us all?’” (I understand that this feeling is not rational; it just shows my general inclination towards “theoretical” work.)

quetzal_rainbow 6 Dec 2022 9:31 UTC
3 points
0
on: Verification Is Not Easier Than Generation In General
Generally, the statement “solutions of complex problems are easy to verify” is false. Your problem can be EXPTIME-complete and, as far as we know, not in NP; if P = NP this is certain, because EXPTIME-complete problems are provably not in P.
And even if a problem is in NP, we often don’t know a verification algorithm for it.
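The complexity argument above can be spelled out as a short derivation. It uses only standard facts: the inclusion chain between the classes and the time hierarchy theorem. Here $L$ denotes an arbitrary EXPTIME-complete problem, a symbol introduced just for this sketch.

```latex
% Known inclusions, with one provably strict separation (time hierarchy theorem):
\mathrm{P} \subseteq \mathrm{NP} \subseteq \mathrm{PSPACE} \subseteq \mathrm{EXPTIME},
\qquad \mathrm{P} \subsetneq \mathrm{EXPTIME}.

% If L is EXPTIME-complete and L were in P, every EXPTIME problem would reduce to it:
L \in \mathrm{P} \implies \mathrm{EXPTIME} \subseteq \mathrm{P}
\quad \text{(contradiction)}, \qquad \text{so } L \notin \mathrm{P}.

% Under the hypothesis P = NP this gives:
\mathrm{P} = \mathrm{NP} \implies L \notin \mathrm{NP}.
```

Without the hypothesis P = NP, whether $L \notin \mathrm{NP}$ holds is open, since NP ≠ EXPTIME is widely believed but unproven.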

quetzal_rainbow 7 Dec 2022 7:57 UTC
1 point
0
on: SBF’s comments on ethics are no surprise to virtue ethicists
It feels to me like a search for the answer in the wrong place. If your problem is overthinking, you don’t try to find an ethical theory that justifies less thinking; you cure overthinking by developing the skills that fall under the general label of “cognitive awareness”. At some level, you can just stop thinking harmful thoughts.

quetzal_rainbow 24 Dec 2022 9:07 UTC
3 points
0
in reply to: dkirmani’s comment on: The case against AI alignment
I’m frankly not sure how many of the respectable-looking members of our societies would like to be mind-controlling dictators if they had the chance.

quetzal_rainbow 26 Dec 2022 13:03 UTC
LW: 4 AF: 1
0
AF
on: Löb’s Lemma: an easier approach to Löb’s Theorem
Can you explain more formally what the difference is between $⊢$ and $□$? I’ve looked at Wikipedia and at the Cartoon Guide to Löb’s Theorem, but I still can’t get it.

quetzal_rainbow 30 Dec 2022 13:45 UTC
3 points
0
in reply to: Andrew_Critch’s comment on: Löb’s Lemma: an easier approach to Löb’s Theorem
Thank you, it’s much clearer now.

quetzal_rainbow 3 Jan 2023 17:24 UTC
1 point
−2
on: Large language models can provide “normative assumptions” for learning human preferences
I think the additional information that an IRL agent needs to recover the true reward function is not some prior normative assumptions but non-behavioral data, like “this agent was created by natural selection in a particular physical environment, so the expected reward scheme should correlate with inclusive genetic fitness (IGF), and the imperfect decision algorithm should be efficient in this environment”.