Signer

Karma: 428

Signer 12 Jun 2022 14:14 UTC
28 points
on: A claim that Google’s LaMDA is sentient
I mean, it doesn’t matter that it’s not evidence of sentience, because trying to scale without reliable detectors of ethically significant properties (and an architecture that allows for them) was irresponsible from the start. And the correct response is shutting down research, not “the only person in our system of checks who says we are wrong is the one we fired, so we are going to ignore them”.

Signer 5 Mar 2024 18:42 UTC
24 points
4
on: Many arguments for AI x-risk are wrong

Undo the update from the “counting argument”, however, and the probability of scheming plummets substantially.

Wait, why? Like, where is the low probability actually coming from? I guess from some informal model of inductive biases, but then why is “formally I only have the Solomonoff prior, but I expect other biases to not help” not an argument?

Signer 10 Feb 2024 16:33 UTC
17 points
9
on: Dreams of AI alignment: The danger of suggestive names

Maybe it’s because people perceive me as an Optimist and therefore my points must be combated at any cost.

Maybe people really just naturally and unbiasedly disagree this much, though I doubt it.

But the end result is that I have given up on communicating with most folk who have been in the community longer than, say, 3 years.

Not saying that it’s fun or even obviously net-positive for all participants, but I think combative communication is better than no communication, as far as truth-seeking goes.

To be frank, I think a lot of the case for AI accident risk comes down to a set of subtle word games.

Sure, but what if what’s left is risky enough? Maybe utility maximization is a bad model of future AI (maybe because it’s hard to predict technology that doesn’t exist yet) - but what’s the alternative? Isn’t labelling some empirical graph that ignores warning signs “awesomeness” and extrapolating it more of a word game?

Signer 2 Mar 2022 22:00 UTC
15 points
on: Late 2021 MIRI Conversations: AMA / Discussion
Not sure if it’s the right place to ask, instead of just googling it, but anyway: does anyone know what the current state of AI security practices is at DeepMind, OpenAI and other such places? Like, did they estimate the probability of GPT-3 killing everyone before turning it on, do they have procedures for not turning something on, did they test these procedures by having someone impersonate an unaligned GPT and try to manipulate researchers, things like that?

Signer 13 Oct 2023 15:36 UTC
14 points
11
on: AI #33: Cool New Interpretability Paper

This seems like the most important crux. Why should we not expect the maximizer we trained to X-maximize to use its affordances to maximize X’, where X’ is the exact actual thing the training feedback represents as a target, and that differs at least somewhat from X? Why should we expect to like the way it does that, even if X’ did equal X? I do not understand the other perspective.

Because that’s what happens: humans don’t always wirehead and neural networks don’t always overfit. Because training feedback is not utility, there are also local effects of the training process and of the space of training data.

Signer 3 Dec 2023 13:48 UTC
13 points
8
on: Quick takes on “AI is easy to control”

“AI will be able to figure out what humans want” (yes; obviously; this was never under dispute)

I think the problem is that how existing systems figure out what humans want doesn’t seem to have anything to do with your theory of why it is supposed to be relatively easy. Therefore the theory’s prediction of alignment being relatively hard also doesn’t have anything to do with reality.

Signer 8 Dec 2021 19:46 UTC
13 points
in reply to: Quintin Pope’s comment on: Deepmind’s Gopher—more powerful than GPT-3
Sure, but people do worry about harming rats too much, and, more importantly, by the time we get to actual human level it may already be too late. Like, there is no prepared procedure for stopping that whole process of scaling, no robust humanity-meters to know when you can safely proceed, and not even a consensus on the relevant abstract ethics.

Signer 29 Feb 2024 17:43 UTC
12 points
8
on: Counting arguments provide no evidence for AI doom
I don’t get how you can arrive at 0.1% for future AI systems even if NNs are biased against scheming. Humans scheme, the future AI systems trained to be capable of long if-then chains may also learn to scheme, maybe because explicitly changing biases is good for performance. Or even, what, you have <0.1% on future AI systems not using NNs?

Also, not saying “but it doesn’t matter”, but assume everyone agrees that a spectrally biased NN with a classifier or whatever is a promising model of a safe system. Do you then propose we should not worry and just make the most advanced AI we can as fast as possible? Or would it be better to first reduce the remaining uncertainty about the behavior of future systems?

Signer 26 Mar 2022 17:07 UTC
12 points
on: Ukraine Post #7: More Data and Peace Terms

Technically Russians have the right to peacefully protest. In practice of course they do not. That leads to scenes like this.

For people like me, who don’t click on links and then feel misled—this photo is from around 2019.

Signer 20 Dec 2023 8:39 UTC
11 points
0
on: Ronny and Nate discuss what sorts of minds humanity is likely to find by Machine Learning

you can argue all you want that any flying device will have to flap its wings, and that won’t constrain airplane designs

You can argue all you want that any thinking device will have to reflect on its thoughts, and that won’t constrain mind designs.

the prior is still really wide, so wide that a counting argument still more-or-less works

And it also works for arguing that GPT-3 won’t happen: there are more hacks that give you low loss than there are useful-to-humans hacks that give you low loss.

so does your whole sense of difference go out the window if we do something autogpt-ish?

I think it should be analyzed separately, but intuitively if your gpt never thinks of killing humans, it should be less likely that the plans with these thoughts would result in killing humans.

Signer 13 Jul 2023 6:17 UTC
10 points
6
in reply to: Andrew_Critch’s comment on: Consciousness as a conflationary alliance term

The “hard problem of consciousness” is the problem of resolving a linguistic dispute disguised as an ontological one, where people agree on the normative properties of consciousness (it’s valuable) but not on its descriptive properties (its nature as a process/pattern.)

That’s just another conflation, of an easy problem and the hard problem. Yes, there is disagreement about which mental processes are valuable, but there is also an ontological problem, and not everyone agrees that ontological consciousness is intrinsically valuable.

Signer 14 Jun 2022 15:13 UTC
10 points
in reply to: Vanessa Kosoy’s comment on: A claim that Google’s LaMDA is sentient

What information about cat brains can I possibly learn to make me classify them as “non-persons”?

Do you value conscious experience in yourself more than unconscious perception with roughly the same resulting external behavior? Then it is conceivable that empathy is mistaken about what kind of system is receiving inputs in the cat’s case, and there is at least a difference in value depending on the internal organization of the cat’s brain.

Signer 19 Feb 2023 1:53 UTC
9 points
7
on: Should we cry “wolf”?
I think just truthfully crying “let’s prepare for a real wolf”, because they clearly didn’t train the miniature one well enough, has fewer downsides.

Signer 17 Apr 2024 18:36 UTC
8 points
0
in reply to: Rob Bensinger’s comment on: When is a mind me?

Analogy: When you’re writing in your personal diary, you’re free to define “table” however you want. But in ordinary English-language discourse, if you call all penguins “tables” you’ll just be wrong. And this fact isn’t changed at all by the fact that “table” lacks a perfectly formal physics-level definition.

You’re also free to define “I” however you want in your values. You’re only wrong if your definitions imply wrong physical reality. But defining “I” and “experiences” in such a way that you will not experience anything after teleportation is possible without implying anything physically wrong.

You can be wrong about the physical reality of teleportation. But even after you have figured out that there is no additional physical process going on that kills your soul, except for the change of location, you can still move from “my soul crashes against an asteroid” to “soul-death in my values means a sudden change in location” instead of to “my soul remains alive”.

It’s not like I even expect you specifically to mean “not liking teleportation is necessarily irrational” much. It’s just that saying that there should be an actual answer to questions about “I” and “experiences” makes people moral realists.

Signer 7 Apr 2024 2:50 UTC
8 points
0
on: My intellectual journey to (dis)solve the hard problem of consciousness
Sure, “everything is a cluster” or “everything is a list” is as right as “everything is emergent”. But what’s the actual justification for pruning that neuron? You can prune everything like that.

Great! This text by Yudkowsky has convinced me that the Philosophical Zombie thought experiment leads only to epiphenomenalism and must be avoided at all costs.

Do you mean that the original argument that uses zombies leads only to epiphenomenalism, or that if zombies were real that would mean consciousness is epiphenomenal, or what?

Signer 8 Dec 2021 19:04 UTC
7 points
on: Deepmind’s Gopher—more powerful than GPT-3

a study of ethical and social risks associated with large language models

And somehow nobody cares about potential ethical implications of simulating near-human quantities of neurons.

Signer 15 Jun 2021 18:10 UTC
7 points
on: Can someone help me understand the arrow of time?
there’d be no point doing anything

That’s not even in the top five reasons to not do anything:
- There is no ultimate reason to do things you have reasons to do.
- Everything you don’t do still happens in other parts of the multiverse.
- Your doing things has a chance of creating an infinite number of hells.
- Valence is arbitrary.
- The most precise model of you is an action minimizer, so you should not do things.
And I don’t think it’s totally solved, but you can interpret “we exist at a single point in the timeline” as something like “you can describe yourself differentially”, i.e. what really exists is the timeline. Then the point of the theory is that if the timeline contains memories, then it contains all your experiences.

Signer 2 Mar 2022 9:37 UTC
LW: 6 AF: 3
on: Late 2021 MIRI Conversations: AMA / Discussion
It was all very interesting, but what was the goal of these discussions? I mean, I had the impression that pretty much everyone assigned >5% probability to “if we scale we all die”, so that is already enough reason to work on global coordination on safety. Is the reasoning that the same mental process that assigned too low a probability would not be able to come up with an actual solution? Or something like “by the time they think their solution has reduced the probability of failure from 5% to 0.1%, it would still be much higher”? That seems possible only if people don’t understand the arguments about inner optimizers or whatnot, as opposed to disagreeing with them.

Signer 26 Mar 2021 18:49 UTC
6 points
on: Toward A Bayesian Theory Of Willpower
I don’t understand what the point is of calling it “evidence” instead of “updating weights”, unless the brain literally implements P(A|B) = [P(A)*P(B|A)]/P(B) for high-level concepts like “it’s important to do homework”. And even then this story about evidence and beliefs doesn’t add anything to an explanation in terms of a specific weight-aggregation algorithm.
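For concreteness, here is a minimal sketch of what literally computing that formula would involve. The function name and the numbers are made up for illustration, and P(B) is expanded by the law of total probability, which is an assumption about how the denominator would be obtained.

```python
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) = P(A) * P(B|A) / P(B) (hypothetical helper, illustration only)."""
    # Expand the denominator: P(B) = P(A) * P(B|A) + P(not A) * P(B|not A).
    p_b = prior_a * p_b_given_a + (1.0 - prior_a) * p_b_given_not_a
    if p_b == 0.0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return prior_a * p_b_given_a / p_b


# Illustrative numbers only: A = "it's important to do homework", B = some observation.
posterior = bayes_posterior(prior_a=0.5, p_b_given_a=0.8, p_b_given_not_a=0.2)
print(round(posterior, 3))
```

A generic "updating weights" rule would not need to keep the prior, the likelihoods and the normalizer as separate quantities, which is the distinction the comment is pointing at.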

Signer 4 Mar 2024 18:44 UTC
5 points
2
on: The Solution to Sleeping Beauty

Which is equal to the sample space of the coin toss, just with different names for the outcomes.

Well then by your arguments it can’t be describing the Sleeping Beauty problem, when it is a much better match for the Just a Coin Toss problem.

Whether Monday is today or was yesterday is irrelevant—it is the same outcome, anyway.

But what if you actually want to know?

Well, here is why not. Because in the latter case you are making baseless assumptions and contradicting the axioms of probability theory.

Again, Elga’s model doesn’t contradict the axioms of probability theory. Are we supposed to just ignore that mathematical fact, that you can use probability theory with subjective decomposition of outcomes?
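One way to make the two decompositions of outcomes concrete is a small simulation. This is only a sketch under the standard statement of the problem (Heads gives one awakening, Tails gives two); the function name and the seed are hypothetical, and it does not settle which count is the "right" probability. It only shows that both frequencies are well defined.

```python
import random
from typing import Tuple


def sleeping_beauty_frequencies(trials: int, seed: int = 0) -> Tuple[float, float]:
    """Return (fraction of experiments with Heads, fraction of awakenings with Heads)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads_experiments = 0
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5
        # Standard setup: one awakening on Heads, two awakenings on Tails.
        awakenings = 1 if heads else 2
        total_awakenings += awakenings
        if heads:
            heads_experiments += 1
            heads_awakenings += 1
    return heads_experiments / trials, heads_awakenings / total_awakenings


per_experiment, per_awakening = sleeping_beauty_frequencies(100_000)
print(per_experiment, per_awakening)
```

Counting per experiment gives a Heads frequency near 1/2, while counting per awakening gives one near 1/3; the disagreement in the thread is about which decomposition the question refers to.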