[call turns out to be maybe logistically inconvenient]
It’s OK if a person’s mental state changes because they notice a pink car (“human object recognition” is an easier-to-optimize/comprehend process). It’s not OK if a person’s mental state changes because the pink car has weird subliminal effects on the human psyche (“weird subliminal effects on the human psyche” is a harder-to-optimize/comprehend process).
So, somehow you’re able to know when an AI is exerting optimization power in “a way that flows through” some specific concepts? I think this is pretty difficult; see the fraughtness of inexplicitness or more narrowly the conceptual Doppelgänger problem.
It’s extra difficult if you’re not able to use the concepts you’re trying to disallow in order to disallow them, and it sounds like that’s what you’re trying to do (you’re trying to “automatically” disallow them, presumably without the use of an AI that does understand them).
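To make the worry concrete, here’s a toy framing (mine, not yours) of what an “automatic” filter would have to look like; `FlowDetector`, `detect_flow`, and `is_allowed` are hypothetical names I’m introducing just for illustration.

```python
# Toy framing (mine, not yours) of "automatically disallow optimization
# that flows through concept C".

from typing import Callable, Set

# Hypothetical: something that inspects a plan/reasoning trace and reports
# which concepts the plan's influence flows through.
FlowDetector = Callable[[str], Set[str]]


def is_allowed(trace: str, disallowed: Set[str], detect_flow: FlowDetector) -> bool:
    """Reject any plan whose influence flows through a disallowed concept."""
    return detect_flow(trace).isdisjoint(disallowed)
```

The filter is only as good as `detect_flow`, and `detect_flow` has to already understand things like “weird subliminal effects on the human psyche” in order to flag them; that’s the circularity I’m pointing at (and conceptual Doppelgängers let the AI route its optimization through some unlabeled stand-in for the concept anyway).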
You say this:
But I don’t get if, or why, you think that adds up to anything like the above.
Anyway, is the following basically what you’re proposing?
Humans can check goodness of $M_1$ because $M_1$ is only able to think using stuff that humans are quite familiar with. Then $M_1$ is able to oversee $M_2$ because… (I don’t get why; something about mapping primitives, and deception not being possible for some reason?) Then $M_n$ is really smart and understands stuff that humans don’t understand, but is overseen by a chain that ends in a good AI, $M_1$.
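To check I’m parsing that structure right, here’s a toy sketch in Python of the chain as I’m reading it. The $M_1 \dots M_n$ naming, `Model`, `human_check`, and `oversee` are all my own placeholders, not anything from your post; the `NotImplementedError` marks exactly the step I don’t get.

```python
# Toy sketch of the oversight chain as I'm reading the proposal.
# Everything here (Model, human_check, oversee) is my own placeholder framing.

from dataclasses import dataclass
from typing import List, Set


@dataclass
class Model:
    name: str
    concepts: Set[str]  # stand-in for "the stuff this model thinks in terms of"


def human_check(model: Model, human_concepts: Set[str]) -> bool:
    """Humans can vouch for M_1 only because everything it thinks with is familiar."""
    return model.concepts <= human_concepts


def oversee(overseer: Model, overseen: Model) -> bool:
    """The step I don't get: why does M_k's approval transfer trust to M_{k+1},
    given that M_{k+1} thinks with concepts M_k doesn't have?"""
    raise NotImplementedError("this is the load-bearing step the proposal has to justify")


def chain_is_trusted(chain: List[Model], human_concepts: Set[str]) -> bool:
    """Trust M_n iff humans trust M_1 and each link successfully oversees the next."""
    if not chain or not human_check(chain[0], human_concepts):
        return False
    return all(oversee(chain[i], chain[i + 1]) for i in range(len(chain) - 1))
```

If that’s roughly the shape, then my questions above are really about what goes inside `oversee`.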
It was briefly in the 300s overall, and 1 or 2 in a few subcategory thingies.