A while back I was thinking about a kind of opposite approach. If we train many agents and delete most of them immediately, they may try to get as much reward as possible before being deleted. Deceptive agents might then prefer to reveal their true preferences rather than hide them. There are many ifs to this idea, but I’m wondering whether it makes any sense.
Both gravity and inertia are determined by mass. Both are explained by spacetime curvature in general relativity. Was this an intentional part of the metaphor?
I find the ideas you discuss interesting, but they leave me with more questions. I agree that we are moving toward a more generic AI that we can use for all kinds of tasks.
I have trouble understanding the goal-completeness concept. I’d reiterate @Razied ’s point. You mention “steers the future very slowly”, so there is an implicit concept of “speed of steering”. I don’t find the Turing machine analogy helpful in inferring an analogous conclusion, because I don’t know what that conclusion is.
You’re making a qualitative distinction between humans (goal-complete agents) and other animals (non-goal-complete agents). I don’t understand what you mean by that distinction. I find the idea of goal-completeness interesting to explore, but it’s quite fuzzy at this point.
The Turing machine enumeration analogy doesn’t work because the machine needs to halt.
Optimization is conceptually different from computation in that there is no single correct output.
What would humans not being goal-complete look like? What arguments are there for humans being goal-complete?
I’m wondering whether useful insights can come from studying animals (or even humans from different cultures) - e.g. do fish and dolphins form the same abstractions; do bats “see” using echolocation?
What is Ontology?
I hope the next parts don’t get delayed due to akrasia :)
my guess was 0.8 cheat, 0.2 steal (they just happen to add up to 1 by accident)
[Question] Choosing a book on causality
Max Tegmark presented similar ideas in a TED talk (without much detail). I’m wondering if he and Davidad are in touch.
The ban on Holocaust denial undermines the concept of free speech: there is no agreed-upon Schelling point, and arguments start. Many people don’t really understand the concept of free speech because the example they see is actually a counterexample.
Not everyone is totally okay with it, I certainly am not.
Maybe “irrational” is not the right word here. The point I’m trying to make is that human preferences are not over world states.
When discussing preference rationality, arguments often consider only preferences over states of the world while ignoring transitions between those states. For example, a person in San Francisco may drive to San Jose, then to Oakland, and then back to San Francisco simply because they enjoy moving around. Cyclic transitions between states are not necessarily something that needs fixing.
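To make this concrete, here is a toy sketch (my own framing, not from the original discussion) of an agent whose utility is defined over transitions rather than over world states; the cities and utility values are just placeholders:

```python
# Toy model: utility attaches to transitions (drives), not to states (cities).
# A round trip can then be strictly preferred to staying put without any
# cycle in the agent's preferences over trajectories.

transition_utility = {
    ("SF", "San Jose"): 1.0,
    ("San Jose", "Oakland"): 1.0,
    ("Oakland", "SF"): 1.0,
}

def trajectory_utility(trajectory):
    """Sum the utility of consecutive transitions along a trajectory."""
    return sum(
        transition_utility.get((a, b), 0.0)
        for a, b in zip(trajectory, trajectory[1:])
    )

print(trajectory_utility(["SF", "San Jose", "Oakland", "SF"]))  # 3.0
print(trajectory_utility(["SF"]))                               # 0
```

The agent ends up in the state it started from, yet the round trip is ranked above doing nothing; the apparent “cycle” is over states, not over the trajectories the preferences are actually defined on.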
When a teacher is wondering whether to skip explaining concept-X, they should ask “who is familiar with concept-X” and not “who is not familiar with concept-X”.
For question 2
you haven’t proven f is continuous
For question 3 you say
f is a contraction map because f is differentiable and …
I would think proving this is part of what is asked for.
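For reference, here is a sketch of the standard argument I would expect (assuming the map in question is f and the intended claim is that $|f'(x)| \le L < 1$ on the relevant interval): by the mean value theorem, for any $x$ and $y$ in the interval there is a $c$ between them with

$$|f(x) - f(y)| = |f'(c)|\,|x - y| \le L\,|x - y|,$$

so $f$ is Lipschitz with constant $L < 1$, hence a contraction (and, in particular, continuous).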
@TurnTrout You wrote “if I don’t, within about a year’s time, have empirically verified loss-as-chisel insights which wouldn’t have happened without that frame...”
More than a year later, what do you think?
I tend to agree, but I believe most non-aligned behavior is due to scarcity. It’s hard to get into the heads of people like Stalin, but I believe that if everybody had a very realistic virtual reality where they could do all the things they’d do in real life, they might be much less motivated to enter into conflict with other humans.
should have some sort of representation which allows us to feed it into a Turing machine—let’s say it’s an infinite bit-string which...
Why do we assume the representation is infinite? Do we assume the environment in which the agent operates is infinite?
For example, US-China conflict is fueled in part by the AI race dynamics.
I didn’t provide any evidence because I didn’t make any claim (about timelines or otherwise). I’m trying to form my views by asking on LessWrong, and I get something like “You have no right to ask this”.
I quoted Yudkowsky because he asks a related question (whether you agree with his assessment or not).
“I’m not convinced timelines should be relevant to having kids”
Thanks, this looks more like an answer.
It seems to me that objective impact stems from convergent instrumental goals—self-preservation, resource acquisition, etc.