Davide_Zagami

Karma: 43

Vulnerabilities in CDT and TI-unaware agents

PabloAMC, Davide_Zagami and Chris_Leong

10 Mar 2020 14:14 UTC

5 points

1 comment, 4 min read

Davide_Zagami 16 Mar 2019 13:28 UTC
4 points
in reply to: Maybe_a’s comment on: AI Safety Prerequisites Course: Revamp and New Lessons
Registration and access to the lessons are completely free. Where do you see a paywall?

Davide_Zagami 14 Mar 2019 22:19 UTC
10 points
in reply to: lifelonglearner’s comment on: AI Safety Prerequisites Course: Basic abstract representations of computation
Hi, full-time content developer at RAISE here.
The overview page you are referring to (is it this one?) contains just some examples of subjects that we are working on.
1. One of the main goals is to make a complete map of what is out there regarding AI Safety, and then recursively create explanations for the concepts it contains. Depending on how deep we are able to go, that could serve multiple audiences. We have started doing that with IRL and IDA. We are also trying a bottom-up approach with the prerequisite course, because why not.
2. Almost the same as reading papers, with clear pointers to references to quickly integrate any missing knowledge. Whether this will be achieved in the best case or in the average case is currently under testing.
3. I don’t know the absolute amount of time required for that. Keep in mind that this remains to be confirmed, but we have recently started collecting some statistics suggesting that reading RAISE material will at least be quicker than searching for the right papers and then reading and understanding them. This would be the second main goal.
(Of course, it’s also much easier from my position to be engaging/critiquing existing works, than to actually put in the effort to make all of this happen. I don’t mean any of the above as an indictment. It’s admirable and impressive that y’all have coordinated to make this happen at all!)
Thanks :)

Am I understanding the problem of fully updated deference correctly?

Davide_Zagami

30 Sep 2018 10:55 UTC

3 points

1 comment, 1 min read

Davide_Zagami 24 Jun 2018 20:30 UTC
6 points
on: Duplication versus probability
After reading this, I feel that how one should deal with anthropics depends strictly on one’s goals. I’m not sure exactly which cognitive algorithm does the correct thing in general, but it seems that sometimes it reduces to “standard” probabilities and sometimes it doesn’t. May I ask what exactly UDT says about all of this?
Suppose you’re rushing an urgent message back to the general of your army, and you fall into a deep hole. Down here, conveniently, there’s a lever that can create a duplicate of you outside the hole. You can also break open the lever and use the wiring as ropes to climb to the top. You estimate that the second course of action has a 50% chance of success. What do you do?
Obviously, if the message is your top priority, you pull the lever, and your duplicate will deliver that message. This succeeds all the time, while the wire-rope only has 50% chance of working.
Agreed.
after pulling the lever, do you expect to be the copy on the top or the copy on the bottom?
Agreed: the question has no meaning per se.
what if the lever initially creates one copy—and then, five seconds later, creates a million? How do you update your probabilities during these ten seconds?
Before pulling the lever, I commit to doing the following.
For the first five seconds, I will think (all copies of me will think) “I am above”. This way, 50% of all my copies will be wrong.
For the remaining five seconds, I will think (all copies of me will think) “I am above”. This way, only about one millionth of all my copies will be wrong.
If each of my copies were being paid for correctly identifying which copy he is, then only about one millionth of all my copies would end up poor.
This sounds suspiciously like updating probabilities the “standard” way, especially if you substitute “copies” with “measure”.
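The commitment above boils down to counting copies. Here is a minimal Python sketch, assuming the original stays at the bottom of the hole and that all one million later duplicates appear outside it; `fraction_wrong` is a hypothetical helper, not something from UDT or the original post.

```python
from fractions import Fraction


def fraction_wrong(copies_above: int, copies_below: int) -> Fraction:
    """Share of all copies that are mistaken if every copy thinks "I am above"."""
    return Fraction(copies_below, copies_above + copies_below)


# First five seconds: the original is still at the bottom, one duplicate is on top.
early = fraction_wrong(copies_above=1, copies_below=1)

# Remaining five seconds: assume the lever adds one million more duplicates on top.
late = fraction_wrong(copies_above=1 + 1_000_000, copies_below=1)

print(early)        # 1/2
print(late)         # 1/1000002
print(float(late))  # roughly one millionth
```

The late-stage share is exactly 1 in 1,000,002, so “one millionth” is an approximation. These are the same numbers a “standard” update would give if every copy were weighted equally.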

Davide_Zagami 4 Jun 2018 10:43 UTC
1 point
on: Against accusing people of motte and bailey
But suppose that we were discussing something of which there were both sensible and crazy interpretations—held by different people. So:
group A consistently makes and defends sensible claim A1
group B consistently makes and defends crazy claim B1
and maybe even:
group C consistently makes crazy claim B1, but when challenged on it, consistently retreats to defending A1
I may be missing something, but it seems to me that:
- if C is accused of motte-and-bailey fallacy there is no problem;
- if B is accused of motte-and-bailey fallacy there is a problem because they never defended claim A1;
- if A is accused of motte-and-bailey fallacy there is a problem because they never defended claim B1.
I hope I’m not being silly: would it be fair to say that you are pointing to the existence of the “accuse people who are not making a motte-and-bailey fallacy of making a motte-and-bailey fallacy” fallacy? Could we call it “straw-motte-and-bailey fallacy” or something?

Davide_Zagami 5 May 2018 20:11 UTC
9 points
on: Negative Status: Should I Stop Posting Here?
I have only read a small fraction of Yudkowsky’s sequences (I printed the 1800 pages two days ago and have only read about 50), so maybe I think I am discussing interesting stuff where in reality EY has already discussed it in length.
Mostly this. Other things too, but they are mostly caused by this one. I am one of the few who commented on one of your posts with links to some of his writings, exactly for this reason. While I’m guilty of not having given you any detailed feedback and of downvoting that post, I still think you need to catch up on the basics. It’s praiseworthy that you want to engage with rationality and with new ideas, but by doing so without first becoming familiar with the canon, you are not only (1) probably going to say something silly (because rationality is harder than you think) and (2) probably going to say something old (because a lot has been written), but also (3) wasting your own time.

Davide_Zagami 2 May 2018 21:02 UTC
2 points
on: Effective Egoism, or My Life in a Video Game
Fake Selfishness and Fake Morality

Davide_Zagami 12 Feb 2018 15:00 UTC
4 points
in reply to: Robert Miles’s comment on: “Just Suffer Until It Passes”
Ah! I independently invented this strategy some months ago, and amazingly it doesn’t work for me, simply because I’m somehow capable of remaining in the “do nothing” state for literally days. Still, I thought it was a brilliant idea when I came up with it, and I still do; I would be surprised if it didn’t work for a lot of people.

Davide_Zagami 26 Jan 2018 22:50 UTC
7 points
on: Babble
This post made a lot of things click for me. It also made me realize I am one of those with an “overdeveloped” Prune filter compared to the Babble filter. How could I not have noticed this? I knew something was wrong all along, but I couldn’t pin down what, because I wasn’t Babbling enough. I’ve gotta Babble more. Noted.

Davide_Zagami 24 Jan 2018 21:13 UTC
4 points
on: “Slow is smooth, and smooth is fast”
Extremely important post, in my opinion. The central idea seems true to me. I would like to see whether anyone has (even anecdotal) evidence for the opposite.

Davide_Zagami 29 Nov 2017 14:40 UTC
1 point
in reply to: cousin_it’s comment on: The Mad Scientist Decision Problem
Probably you should have simply said something similar to “increasing portions of physical space have diminishing marginal returns to humans”.

Davide_Zagami 29 Nov 2017 14:21 UTC
1 point
in reply to: cousin_it’s comment on: The Mad Scientist Decision Problem
Uhm. That makes sense. I guess I was operating under the definition of risk aversion in which people give up risky bets just because the alternative is less risky, even if that actually means less absolute expected utility than the risky bet. As far as I know, that’s the most common meaning of risk aversion. Isn’t there another term to disambiguate between concave utility functions and straightforward irrationality?
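A minimal Python sketch of the two senses of “risk aversion” being contrasted here. The payoffs (a coin flip for 100 versus a sure 40) and the square-root utility are illustrative assumptions, not taken from the thread; `choose` and the other helpers are hypothetical names.

```python
import math


def expected_value(lottery):
    """Expected amount of money for a list of (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)


def expected_utility(lottery, utility):
    """Expected utility of a list of (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in lottery)


def linear(x):
    """Risk-neutral utility: each extra unit of money is worth the same."""
    return x


concave = math.sqrt  # concave utility: diminishing marginal returns to money

# Hypothetical options: a coin flip for 100 versus a sure 40.
options = {"risky": [(0.5, 100.0), (0.5, 0.0)], "safe": [(1.0, 40.0)]}


def choose(utility):
    """Pick the option with the highest expected utility under `utility`."""
    return max(options, key=lambda name: expected_utility(options[name], utility))


print(expected_value(options["risky"]), expected_value(options["safe"]))  # 50.0 40.0
print(choose(linear))   # risky
print(choose(concave))  # safe: sqrt(40) ≈ 6.32 beats 0.5 * sqrt(100) = 5.0
```

The square-root agent takes the sure 40 while still maximizing its own expected utility. An agent whose utility really is linear in money but who still takes the sure 40 is giving up expected utility, which corresponds to the “straightforward irrationality” case.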

Davide_Zagami 29 Nov 2017 13:39 UTC
2 points
in reply to: cousin_it’s comment on: The Mad Scientist Decision Problem
I’m not sure it can be assumed that the deal is profitable for both parties. The way I understand it, risk aversion is a bug, not a feature; humans would be better off if they weren’t risk averse (they should self-modify to be risk neutral if and when possible, in order to be better at fulfilling their own values).

Davide_Zagami 24 Nov 2017 21:50 UTC
4 points
in reply to: habryka’s comment on: Unjustified ideas comment thread
I’m not sure how to put this. One reason that comes to mind for having it weekly is that threads seem to get “old” very quickly now. For example, it seems to me that a good share of the unanswered questions in the Stupid Questions thread are unanswered because people don’t see them, not because people don’t know the answers. (Speaking of which, I haven’t seen that thread reposted for some months, or am I missing something?)
May I suggest a period of 15 days?

Davide_Zagami 24 Nov 2017 21:34 UTC
3 points
in reply to: MrRobot’s comment on: Unjustified ideas comment thread
Something about getting social feedback feels a lot more powerful (to me) and helps me move past it quicker than just writing it down.
I second this.

Davide_Zagami 24 Nov 2017 21:05 UTC
2 points
on: Unjustified ideas comment thread
I really like the idea, but what are the limits? Can one just spit out random, speculative opinions? Can one come and just unironically state “I think Trump being president is evidence that the aliens are among us” as long as they sincerely suspect the correlation?
