X4vier

Karma: 134

X4vier 7 Nov 2017 7:48 UTC
6 points
on: Moloch’s Toolbox (2/2)
Heartbreaking :’( still, that “taken time off from their cryptographic shenanigans” line made me laugh so hard I woke my girlfriend up

X4vier 17 Nov 2017 13:25 UTC
19 points
on: Against Shooting Yourself in the Foot
So much of your writing sounds like an eloquent clarification of my own underdeveloped thoughts. I’d bet good money your LessWrong contributions have delivered me far more help than harm :) Thanks <3

X4vier 18 Mar 2018 0:54 UTC
0 points
on: AI Alignment Prize: Super-Boxing

X4vier 18 Mar 2018 1:45 UTC
1 point
on: Announcement: AI alignment prize winners and next round
My entry: https://www.lesserwrong.com/posts/DTv3jpro99KwdkHRE/ai-alignment-prize-super-boxing

X4vier 6 Apr 2018 3:54 UTC
1 point
in reply to: cousin_it’s comment on: AI Alignment Prize: Super-Boxing
Sorry for the late response! I didn’t realise I had comments :)
In this proposal we go with (2): The AI does whatever it thinks the handlers will reward it for.
I agree this isn’t as good as giving the agents an actually safe reward function, but if our assumptions are satisfied then this approval-maximising behaviour might still result in the human designers getting what they actually want.
What I think you’re saying (please correct me if I misunderstood) is that an agent aiming to do whatever its designers reward it for will be incentivised to do undesirable things to us (like wiring up our brains to machines which make us want to press the reward button all the time).
It’s true that the agents will try to take these kinds of nefarious actions if they think they can get away with it. But in this setup the agent knows that it can’t get away with tricking the humans like this, since its ancestors already warned the humans that a future agent might try this, and the humans prepared appropriately.

X4vier 6 Apr 2018 4:05 UTC
1 point
in reply to: Charlie Steiner’s comment on: AI Alignment Prize: Super-Boxing
Thanks for your comment, I think I’m a little confused about what it would mean to actually satisfy this assumption.
It seems to me that many current algorithms (for example, a Rainbow DQN agent) would satisfy assumption 3? But like I said, I’m super confused about anything resembling questions about self-awareness/naturalisation.

X4vier 15 Jun 2018 9:35 UTC
6 points
in reply to: paulfchristiano’s comment on: Weak arguments against the universal prior being malign
Thanks for the response!
Input/output: I agree that the unnatural input/output channel is just as much a problem for the ‘intended’ model as for the models harbouring consequentialists, but I understood your original argument as relying on there being a strong asymmetry where the models containing consequentialists aren’t substantially penalised by the unnaturalness of their input/output channels. An asymmetry like this seems necessary because specifying the input channel accounts for pretty much all of the complexity in the intended model.
Computational constraints: I’m not convinced that the necessary calculations the consequentialists would have to make aren’t very expensive (from their point of view). They don’t merely need to predict the continuation of our bit sequence: they have to run simulations of all kinds of possible universes to work out which ones they care about and where in the multiverse Solomonoff inductors are being used to make momentous decisions, and then they perhaps need to simulate their own universe to work out which plausible input/output channels they want to target. If they do this, then all they get in return is a pretty measly influence over our beliefs (since they’re competing with many other daemons in approximately equally similar universes who have opposing values). I think there’s a good chance these consequentialists might instead elect to devote their computational resources to realising other things they desire (like simulating happy copies of themselves or something).

X4vier 18 Jun 2018 10:50 UTC
2 points
in reply to: paulfchristiano’s comment on: Weak arguments against the universal prior being malign
Okay, I agree. Thanks :)

X4vier 3 Apr 2019 0:21 UTC
4 points
on: “Other people are wrong” vs “I am right”
Thanks heaps for the post man, I really enjoyed it! While I was reading, it felt like you were taking a bunch of half-baked vague ideas out of my own head, cleaning them up, and giving some much clearer, more developed versions of those ideas back to me :)

X4vier 24 Feb 2023 4:59 UTC
3 points
4
in reply to: Jackson Wagner’s comment on: Big Mac Subsidy?
Doesn’t make sense to use the particular consumer’s preferences to estimate the cruelty cost. If that’s how we define the cruelty cost, then the buyer should already be taking it into account when making their purchasing decision, so it’s not an externality.

The externality comes from the animals themselves having interests which the consumers aren’t considering.

X4vier 1 Mar 2023 4:11 UTC
4 points
3
in reply to: mukashi’s comment on: Transcript: Yudkowsky on Bankless follow-up Q&A
If we expect there will be lots of intermediate steps—does this really change the analysis much?

How will we know once we’ve reached the point where there aren’t many intermediate steps left before crossing a critical threshold? How do you expect everyone’s behaviour to change once we do get close?

X4vier 13 Apr 2023 0:05 UTC
19 points
12
in reply to: Jan_Kulveit’s comment on: Evolution provides no evidence for the sharp left turn
I think OP is correct about cultural learning being the most important factor in explaining the large difference in intelligence between Homo sapiens and other animals.

The early chapters of The Secret of Our Success examine studies comparing the performance of young humans and young chimps on various cognitive tasks. The book argues that across a broad array of cognitive tests, 4-year-old humans do not perform significantly better than 4-year-old chimps on average, except in cases where the task can be solved by imitating others (human children crushed the chimps when this was the case).

The book makes a very compelling argument that our species is uniquely prone to imitating others (even in the absence of causal models about why the behaviour we’re imitating is useful), and that even very young humans have innate instincts for picking up on signals of prestige/competence in others and preferentially imitating those high-prestige people. Imo the arguments put forward in this book make cultural learning look like a much stronger theory than the Machiavellian intelligence hypothesis (although what actually happened at a lower level of abstraction probably includes aspects of both).
What links here?
- Arguments for optimism on AI Alignment (I don’t endorse this version, will reupload a new version soon.) by Noosphere89 (15 Oct 2023 14:51 UTC; 23 points)

X4vier 24 May 2023 0:07 UTC
1 point
0
in reply to: James Payor’s comment on: [Linkpost] “Governance of superintelligence” by OpenAI
Out of interest—if you had total control over OpenAI—what would you want them to do?

X4vier 4 Jul 2023 0:07 UTC
1 point
0
in reply to: Herb Ingram’s comment on: When do “brains beat brawn” in Chess? An experiment
Maybe an analogy which seems closer to the “real world” situation—let’s say you and someone like Sam Altman both tried to start new companies. How much more time and starting capital do you think you’d need to have a better shot of success than him?