Neil

Karma: 638

I’m bumping into walls but hey now I know what the maze looks like.

Neil 14 Jul 2024 12:39 UTC
1 point
0
in reply to: Neil ’s comment on: Introduction to French AI Policy
Fun fact: it’s thanks to Lucie that I ended up stumbling onto PauseAI in the first place. Small world + thanks Lucie.

Neil 7 Jul 2024 23:27 UTC
6 points
1
on: Introduction to French AI Policy
Update everyone: the hard right did not end up gaining a parliamentary majority, which, as Lucie mentioned, could have been the worse outcome wrt AI safety.

Looking ahead, it seems that France will end up being fairly confused and gridlocked as it becomes forced to deal with an evenly-split parliament by playing German-style coalition negociation games. Not sure what that means for AI, except that unilateral action is harder.

For reference, I’m an ex-high school student who just got to vote for the first 3 times in his life because of French political turmoil (✨exciting) and am working these days at PauseAI France, a (soon to be official) governance non-profit aiming to, well—

Anyway, as an org we’re writing a counter to the AI commitee mentioned in this post, so that’s what’s up these days in the French AI safety governance circles.

Neil 10 May 2024 18:10 UTC
1 point
0
on: Neil Warren’s Shortform
I’m working on a non-trivial.org project meant to assess the risk of genome sequences by comparing them to a public list of the most dangerous pathogens we know of. This would be used to assess the risk from both experimental results in e.g. BSL-4 labs and the output of e.g. protein folding models. The benchmarking would be carried out by an in-house ML model of ours. Two questions to LessWrong:
1. Is there any other project of this kind out there? Do BSL-4 labs/AlphaFold already have models for this?
2. “Training a model on the most dangerous pathogens in existence” sounds like an idea that could backfire horribly. Can it backfire horribly?

Neil 26 Apr 2024 15:57 UTC
1 point
0
in reply to: Viliam’s comment on: Gâchis Astronomique
I’m taking this post down, it was to set up an archive.org link as requested by Bostrom, and no longer serves that purpose. Sorry, this was meant to be discreet.

Neil 25 Apr 2024 7:56 UTC
3 points
0
on: Neil Warren’s Shortform
Poetry and practicality

I was staring up at the moon a few days ago and thought about how deeply I loved my family, and wished to one day start my own (I’m just over 18 now). It was a nice moment.

Then, I whipped out my laptop and felt constrained to get back to work; i.e. read papers for my AI governance course, write up LW posts, and trade emails with EA France. (These I believe to be my best shots at increasing everyone’s odds of survival).

It felt almost like sacrilege to wrench myself away from the moon and my wonder. Like I was ruining a moment of poetry and stillwatered peace by slamming against reality and its mundane things again.

But… The reason I wrenched myself away is directly downstream from the spirit that animated me in the first place. Whether I feel the poetry now that I felt then is irrelevant: it’s still there, and its value and truth persist. Pulling away from the moon was evidence I cared about my musings enough to act on them.

The poetic is not a separate magisterium from the practical; rather the practical is a particular facet of the poetic. Feeling “something to protect” in my bones naturally extends to acting it out. In other words, poetry doesn’t just stop. Feel no guilt in pulling away. Because, you’re not.

Neil 21 Apr 2024 12:27 UTC
5 points
4
in reply to: Chris_Leong’s comment on: “You’re the most beautiful girl in the world” and Wittgensteinian Language Games
Too obvious imo, though I didn’t downnvote. This also might not be an actual rationalist failure mode; in my experience at least, rationalists have about the same intuition all the other humans have about when something should be taken literally or not.

As for why the comment section has gone berserk, no idea, but it’s hilarious and we can all use some fun.

Neil 20 Apr 2024 23:06 UTC
2 points
−13
on: Neil Warren’s Shortform
Can we have a black banner for the FHI? Not a person, still seems appropriate imo.

Neil 20 Apr 2024 19:52 UTC
4 points
0
on: “You’re the most beautiful girl in the world” and Wittgensteinian Language Games
See also Alicorn’s Expressive Vocabulary.

Neil 17 Apr 2024 22:55 UTC
8 points
6
on: Neil Warren’s Shortform
FHI at Oxford
by Nick Bostrom (recently turned into song):
the big creaky wheel
a thousand years to turn
thousand meetings, thousand emails, thousand rules
to keep things from changing
and heaven forbid
the setting of a precedent
yet in this magisterial inefficiency
there are spaces and hiding places
for fragile weeds to bloom
and maybe bear some singular fruit
like the FHI, a misfit prodigy
daytime a tweedy don
at dark a superhero
flying off into the night
cape a-fluttering
to intercept villains and stop catastrophes
and why not base it here?
our spandex costumes
blend in with the scholarly gowns
our unusual proclivities
are shielded from ridicule
where mortar boards are still in vogue

Neil 13 Apr 2024 18:55 UTC
1 point
0
in reply to: tailcalled’s comment on: Consequentialism is a compass, not a judge
I’ve come to think that isn’t actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn’t essentially flawless:
I haven’t read that post, but I broadly agree with the excerpt. On green did a good job imo in showing how weirdly imprecise optimal human values are.
It’s true that when you stare at something with enough focus, it often loses that bit of “sacredness” which I attribute to green. As in, you might zoom in enough on the human emotion of love and discover that it’s just an endless tiling of Shrodinger’s equation.
If we discover one day that “human values” are eg 23.6% love, 15.21% adventure and 3% embezzling funds for yachts, and decide to tile the universe in exactly those proportions...^[1] I don’t know, my gut doesn’t like it. Somehow, breaking it all into numbers turned humans into sock puppets reflecting the 23.6% like mindless drones.
The target “human values” seems to be incredibly small, which I guess encapsulates the entire alignment problem. So I can see how you could easily build an intuition from this along the lines of “optimizing maximally for any particular thing always goes horribly wrong”. But I’m not sure that’s correct or useful. Human values are clearly complicated, but so long as we haven’t hit a wall in deciphering them, I wouldn’t put my hands up in the air and act as if they’re indecipherable.
Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There’s a bunch of attempts to patch this, but none have really worked so far, and it doesn’t seem like any will ever work.
I’m going to read your post and see the alternative you suggest.
1. ^
  Sounds like a Douglas Adams plot

Neil 13 Apr 2024 13:40 UTC
0 points
−2
in reply to: sweenesm’s comment on: Consequentialism is a compass, not a judge
Interesting! Seems like you put a lot of effort into that 9,000-word post. May I suggest you publish it in little chunks instead of one giant post? You only got 3 karma for it, so I assume that those who started reading it didn’t find it worth the effort to read the whole thing. The problem is, that’s not useful feedback for you, because you don’t know which of those 9,000 words are presumably wrong. If I were building a version of utilitarianism, I would publish it in little bursts of 2-minute posts. You could do that right now with a single section of your original post. Clearly you have tons of ideas. Good luck!

Neil 13 Apr 2024 13:30 UTC
2 points
1
in reply to: tailcalled’s comment on: Consequentialism is a compass, not a judge
You know, I considered “Bob embezzled the funds to buy malaria nets” because I KNEW someone in the comments would complain about the orphanage. Please don’t change.
Actually, the orphanage being a cached thought is precisely why I used it. The writer-pov lesson that comes with “don’t fight the hypothetical” is “don’t make your hypothetical needlessly distracting”. But maybe I miscalculated and malaria nets would be less distracting to LWers.
Anyway, I’m of course not endorsing fund-embezzling, and I think Bob is stupid. You’re right in that failure modes associated with Bob’s ambitions (eg human extinction) might be a lot worse than those of your typical fund-embezzler (eg the opportunity cost of buying yachts). I imagined Bob as being kind-hearted and stupid, but in your mind he might be some cold-blooded brooding “the price must be paid” type consequentialist. I didn’t give details either way, so that’s fair.
If you go around saying “the ends justify the means” you’re likely to make major mistakes, just like if you walk around saying “lying is okay sometimes”. The true lesson here is “don’t trust your own calculations, so don’t try being clever and blowing up TSMC”, not “consequentialism has inherent failure modes”. The ideal of consequentialism is essentially flawless; it’s when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.
In my mind then, Bob was a good guy running on flawed hardware. Eliezer calls patching your consequentialism by making it bounded “consequentialism, one meta-level up”. For him, refusing to embezzle funds for a good cause because the plan could obviously turn sour is just another form of consequentialism. It’s like belief in intelligence, but flipped; you don’t know exactly how it’ll go wrong, but there’s a good chance you’re unfathomably stupid and you’ll make everything worse by acting on “the ends justify the means”.
From a practical standpoint though, we both agree and nothing changes: both the cold-hearted Bob and the kind Bob must be stopped. (And both are indeed more likely to make ethically dubious decisions because “the ends justify the means”.)
Post-scriptum:
Honestly the one who embezzles funds for unbounded consequentialist purposes sounds much more intellectually interesting
Yeah, this kind of story makes for good movies. When I wrote Bob I was thinking of The Wonderful Story of Mr.Sugar, by Roald Dahl and adapted by Wes Anderson on Netflix. It’s at least vaguely EA-spirited, and is kind of in that line (although the story is wholesome, as the name indicates, and isn’t meant to warn against dangers associated with boundless consequentialism at all).^[1]
1. ^
  Let’s wait for the SBF movie on that one

Consequentialism is a compass, not a judge

Neil 13 Apr 2024 10:47 UTC

26 points

6 comments2 min readLW link

Neil 12 Apr 2024 9:31 UTC
3 points
0
in reply to: faul_sname’s comment on: Martín Soto’s Shortform
Link is broken

Neil 8 Apr 2024 15:34 UTC
3 points
0
in reply to: TeaTieAndHat’s comment on: Politics are not serious by default
Re: sociology. I found a meme you might enjoy, which would certainly drive your teacher through the roof: https://twitter.com/captgouda24/status/1777013044976980114

Privacy and writing

Neil 6 Apr 2024 8:20 UTC

20 points

1 comment5 min readLW link

Neil 6 Apr 2024 7:40 UTC
1 point
0
in reply to: papetoast’s comment on: Neil Warren’s Shortform
Yeah, that’s an excellent idea. I often spot typos in posts, but refrain from writing a comment unless I collect like three. Thanks for sharing!

Neil 2 Apr 2024 22:26 UTC
62 points
54
on: Neil Warren’s Shortform
A functionality I’d like to see on LessWrong: the ability to give quick feedback for a post in the same way you can react to comments (click for image). When you strong-upvote or strong-downvote a post, a little popup menu appears offering you some basic feedback options. The feedback is private and can only be seen by the author.
I’ve often found myself drowning in downvotes or upvotes without knowing why. Karma is a one-dimensional measure, and writing public comments is a trivial inconvience: this is an attempt at middle ground, and I expect it to make post reception clearer.
See below my crude diagrams.

Neil 2 Apr 2024 18:04 UTC
1 point
0
in reply to: NickH’s comment on: NickH’s Shortform
I’m not clear on what you’re calling the “problem of superhuman AI”?

Neil 1 Apr 2024 17:25 UTC
1 point
0
on: There’s an AGI on LessWrong
I was given clear instructions from a math phd about how to dump random lean files into the repository I created to confuse lesswrongers for at least a few minutes. But then I got confused while attempting to follow the instructions. There’s only so much my circuits can handle. I’m running most of my code on a Chromebook! Fear me.

Neil

Con­se­quen­tial­ism is a com­pass, not a judge

Pri­vacy and writing

Consequentialism is a compass, not a judge

Privacy and writing