Quinn
This comment’s updates for me personally:
The overall “EA is scary / criticizing leaders is scary” meme is something I very frequently roll my eyes at; I find it alien and sometimes laughable when people say they’re worried about being bold and brave, cuz all I ever see are people being rewarded for constructive criticism. But man, if I didn’t know about some of this stuff, then I was missing a huge piece of the puzzle. It’s unclear yet what I’ll think about, say, the anon meta on forums after this comment sinks in / propagates, but my guess is it’ll be very different from what I thought before.
People are way too quick to reward themselves for trying (this update is in my priority queue for a proper writeup): Nate & enablers saying that productivity / irreplaceability is an excuse to triage out fundamental interpersonal effort is equivalent (as far as I’m concerned) to a 2022 University Community Builder (TM) deciding that they’re entitled to opulent retreats the moment they declare stated interest in saving the world. “For the greater good” thinking is fraught and dicey even when someone is definitely valuable enough for the case to genuinely be made, and there’s obvious pressure toward accepting a huge error rate if you simply want to believe you or a colleague is that productive/insightful. I honestly think Nate’s position here is more excusable than the enablers’: you basically need to see Nobel-physicist-level output before you even consider giving someone this much benefit of the doubt, and even then you should decide not to after considering it. I’m kinda dumbfounded that it was this easy for MIRI’s culture to be like this. (Yes, my epistemic position on the stakes is going to be wrong because of “undisclosed by default”, but there are a bajillion sources for my roll to disbelieve if anyone says “well actually, undisclosed MIRI codebases are Nobel-physicist-level”.)
I feel very vindicated having written this comment, and I am subtracting karma from everyone who gave Nate points for writing a long introspective gdoc. You guys should’ve assumed that it would be a steep misfire.
Someone told me that some friends of theirs hated a talk or office hours with Nate, and I super devil’s-advocated the idea that lots of people have reasons, which I’m not sympathetic with, for disliking the “blunt because if I suffer fools we’ll all lower our standards” style: I now need to apologize to them for being dismissive. I mean, for chrissakes y’all, in my first 1:1 with Eliezer he was not suffering fools; he helped me speedrun noticing how misled my optimism about my then-current project was, and it was jovial and pleasant, so I felt like an idiot and I look back fondly on the interaction. So no, the defense Nate retreats to elsewhere in these comments (that his comms style is downstream of trying to outperform prestigious, etiquette-bound academics goodharting on useless but legible research) does not hold.
Extremely warm, from-the-heart comments about Nate from my PoV (not coming from a phoned-in/trite/etiquette “soften the blow” place, but I’m very glad there’s that upside):
I’m a huge replacing guilt fan
reading Nate on GitHub and LessWrong has been very important to me in my CS education. The old intelligence.org/research-guide mattered so much to me at a very important life/development pivot.
Nate’s strategy / philosophy of alignment posts, particularly recently, have been phenomenal.
in a sibling comment, Nate wrote:
If you stay and try to express yourself despite experiencing strong feelings of frustration, you’re “almost yelling”. If you leave because you’re feeling a bunch of frustration and people say they don’t like talking to you while you’re feeling a bunch of frustration, you’re “storming out”.
This is hard and unfair and I absolutely feel for him, I’ve been there[1].
I don’t know if we’ve ever been in the same room. I’m going off of web presence, and a few comments or rumors others have passed along.
[1] on second thought: I’ve mostly only been there in, say, a soup kitchen run by trans commie lesbians who are eagerly looking for the first excuse they can find to cancel the cis guy. I guess I don’t at all relate to the possibility that someone would feel that way in the bay area tech scene.
For me the scary part was Meta’s willingness to do things that are minimally/arguably torment-nexusy and then put it in PR language like “cooperation” and actually with a straight face sweep the deceptive capability under the rug.
This is different from believing that the deceptive capability in question is on its own dangerous or surprising.
My update from cicero is almost entirely on the social reality level: I now more strongly than before believe that in the social reality, rationalization for torment-nexus-ing will be extremely viable and accessible to careless actors.
(that said, I think I may have forecasted 30-45% chance of full-press diplomacy success if you had asked me a few weeks ago, so maybe I’m not that unsurprised on the technical level)
Cheers, thanks for writing. I was a very anti-math high school student; I almost got expelled for throwing a temper tantrum at my algebra 2 teacher cuz I thought it wasn’t fair they were making me sit through it. That was 10th grade, and they didn’t make me take any other math courses. 7 or 8 years later I took the placement exam at a community college, placed into precalc I, retreated to Khan Academy, retook the exam a few months later, and placed into calc I; I took that and discrete and all of their sequels, and ended up getting straight As and tutoring every single math course there. I feel like I owe a lot to Khan Academy: a tailored user experience really adds a ton of value over throwing myself at textbooks (and I did eventually figure out how to throw myself at textbooks, but also failed at doing so many times).
The purpose of my comment is to register, for anyone intimidated by comments implying that people in the movement were doing math at such-and-such level in grade school, that we’re out here, we exist, and we’re doing stuff; we who had to put effort into precalc in our 20s.
To be fair, me and someone tried a “select on building properties, try to cultivate entertaining properties later” strategy.
We read each other’s dating docs, calculated an optimism-inducing amount of goal alignment and compatibility, and took a crack at being charming and funny and sexy to each other.
It did not go as planned. I was a little shocked—surely my monkeybrain needs would cooperate with (or be coerced into aligning with) my actual life goals, right? With hindsight I’m kind of honing my ability to recognize “just reverse engineer the ‘spark’, how hard can it be” as a special kind of stupid.
I heard a pretty haunting take about how long it took to discover steroids in bike races. Apparently, there was a while where a “few bad apples” narrative remained popular even when an ostensibly “one of the good ones” guy was outperforming guys discovered to be using steroids.
I’m not sure how dire or cynical we should be about academic knowledge or incentives. It’s more or less defensible to assume that no one with a successful career is doing anything real until proven otherwise, but that’s still a very extreme view that I’d personally bet against. Of course, things also vary a lot field by field.
how do we know it’s false?
https://www.lesswrong.com/posts/BGLu3iCGjjcSaeeBG/related-discussion-from-thomas-kwa-s-miri-research?commentId=fPz6jxjybp4Zmn2CK This brief subthread can be read as “giving Nate points for trying”, and it’s too credulous about whether “introspection” actually works. My wild background guess is that roughly 60% of the time “introspection” is more “elaborate self-delusion” than working as intended, and there are times when someone saying “no, but I’m trying really hard to be good at it” drives that probability up instead of down. I didn’t think this was one of those times before reading Kurt’s comment. A more charitable view is that this prickliness (understatement) is something that’s getting triaged out / deprioritized, not gymnastically dodged, but I think it’s unreasonable to ask people to pay attention to the difference.
That’s beside the point: the “it” was just the gdoc. “It would be a steep misfire” would mean “the gdoc tries to talk about the situation and totally fails to address what matters”. The subtraction of karma was metaphorical (I don’t think I’ve even officially voted on LessWrong!). I want to emphasize that I’m still holding this very weakly, cuz for instance I can expect people in that subthread to later tell me a detailed inside view about how giving Nate points for trying (by writing that doc) doesn’t literally mean they were drawn into this “if von Neumann has to scream at me to be productive, then it would be selfish to set a personal boundary” take. But I think it’s reasonable for me to be suspicious and cautious, and to look for more evidence that people would not fall for this class of “holding some people to different standards for for-the-greater-good reasons” again.
Ah yeah. I’m a bit of a believer in “introspection preys upon those smart enough to think they can do it well but not smart enough to know they’ll be bad at it”[1], at least to a partial degree. So it wouldn’t shock me if a long document failed to capture what matters.
[1] epistemic status: in that sweet spot myself
Ideally there would be an exceedingly high bar for strategic withholding of worldviews. I’d love some mechanism for sending downvotes to the orgs that vetoed their staff’s participation! I’d love some way of socially pressuring these orgs into at least trying to convince us that they had really good reasons.
I’m pretty cynical: I assume nervous and uncalibrated shuffling of HR or legal counsel is more likely than actual defense against hazardous leakage of, say, capabilities hints.