Theoretical AI alignment (and relevant upskilling) in my free time. My current view of the field is here (part 1) and here (part 2).
NicholasKross
This is pretty good, although you ought to actually link more of the citations for specific facts. Court documents, testimonies, company docs, even Wikipedia would’ve been helpful for the part about IAG.
(Semi-dumb LW category suggestion: Posts That Could Have Made You Good Money In Hindsight)
OpenAI should have an internal-only directory allowing employees and leadership to write up and see each other’s beliefs about AI extinction risk and alignment approaches.
FWIW: I volunteered for Nonlinear in summer 2021, and the people behind it are pretty on-top-of-things!
I’ve noticed a thing happening (more? lately? just in my reading sample?) similar to what you describe, where the emphasis goes more onto the social/community side of rationality as opposed to… the rest of rationality.
The Moral Mazes examples are related to that. Also topics like reputation, and virtues ‘n’ norms, and what other people think of you.
At some point, a person’s energy and resources are finite. They can try to win at anything, but maybe the lesson from recent writings is “winning at social anything is hard enough (for a LW-frequenting personality) to be a notable problem”.
Some thoughts on this issue:
- Codify, codify, codify. Most people in the LW community are lacking in some social skills (relative to both non-members and the professional-politician standard). Those who have those skills: please make long detailed checklists and email-extensions of what works. That way, the less-socially-skilled among us can avoid losing-at-social without turning into Mad-Eye Moody and losing our energy.
- Is there a trend where communities beat around the bush more over time? Many posts do what I’ve heard called “subtweeting”: “Imagine a person X, having to do thing Y, and problem Z happens...”. Yes, social game theory exists and reputation exists, but at least consider just telling people the details.
Common/game-theory/vague/bad: “Let’s say somebody goes to $ORG, but they do something bad. We should consider $ORG and everyone there to be infected with The Stinky.”
Better/precise/detailed/good: “Hey, Nicholas Kross went to MIRI and schemed to build a robot that outputs anti-utils. How do we prevent this in the future, and can we make a preventative checklist?” [1]
If you are totally financially/legally dependent on an abusive organization or person, obviously writing a call-out post with details is game-theoretically bad for you. In that case, don’t leave in those details. For everyone else: either write a postmortem or say “I’m under NDA, but...”.
We get it, we need Slack, and society doesn’t give enough of it for our purposes. Can somebody with higher dopamine coordinate or promote any method so we can set up living arrangements that escape mainstream social pressures?
(If your AGI-will-give-us-Slack timeline is shorter than a community-Slack-project, how much should you really worry about long-term politics-style social/reputational-game-theoretic threats to the community’s Slack?)
Interested in more thoughts on this.
[1] This is a fictional example. Plus, it’s not even slyly alluding to any situations! (Well, as far as I know.)
This makes me wonder if operations really is the big bottleneck with EA/AIS/MIRI in particular. Anyone good at operations would not have let this situation happen. Either it would’ve ended quicker (“Sorry, we’re not hiring right now, try again later.”) or ended differently (“You’re hired.”).
Good catch on the natural-vs-man-made accidental bait-and-switch in the common argument. This post changed my mind to think that, at least for scaling-heavy AI (and, uh, any disaster that leaves the government standing), regulation could totally help the overall situation.
Much cheaper, though still hokey, ideas that you should have already thought of at some point:
- A “formalization office” that checks and formalizes results by alignment researchers. It should not take months for a John Wentworth result to get formalized by someone else.
- Alignment-specific outreach at campuses/conventions with top cybersecurity people.
1. Person tries to work on AI alignment.
2. Person fails due to various factors.
3. Person gives up working on AI alignment. (This is probably a good move, when it’s not your fit, as is your case.)
4. Danger zone: In ways that sort-of-rationalize-around their existing decision to give up working on AI alignment, the person starts renovating their belief system around what feels helpful to their mental health. (I don’t know if people are usually doing this after having already tried standard medical-type treatments, or instead of trying those treatments.)
5. Danger zone: Person announces this shift to others, in a way that’s maybe and/or implicitly prescriptive (example).
There are, depressingly, many such cases of this pattern. (Related post with more details on this pattern.)
This seems… testable? Like, it’s kind of the opposite message of Yudkowsky’s “try harder” posts.
Have two groups work on a research problem. One is in doom mode, one is in sober mode. See which group makes more progress.
Almost upvoted for kiiiiinda describing the actual mental model. Not upvoted because:
- Still tried to be cool and edgy with the writing style, when we’ve already established that this is a dumb idea for this topic. No, I’m not moving to the woods with you to get the “real version”.
- No illustrations or diagrams, when talking about something extremely meta-level and slippery-language-filled.
WaitButWhy does this genre better (describing psychology helpfully using very fake analogies).
Don’t forget Orthogonal’s mathematical alignment research, including QACI!
Possibly relevant: Weak Men are Superweapons.
This definitely describes my experience, and gave me a bit of help in correcting course, so thank you.
Also, I recall an Aella tweet where she claimed that some mental/emotional problems might be normal reactions to having low status and/or not doing much interesting in life. Partly since, in her own experience, those problems were mostly(?) alleviated when she started “doing more awesome stuff”.
Even after thinking through these issues in SERI MATS, and already agreeing with at least most of this post, I was surprised, upon reading it, by how many new-or-newish-to-me ideas and links it contained.
I’m not sure if that’s more of a failure on my part, or a failure of the alignment field to notice “things that are common between a diverse array of problems faced”. Kind of related to my hunch that multiple alignment concepts (“goals”, “boundaries”, “optimization”) will turn out to be isomorphic to the same tiny handful of mathematical objects.
Now I wonder if there should be some (public? private?) thing kinda like MyAnimeList, but for papers. So you can track which ones you’ve “dropped” vs kept-reading, summarize (“review”) the takeaways/quality, etc.
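To make the idea concrete, here’s a minimal sketch of what the underlying per-paper record might look like, as a hypothetical Python snippet; the names (`PaperEntry`, `Status`) and the placeholder titles/URLs are made up for illustration, not an existing tool:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PLAN_TO_READ = "plan-to-read"
    READING = "reading"
    FINISHED = "finished"
    DROPPED = "dropped"


@dataclass
class PaperEntry:
    title: str
    url: str
    status: Status = Status.PLAN_TO_READ
    score: Optional[int] = None  # personal 1-10 rating, like an anime-list score
    review: str = ""             # short takeaway / quality note


# A personal "list" is then just a collection of entries you can filter and share.
my_list = [
    PaperEntry("Some agenda paper", "https://example.org/a",
               status=Status.FINISHED, score=8, review="Clear framing; worth rereading."),
    PaperEntry("Some dense paper", "https://example.org/b",
               status=Status.DROPPED, review="Dropped after section 2; notation too heavy."),
]

dropped = [p.title for p in my_list if p.status is Status.DROPPED]
print(dropped)  # -> ['Some dense paper']
```

Most of the value of a real version would be the social layer on top (public profiles, shared reviews, aggregate scores); the per-paper record itself is roughly this small.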
Eagerly awaiting the Massive LessWrong Post Explaining And Comparing Capitalism And Socialism With Lots Of Steelmanned Arguments, General Principles, And Consequentialism. It’d take a lot of work to write well, though...
There is something weirdly powerful about these being in well-printed form, more approachable in both form and content. This is something easier to recommend to friends without having to go “By the way, this post references a thing that is no longer relevant / is poorly researched”, like the Robbers Cave experiment mentioned.
Extremely strong upvote for Oliver’s 2nd message.
Also, not as related: kudos for actually materially changing the course of your organization, something which is hard for most organizations, period.