Kat, Emerson, and Drew’s reputation is not your concern insofar as you’re basically certain that your post is basically true. If you thought there was a decent chance that your post was basically wrong and Nonlinear would find proof in the next week, publishing now would be inappropriate.
When destroying someone’s reputation you have an extra obligation to make sure what you’re saying is true. I think you did that in this case—just clarifying norms.
Update: Greg Brockman quit.
Update: Sam and Greg say:
Sam and I are shocked and saddened by what the board did today.
Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.
We too are still trying to figure out exactly what happened. Here is what we know:
- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.
- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.
- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.
The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.
Update: three more resignations including Jakub Pachocki.
Sam Altman’s firing as OpenAI CEO was not the result of “malfeasance or anything related to our financial, business, safety, or security/privacy practices” but rather a “breakdown in communications between Sam Altman and the board,” per an internal memo from chief operating officer Brad Lightcap seen by Axios.
Update: Sam is planning to launch something (no details yet).
Update: Sam may return as OpenAI CEO.
Update: Tigris.
Update: talks with Sam and the board.
Update: Mira wants to hire Sam and Greg in some capacity; board still looking for a permanent CEO.
Update: Emmett Shear is interim CEO; Sam won’t return.
Update: lots more resignations (according to an insider).
Update: Sam and Greg leading a new lab in Microsoft.
Update: total chaos.
Ben has also been quietly fixing errors in the post, which I appreciate, but people are going around right now attacking us for things that Ben got wrong, because how would they know he quietly changed the post?
This is why every time newspapers get caught making a mistake they issue a public retraction the next day to let everyone know. I believe Ben should make these retractions more visible.
I used a diff checker to find the differences between the current post and the original post. There seem to be two:
“Alice worked there from November 2021 to June 2022” became “Alice travelled with Nonlinear from November 2021 to June 2022 and started working for the org from around February”
“using Lightcone funds” became “using personal funds”
Possibly I made a mistake, or Ben made edits and you saw them and then Ben reverted them—if so, I encourage you/anyone to point to another specific edit, possibly on other archive.org versions.
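(For anyone who wants to reproduce this kind of check, here’s a minimal sketch of what I mean by a diff checker, assuming you’ve saved the archive.org snapshot and the current post as local text files; the filenames are placeholders.)

```python
import difflib
import sys

# Load the two saved versions of the post (placeholder filenames).
with open("post_original.txt") as f:
    original = f.readlines()
with open("post_current.txt") as f:
    current = f.readlines()

# Print a unified diff; any changed passages show up as -/+ lines.
sys.stdout.writelines(
    difflib.unified_diff(original, current, fromfile="original", tofile="current")
)
```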
Update: Kat guesses she was thinking of changes from a near-final draft rather than changes from the first published version.
I largely agree. But I think not-stacking is only slightly bad because I think the “crappy toy model [where] every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path” is importantly wrong; I think many new visions have a decent chance of succeeding more quickly and if we pursue enough different visions we get a good chance of at least one paying off quickly.
Edit: even if alignment researchers could stack into just a couple paths, I think we might well still choose to go wide.
Please tell us what you think! Love it/hate it/think it should be different? Let us know.
I think it’s a fine experiment but… right now I’m closest to “hate it,” at least if it was used for all posts (I’d be much happier if it was only for question-posts, or only if the author requested it or a moderator thought it would be particularly useful, or something).
It makes voting take longer (with not much value added).
It makes reading comments take longer (with not much value added). You learn very little from these votes beyond what you learn from reading the comment.
It’s liable to make the more OCD among us go crazy. Worrying about how other people vote on your writing is bad enough. I, for one, would write worse comments in expectation if I was always thinking about making everyone else believe that my comments were true and well-aimed and clear and truth-seeking &c.
If this system was implemented in general, I would almost always prefer not to interact with it, so I would strongly request a setting to hide all non-karma voting from my view.
Edit in response to Rafael: for me at least the downside isn’t anxiety but mental effort to optimize for comment quality rather than votes and mental effort to ignore votes on my own comments. I’m not sure if the distinction matters; regardless, I’d be satisfied with the ability to hide non-karma votes.
Harry let himself be pulled, but as Hermione dragged him away, he said, raising his voice even louder, “It is entirely possible that in a thousand years, the fact that FHI was at Oxford will be the only reason anyone remembers Oxford!”
Has anyone collected their public statements on various AI x-risk topics anywhere?
A bit, not shareable.
Helen is an AI safety person. Tasha is on the Effective Ventures board. Ilya leads superalignment. Adam signed the CAIS statement.
I ended up doing some quick google searches for AI opinion polls
I collected such polls here, if you want to see more. Most people say they want to regulate AI.
Or! This idea sounds superficially reasonable and even (per the appendix) gets praise from a few people, but is actually useless or harmful. Currently working out a hypothesis for how that could be the case...
This was the press release; the actual order has now been published.
One safety-relevant part:
4.2. Ensuring Safe and Reliable AI. (a) Within 90 days of the date of this order, to ensure and verify the continuous availability of safe, reliable, and effective AI in accordance with the Defense Production Act, as amended, 50 U.S.C. 4501 et seq., including for the national defense and the protection of critical infrastructure, the Secretary of Commerce shall require:
(i) Companies developing or demonstrating an intent to develop potential dual-use foundation models to provide the Federal Government, on an ongoing basis, with information, reports, or records regarding the following:
(A) any ongoing or planned activities related to training, developing, or producing dual-use foundation models, including the physical and cybersecurity protections taken to assure the integrity of that training process against sophisticated threats;
(B) the ownership and possession of the model weights of any dual-use foundation models, and the physical and cybersecurity measures taken to protect those model weights; and
(C) the results of any developed dual-use foundation model’s performance in relevant AI red-team testing based on guidance developed by NIST pursuant to subsection 4.1(a)(ii) of this section, and a description of any associated measures the company has taken to meet safety objectives, such as mitigations to improve performance on these red-team tests and strengthen overall model security. Prior to the development of guidance on red-team testing standards by NIST pursuant to subsection 4.1(a)(ii) of this section, this description shall include the results of any red-team testing that the company has conducted relating to lowering the barrier to entry for the development, acquisition, and use of biological weapons by non-state actors; the discovery of software vulnerabilities and development of associated exploits; the use of software or tools to influence real or virtual events; the possibility for self-replication or propagation; and associated measures to meet safety objectives; and
(ii) Companies, individuals, or other organizations or entities that acquire, develop, or possess a potential large-scale computing cluster to report any such acquisition, development, or possession, including the existence and location of these clusters and the amount of total computing power available in each cluster.
(b) The Secretary of Commerce, in consultation with the Secretary of State, the Secretary of Defense, the Secretary of Energy, and the Director of National Intelligence, shall define, and thereafter update as needed on a regular basis, the set of technical conditions for models and computing clusters that would be subject to the reporting requirements of subsection 4.2(a) of this section. Until such technical conditions are defined, the Secretary shall require compliance with these reporting requirements for:
(i) any model that was trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10^23 integer or floating-point operations; and
(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 10^20 integer or floating-point operations per second for training AI.
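To make those interim thresholds concrete, here’s a rough back-of-the-envelope check. This is my own illustration, not from the order, and all of the hardware numbers are made up.

```python
# Interim reporting thresholds from Sec. 4.2(b) of the order.
MODEL_FLOP_THRESHOLD = 1e26        # any model
BIO_MODEL_FLOP_THRESHOLD = 1e23    # models trained primarily on biological sequence data
CLUSTER_FLOPS_THRESHOLD = 1e20     # theoretical max FLOP/s of a co-located cluster

def training_flop(chips: int, peak_flops_per_chip: float, utilization: float, seconds: float) -> float:
    """Total training compute ~= chips * peak FLOP/s per chip * utilization * wall-clock seconds."""
    return chips * peak_flops_per_chip * utilization * seconds

# Hypothetical run: 50,000 accelerators at 1e15 peak FLOP/s, 40% utilization, ~90 days.
flop = training_flop(50_000, 1e15, 0.4, 90 * 24 * 3600)
print(f"{flop:.2e} training FLOP; above model threshold: {flop > MODEL_FLOP_THRESHOLD}")
# ~1.56e+26 training FLOP; above model threshold: True

# Hypothetical cluster: 128,000 of the same accelerators in one datacenter.
cluster_flops = 128_000 * 1e15
print(f"{cluster_flops:.2e} FLOP/s; above cluster threshold: {cluster_flops > CLUSTER_FLOPS_THRESHOLD}")
# 1.28e+20 FLOP/s; above cluster threshold: True
```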
But other labs are even less safe, and not far behind.
Yes, alignment is largely an unsolved problem, and progress on it is mostly an exogenous function of time. But to a large extent we’re safer with safety-interested labs developing powerful AI: this will boost model-independent alignment research, make particular critical models more likely to be aligned/controlled, help generate legible evidence that alignment is hard (insofar as such evidence exists), and maybe enable progress to pause at a critical moment.
Covid-19 will be one more disease among many, and life will be marginally worse, but by about April you shouldn’t act substantially differently than if it no longer existed.
This seems quite bold given our history of variants emerging. And if Omicron infects billions, then prima facie there’s great opportunity for mutation. I’d be interested to hear your credence in the following proposition:
From 1 May 2022 to 1 Jan 2030, Zvi won’t act substantially differently due to risk of SARS-CoV-2 infection.
Additionally, “one more disease among many” suggests (to me) that it won’t cause 100K+ more deaths in the following few years, which also seems bold. [edit: American deaths, see replies for more]
Some nitpicks:
You write like Stockfish 14 is a probabilistic function from game-state to next-move, the thing-which-has-an-Elo. But I think Stockfish 14 running on X hardware for Y time is the real probabilistic function from game-state to next-move (see e.g. the inclusion of hardware in Elo rankings here). And you probably played with hardware and time such that its Elo is substantially below 3549 (a quick expected-score sketch follows these nitpicks).
I think a human with Stockfish’s Elo would be much better at beating you at queen odds, since (not certain about these):
Stockfish is optimized for standard chess and human grandmasters are probably better at transferring to odds-chess.
Stockfish roughly tries to maximize P(win) against optimal play or Stockfish-level play, or maximize number of moves before losing once it knows you have a winning strategy. Human grandmasters would adapt to be better against your skill level (e.g. by trying to make positions more complex), and would sometimes correctly make choices that would be bad against Stockfish or optimal play but good against weaker players.
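The expected-score sketch mentioned above: under the standard Elo formula, expected score depends only on the rating gap, so the question is what effective rating the engine actually had on your hardware and time settings. The ratings below are made-up examples.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score (win = 1, draw = 0.5) of player A vs. player B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(expected_score(3549, 1500))  # ~0.99999: full-strength Stockfish vs. a club player
print(expected_score(2800, 1500))  # ~0.9994: even a much weaker config still dominates
```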
fwiw my guess is that OP didn’t ask its grantees to do open-source LLM biorisk work at all; I think its research grantees generally have lots of freedom.
(I’ve worked for an OP-funded research org for 1.5 years. I don’t think I’ve ever heard of OP asking us to work on anything specific, nor of us working on something because we thought OP would like it. Sometimes we receive restricted, project-specific grants, but I think those projects were initiated by us. Oh, one exception: Holden’s standards-case-studies project.)
What would a good RSP look like?
Clear commitments along the lines of “we promise to run these 5 specific tests to evaluate these 10 specific dangerous capabilities.”
Clear commitments regarding what happens if the evals go off (e.g., “if a model scores above a 20 on the Hubinger Deception Screener, we will stop scaling until it has scored below a 10 on the relatively conservative Smith Deception Test.”)
Clear commitments regarding the safeguards that will be used once evals go off (e.g., “if a model scores above a 20 on the Cotra Situational Awareness Screener, we will use XYZ methods and we believe they will be successful for ABC reasons.”)
Clear evidence that these evals will exist, will likely work, and will be conservative enough to prevent catastrophe
Some way of handling race dynamics (such that Bad Guy can’t just be like “haha, cute that you guys are doing RSPs. We’re either not going to engage with your silly RSPs at all, or we’re gonna publish our own RSP but it’s gonna be super watered down and vague”).
Yeah, of course this would be nice. But the reason that ARC and Anthropic didn’t write this ‘good RSP’ isn’t that they’re reckless; it’s that writing such an RSP is a hard open problem. It would be great to have “specific tests” for various dangerous capabilities, or “some way of handling race dynamics,” but nobody knows what those are.
Of course the specific object-level commitments Anthropic has made so far are insufficient. (Fortunately, they committed to make more specific object-level commitments before reaching ASL-3, and ASL-3 is reasonably well-specified [edit: and almost certainly below x-catastrophe-level].) I praise Anthropic’s RSP and disagree with your vibe because I don’t think you or I or anyone else could write much better commitments. (If you have specific commitments-labs-should-make in mind, please share them!)
(Insofar as you’re just worried about comms and what-people-think-about-RSPs rather than how-good-RSPs-are, I’m agnostic.)
I built an (unpublished) TAI timelines model
I’d be excited to see this if it’s substantially different from existing published models. (Edit: yay, it’s https://www.lesswrong.com/posts/4ufbirCCLsFiscWuY/a-proposed-method-for-forecasting-ai)
I account for potential coordinated delays, catastrophes, and a 15% chance that we’re fundamentally wrong about all of this stuff.
+1 to noting this explicitly; everyone should distinguish between their conditional on no major disruptions and their unconditional models.
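A minimal illustration of the distinction, using the quoted 15%; the conditional and “if wrong” numbers below are placeholders, not anyone’s actual forecast.

```python
p_framework_right = 0.85   # from the quoted "15% chance that we're fundamentally wrong"
p_tai_if_right = 0.60      # hypothetical forecast, conditional on the framework being right
p_tai_if_wrong = 0.10      # hypothetical; if the framework is wrong, much lower / unknown

# The unconditional forecast mixes the two cases:
p_tai = p_framework_right * p_tai_if_right + (1 - p_framework_right) * p_tai_if_wrong
print(p_tai)  # 0.525 -- noticeably different from the conditional 0.60
```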
In addition to the bill, CAIP has a short summary and a long summary.
Unfortunately, I can’t talk about the game itself, as that’s forbidden by the rules.
You two can just change the rules… I’m confused by this rule.
Based on the past week’s worth of papers, it seems quite likely that we are now in a fast takeoff, and that we have 2-5 years until Moore’s law and organizational prioritization put these systems at AGI.
What makes you say this? What should I read to appreciate how big a deal for AGI the recent papers are?
We already have a Schelling point for “infohazard”: Bostrom’s paper. Redefining “infohazard” now is needlessly confusing. (And most of the time I hear “infohazard” it’s in the collectively-destructive smallpox-y sense, and as Buck notes this is more important and common.)