JakubK

Karma: 402

JakubK 9 Aug 2023 20:02 UTC
7 points
0
on: Inflection.ai is a major AGI lab
Relevant tweet/quote from Mustafa Suleyman, the co-founder and CEO:
Powerful AI systems are inevitable. Strict licensing and regulation is also inevitable. The key thing from here is getting the safest and most widely beneficial versions of both.

JakubK 27 Jun 2023 5:43 UTC
2 points
0
in reply to: Michael Tontchev’s comment on: Best introductory overviews of AGI safety?
Thanks for writing and sharing this. I’ve added it to the doc.

JakubK 6 Jun 2023 22:12 UTC
1 point
0
on: Open Problems in AI X-Risk [PAIS #5]
What happened to black swan and tail risk robustness (section 2.1 in “Unsolved Problems in ML Safety”)?

JakubK 12 May 2023 6:32 UTC
4 points
0
in reply to: Muyyd’s comment on: All AGI Safety questions welcome (especially basic ones) [May 2023]
It’s hard to say. This CLR article lists some advantages that artificial systems have over humans. Also see this section of 80k’s interview with Richard Ngo:
Rob Wiblin: One other thing I’ve heard, that I’m not sure what the implication is: signals in the human brain — just because of limitations and the engineering of neurons and synapses and so on — tend to move pretty slowly through space, much less than the speed of electrons moving down a wire. So in a sense, our signal propagation is quite gradual and our reaction times are really slow compared to what computers can manage. Is that right?
Richard Ngo: That’s right. But I think this effect is probably a little overrated as a factor for overall intelligence differences between AIs and humans, just because it does take quite a long time to run a very large neural network. So if our neural networks just keep getting bigger at a significant pace, then it may be the case that for quite a while, most cutting-edge neural networks are actually going to take a pretty long time to go from the inputs to the outputs, just because you’re going to have to pass it through so many different neurons.
Rob Wiblin: Stages, so to speak.
Richard Ngo: Yeah, exactly. So I do expect that in the longer term there’s going to be a significant advantage for neural networks in terms of thinking time compared with the human brain. But it’s not actually clear how big that advantage is now or in the foreseeable future, just because it’s really hard to run a neural network with hundreds of billions of parameters on the types of chips that we have now or are going to have in the coming years.

JakubK 12 May 2023 6:21 UTC
3 points
0
in reply to: Quadratic Reciprocity’s comment on: All AGI Safety questions welcome (especially basic ones) [May 2023]
The cyborgism post might be relevant:
Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment.
1. Introduction: An explanation of the context and motivation for this agenda.
2. Automated Research Assistants: A discussion of why the paradigm of training AI systems to behave as autonomous agents is both counterproductive and dangerous.
3. Becoming a Cyborg: A proposal for an alternative approach/frame, which focuses on a particular type of human-in-the-loop system I am calling a “cyborg”.
4. Failure Modes: An analysis of how this agenda could either fail to help or actively cause harm by accelerating AI research more broadly.
5. Testimony of a Cyborg: A personal account of how Janus uses GPT as a part of their workflow, and how it relates to the cyborgism approach to intelligence augmentation.

JakubK 12 May 2023 6:14 UTC
1 point
0
in reply to: Daniel Paleka’s comment on: How MATS addresses “mass movement building” concerns
Does current AI hype cause many people to work on AGI capabilities? Different areas of AI research differ significantly in their contributions to AGI.

Averting Catastrophe: Decision Theory for COVID-19, Climate Change, and Potential Disasters of All Kinds

JakubK 2 May 2023 22:50 UTC

10 points

0 comments 1 min read LW link

(nyupress.org)

JakubK 30 Apr 2023 9:37 UTC
7 points
0
on: List of lists of government AI policy ideas
- An AI Policy Tool for Today: Ambitiously Invest in NIST (Anthropic 2023)
- National Security Addition to the NIST AI RMF (Special Competitive Studies Project 2023)
- Existential risk and rapid technological change—a thematic study for UNDRR (Stauffer et al. 2023), especially section 4.3 (“30 actions to reduce existential risk”)
- Crafting Legislation to Prevent AI-Based Extinction: Submission of Evidence to the Science and Technology Select Committee’s Inquiry on the Governance of AI (Cohen and Osborne 2023)
- Why we need a new agency to regulate advanced artificial intelligence: Lessons on AI control from the Facebook Files (Korinek 2021)

JakubK 28 Apr 2023 4:13 UTC
4 points
3
on: A decade of lurking, a month of posting
I’ve grown increasingly alarmed and disappointed by the number of highly-upvoted and well-received posts on AI, alignment, and the nature of intelligent systems, which seem fundamentally confused about certain things.
Can you elaborate on how all these linked pieces are “fundamentally confused”? I’d like to see a detailed list of your objections. It’s probably best to make a separate post for each one.

JakubK 26 Apr 2023 22:39 UTC
1 point
0
in reply to: Bezzi’s comment on: GPT-4 solves Gary Marcus-induced flubs
That was arguably the hardest task, because it involved multi-step reasoning. Notably, I didn’t even notice that GPT-4’s response was wrong.

JakubK 26 Apr 2023 22:36 UTC
1 point
0
in reply to: JoeTheUser’s comment on: GPT-4 solves Gary Marcus-induced flubs
I believe that Marcus’ point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, nonsequiturs). The argument is that problems in these class will continue to hard.
Yeah this is the part that seems increasingly implausible to me. If there is a “class of problems that tend to be hard … [and] will continue to be hard,” then someone should be able to build a benchmark that models consistently struggle with over time.

JakubK 26 Apr 2023 22:26 UTC
3 points
0
in reply to: habryka’s comment on: Should we publish mechanistic interpretability research?
Oh I see; I read too quickly. I interpreted your statement as “Anthropic clearly couldn’t care less about shortening timelines,” and I wanted to show that the interpretability team seems to care.
Especially since this post is about capabilities externalities from interpretability research, and your statement introduces Anthropic as “Anthropic, which is currently the biggest publisher of interp-research.” Some readers might conclude corollaries like “Anthropic’s interpretability team doesn’t care about advancing capabilities.”

JakubK 23 Apr 2023 19:30 UTC
3 points
0
on: List of lists of government AI policy ideas
Ezra Klein listed some ideas (I’ve added some bold):
The first is the question — and it is a question — of interpretability. As I said above, it’s not clear that interpretability is achievable. But without it, we will be turning more and more of our society over to algorithms we do not understand. If you told me you were building a next generation nuclear power plant, but there was no way to get accurate readings on whether the reactor core was going to blow up, I’d say you shouldn’t build it. Is A.I. like that power plant? I’m not sure. But that’s a question society should consider, not a question that should be decided by a few hundred technologists. At the very least, I think it’s worth insisting that A.I. companies spend a good bit more time and money discovering whether this problem is solvable.
The second is security. For all the talk of an A.I. race with China, the easiest way for China — or any country for that matter, or even any hacker collective — to catch up on A.I. is to simply steal the work being done here. Any firm building A.I. systems above a certain scale should be operating with hardened cybersecurity. It’s ridiculous to block the export of advanced semiconductors to China but to simply hope that every 26-year-old engineer at OpenAI is following appropriate security measures.
The third is evaluations and audits. This is how models will be evaluated for everything from bias to the ability to scam people to the tendency to replicate themselves across the internet.
Right now, the testing done to make sure large models are safe is voluntary, opaque and inconsistent. No best practices have been accepted across the industry, and not nearly enough work has been done to build testing regimes in which the public can have confidence. That needs to change — and fast. Airplanes rarely crash because the Federal Aviation Administration is excellent at its job. The Food and Drug Administration is arguably too rigorous in its assessments of new drugs and devices, but it is very good at keeping unsafe products off the market. The government needs to do more here than just write up some standards. It needs to make investments and build institutions to conduct the monitoring.
The fourth is liability. There’s going to be a temptation to treat A.I. systems the way we treat social media platforms and exempt the companies that build them from the harms caused by those who use them. I believe that would be a mistake. The way to make A.I. systems safe is to give the companies that design the models a good reason to make them safe. Making them bear at least some liability for what their models do would encourage a lot more caution.
The fifth is, for lack of a better term, humanness. Do we want a world filled with A.I. systems that are designed to seem human in their interactions with human beings? Because make no mistake: That is a design decision, not an emergent property of machine-learning code. A.I. systems can be tuned to return dull and caveat-filled answers, or they can be built to show off sparkling personalities and become enmeshed in the emotional lives of human beings.

JakubK 23 Apr 2023 3:58 UTC
2 points
−4
in reply to: habryka’s comment on: Should we publish mechanistic interpretability research?
Anthropic, which is currently the biggest publisher of interp-research, clearly does not have a commitment to not work towards advancing capabilities
This statement seems false based on this comment from Chris Olah.

JakubK 23 Apr 2023 3:51 UTC
2 points
0
on: Should we publish mechanistic interpretability research?
Thus, we decided to ask multiple people in the alignment scene about their stance on this question.
Richard
Any reason you’re not including people’s last names? To a newcomer this would be confusing. “Who is Richard?”

JakubK 22 Apr 2023 22:59 UTC
2 points
0
on: Counterarguments to Core AI X-Risk Stories?
Here’s a list of arguments for AI safety being less important, although some of them are not object-level.

JakubK 22 Apr 2023 22:54 UTC
2 points
3
in reply to: DavidW’s comment on: Deceptive Alignment is <1% Likely by Default
To argue for that level of confidence, I think the post needs to explain why AI labs will actually utilize the necessary techniques for preventing deceptive alignment.

JakubK 22 Apr 2023 22:53 UTC
3 points
0
on: Order Matters for Deceptive Alignment
The model knows it’s being trained to do something out of line with its goals during training and plays along temporarily so it can defect later. That implies that differential adversarial examples exist in training.
I don’t think this implication is deductively valid; I don’t think the premise entails the conclusion. Can you elaborate?
I think this post’s argument relies on that conclusion, along with an additional assumption that seems questionable: that it’s fairly easy to build an adversarial training setup that distinguishes the design objective from all other undesirable objectives that the model might develop during training; in other words, that the relevant differential adversarial examples are fairly easy for humans to engineer.

Notes on “the hot mess theory of AI misalignment”

JakubK 21 Apr 2023 10:07 UTC

16 points

0 comments 5 min read LW link

(sohl-dickstein.github.io)

JakubK 19 Apr 2023 22:27 UTC
5 points
0
on: Some Intuitions Around Short AI Timelines Based on Recent Progress
Some comments:
A large amount of the public thinks AGI is near.
This links to a poll of Lex Fridman’s Twitter followers, which doesn’t seem like a representative sample of the US population.
they jointly support a greater than 10% likelihood that we will develop broadly human-level AI systems within the next decade.
Is this what you’re arguing for when you say “short AI timelines”? I think that’s a fairly common view among people who think about AI timelines.
AI is starting to be used to accelerate AI research.
My sense is that Copilot is by far the most important example here.
I imagine visiting alien civilizations much like earth, and I try to reason from just one piece of evidence at a time about how long that planet has.
I find this part really confusing. Is “much like earth” supposed to mean “basically the same as earth”? In that case, why not just present each piece of evidence normally, without setting up an “alien civilization” hypothetical? For example, the “sparks of AGI” paper provides very little evidence for short timelines on its own, because all we know is the capabilities of a particular system, not how long it took to get to that point and whether that progress might continue.
The first two graphs show the overall number of college degrees and the number of STEM degrees conferred from 2011 to 2021
Per year, or cumulative? Seems like it’s per year.
If you think one should put less than 20% of their timeline thinking weight on recent progress
Can you clarify what you mean by this?
Overall, I think this post provides evidence that short AI timelines are possible, but doesn’t provide strong evidence that short AI timelines are probable. Here are some posts that provide more arguments for the latter point:
