jacquesthibs 6 Jun 2023 20:14 UTC
46 points
on: Algorithmic Improvement Is Probably Faster Than Scaling Now
I spoke to Altman about a month ago. He said, in essence, the following:
- His recent statement about scaling plateauing was misunderstood, and he still thinks scaling plays a big role.
- Then I asked him what comes next, and he said they are working on the next thing that will provide a 1000x improvement (some new paradigm).
- I asked if online learning plays a role in that and he said yes.
- That’s one of the reasons we started to work on Supervising AIs Improving AIs.
In a shortform last month, I wrote the following:
There has been some insider discussion (and Sam Altman has said) that scaling has started running into some difficulties. Specifically, GPT-4 has gained a wider breadth of knowledge but has not significantly improved in any one domain. This might mean that, because of diminishing returns from scaling, future AI systems will gain their capabilities from places other than scaling. To become “superintelligent”, an AI may need to run experiments and learn from their outcomes to gain further capabilities. You can only learn so much from a static dataset.
So you can imagine a case where capabilities come from some form of active/continual/online learning, but that only became possible once models were scaled up enough to learn in that way. And so, as LLMs become more capable, they will essentially become capable of running their own experiments to gain AlphaFold-like capabilities across many domains.
Of course, this has implications for understanding takeoffs / sharp left turns.
As Max H said, I think once you meet a threshold with a universal interface like a language model, things start to open up and the game changes.

jacquesthibs 11 Apr 2023 19:33 UTC
LW: 39 AF: 16
on: Evolution provides no evidence for the sharp left turn
Here’s my takeaway:

There are mechanistic reasons for humanity’s “Sharp Left Turn” with respect to evolution. Humans were bottlenecked by knowledge transfer between generations, and the cultural revolution allowed us to share our lifetime learnings with the next generation instead of waiting on the slow process of natural selection.

Current AI development is not bottlenecked in the same way and is therefore highly unlikely to undergo a sharp left turn for the same reason. Ultimately, evolution analogies can smuggle in bad unconscious assumptions without any rigorous mechanistic understanding behind them. Instead of using evolution to argue for a Sharp Left Turn, we should look for arguments that are mechanistically specific to current AI development, because we are much less likely to make confused mistakes that unconsciously rely on assumptions from human evolution.

AI may still undergo a fast takeoff (through AI driving capabilities research or iteratively refining its training data), but for AI-specific reasons, so we should pay attention to how that kind of fast takeoff might happen and how to deal with it.

Edited after Quintin’s response.

AI Alignment YouTube Playlists

jacquesthibs and remember

9 May 2022 21:33 UTC

30 points

4 comments, 1 min read

Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities

jacquesthibs 5 Dec 2022 16:09 UTC

28 points

6 comments, 8 min read

jacquesthibs 24 Sep 2023 3:41 UTC
26 points
in reply to: jacquesthibs’s comment on: jacquesthibs’s Shortform
There’s someone on X (formerly Twitter) called Jimmy Apples (🍎/acc) who has shared information in the past that turned out to be true (apparently the GPT-4 release date and that OpenAI’s new model would be named “Gobi”). He recently tweeted, “AGI has been achieved internally.” Some people think that the Reddit comment below may be from the same person (this is just a weak signal; I’m not implying you should consider it true or update on it):

jacquesthibs 9 Sep 2023 19:41 UTC
26 points
on: AI presidents discuss AI alignment agendas
This was hilarious, thanks for making it!

[Question] How is ARC planning to use ELK?

jacquesthibs 15 Dec 2022 20:11 UTC

24 points

5 comments, 1 min read

jacquesthibs 24 Jan 2024 11:48 UTC
23 points
on: jacquesthibs’s Shortform
I thought this series of comments from a former DeepMind employee (who worked on Gemini) was insightful, so I figured I should share it.
From my experience doing early RLHF work for Gemini, larger models exploit the reward model more. You need to constantly keep collecting more preferences and retraining reward models to make them not exploitable. Otherwise you get nonsensical responses which have exploited the idiosyncrasies of your preference data. There is a reason few labs have done RLHF successfully.
It’s also known that more capable models exploit loopholes in reward functions better. Imo, it’s a pretty intuitive idea that more capable RL agents will find larger rewards. But there’s evidence from papers like this as well: https://arxiv.org/abs/2201.03544
To be clear, I don’t think the current paradigm as-is is dangerous. I’m stating the obvious because this platform has gone a bit bonkers.
The danger comes from finetuning LLMs to become AutoGPTs which have memory, actions, and maximize rewards, and are deployed autonomously. Widespread proliferation of GPT-4+ models will almost certainly make lots of these agents, which will cause a lot of damage and potentially cause something indistinguishable from extinction.
These agents will be very hard to align. Trading off their reward objective with your “be nice” objective won’t work. They will simply find the loopholes of your “be nice” objective and get that nice fat hard reward instead.
We’re currently on the extreme left side of the AutoGPT exponential scaling curve (it basically doesn’t work now), so it’s hard to study whether more capable models are harder or easier to align.
Other comments from that thread:
My guess is where your intuitive alignment strategy (“be nice”) breaks down for AI is that unlike humans, AI is highly mutable. It’s very hard to change a human’s sociopathy factor. But for AI, even if *you* did find a nice set of hyperparameters that trades off friendliness and goal-seeking behavior well, it’s very easy to take that, and tune up the knobs to make something dangerous. Misusing the tech is as easy or easier than not. This is why many put this in the same bucket as nuclear.
US visits Afghanistan, teaches them how to make power using Nuclear tech, next month, they have nukes pointing at Iran.
And:
In contexts where harms will be visible easily and in short timelines, we’ll take them offline and retrain.
Many applications will be much more autonomous, difficult to monitor or even understand, and potentially fully closed-loop, i.e., the agent has a complex enough action space that it can copy itself, buy compute, run itself, etc.
I know it sounds sci-fi. But we’re living in sci-fi times. These things have a knack of becoming true sooner than we think.
No ghosts in the matrices assumed here. Just intelligence starting from a very good base model optimizing reward.

There are more comments he made in that thread that I found insightful, so go have a look if interested.

jacquesthibs 5 Mar 2023 2:55 UTC
23 points
on: Why Not Just… Build Weak AI Tools For AI Alignment Research?
Thanks for writing this post, John! I’ll comment since this is one of the directions I am exploring (released an alignment text dataset, published a survey for feedback on tools for alignment research, and have been ruminating on these ideas for a while).
Thus, my current main advice for people hoping to build AI tools for boosting alignment research: go work on the object-level research you’re trying to boost for a while. Once you have a decent amount of domain expertise, once you have made any progress at all (and therefore have any first-hand idea of what kinds of things even produce progress), then you can maybe shift to the meta-level.
I mostly agree with this, which is why I personally took 6 months away from this approach and tried to develop my domain expertise during my time at MATS. Unfortunately, I don’t think that is enough time (so I might spend more time on object-level work after my current 3-month grant). However, I plan to continue doing object-level research and building tools that are informed by my own bottlenecks and those of others. There are already many things I think I could build that would accelerate my work and possibly the work of others.
I see my current approach as creating a feedback loop where the two things that take up my time inform each other (so I at least have N>0 users). I expect to build the things that seem most useful for now, re-evaluate based on feedback (is this accelerating alignment greatly or not at all?), and then decide whether I should focus all my time on object-level research again. That said, I expect that at this point I could direct some software engineers to build out the things I have in mind at the same time.
One thing I’ve found valuable when thinking about these tools is to backcast: imagine how I or others might arrive at a solution to alignment, then focus on tools that would accelerate the research that is actually crucial for solving the problem, rather than optimizing for something else. I think dedicating time to object-level work has been helpful for this.
At a meta level, cognitive tool-building is very much the sort of work where you should pick one or a handful of people to build the prototype for, focus on making those specific people much more productive, and get a fast feedback loop going that way. That’s how wrong initial guesses turn into better later guesses.
Agreed.
If the tracked-information is represented somewhere outside my head, then (a) it frees up a lot of working memory and lets me track more things, and (b) it makes it much easier to communicate what I’m thinking to others.
Yes! That is precisely what I have in mind when thinking about building tools. What can I build that sufficiently frees up working memory / cognitive load so that the researcher can use that extra space for thinking more deeply about other things?
A side problem which I do not think is the main problem for “AI tools for AI alignment” approaches: there is a limit to how much of a research productivity multiplier we can get from google-search-style tools. Google search is helpful, but it’s not a 100x on research productivity (as evidenced by the lack of a 100x jump in research productivity shortly after Google came along). Fundamentally, a key part of what makes such tools “tools” is that most of the key load-bearing cognition still “routes through” a human user; thus the limit on how much of a productivity boost they could yield. But I do find a 2x boost plausible, or maybe 5-10x on the optimistic end. The more-optimistic possibilities in that space would be a pretty big deal.
When thinking about this general approach, I aim for at least a 10x speedup (or, failing that, for it to lead to some specific, individual breakthroughs in alignment). I’m still grappling with when to drop this direction if it is not very fruitful. I’m trying to be conscious of what I think weak AI won’t be able to solve. Either way, I hope to bring on software engineers / builders who can help make progress on some of my ideas (some already have).

[Question] Can independent researchers get a sponsored visa for the US or UK?

jacquesthibs 24 Mar 2023 6:10 UTC

20 points

1 comment, 1 min read

jacquesthibs 30 Sep 2023 13:49 UTC
20 points
in reply to: Tristan Williams’s comment on: EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem
I’m not sure why you’d think it’s less sustainable than veganism. In my mind, it’s effective because it is sustainable and reduces most of the suffering. Just as EA tries to be effective (and sustainable) by not asking people to donate massive amounts of their income (just a small-ish percentage that works for them, given to the most effective charities), I see my approach the same way. It’s the sweet spot between reducing suffering and sustainability (for me).

Is the “Valley of Confused Abstractions” real?

jacquesthibs 5 Dec 2022 13:36 UTC

19 points

11 comments, 2 min read

AISC Project: Benchmarks for Stable Reflectivity

jacquesthibs 13 Nov 2023 14:51 UTC

17 points

0 comments, 8 min read

jacquesthibs 22 Nov 2023 23:32 UTC
17 points
on: OpenAI: The Battle of the Board
Someone else reported that Sam was seemingly trying to get Helen off the board weeks before the firing:

jacquesthibs 14 Nov 2023 19:14 UTC
17 points
on: jacquesthibs’s Shortform
If you work at a social media website or YouTube (or know anyone who does), please read the text below:
Community Notes is one of the best features to come out of social media apps in a long time. The code is even open source. Why haven’t other social media websites picked it up yet? If they care about truth, this would be a considerable step forward beyond their current approach. Labels like “this video is funded by x nation” or “this video talks about health info; go here to learn more” are simply not good enough.
If you work at companies like YouTube or know someone who does, let’s figure out who we need to talk to to make it happen. Naïvely, you could spend a weekend DMing a bunch of employees (PMs, engineers) at various social media websites in order to persuade them that this is worth their time and probably the biggest impact they could have in their entire career.
If you have any connections, let me know. We can also set up a doc of messages to send in order to come up with a persuasive DM.

jacquesthibs 7 Dec 2022 12:04 UTC
16 points
on: MIRI’s “Death with Dignity”, but in 80 seconds.
“Oh, and btw, while you are trying to increase the log-odds that humanity survives this century, don’t do anything stupid and rash that is way out-of-distribution of normal actions. You are not some God who can do the full utilitarian calculus. If an action you are thinking about is far out-of-distribution and looks probably bad to a lot of people, it’s likely because it is. In other words, don’t naively take rash actions thinking it’s for the good of humanity. Default to ³⁄₄ utilitarian.”

Connor Leahy’s opinion on the post (55:33):