I work at Redwood Research.
ryan_greenblatt
Thanks for the response.
like if you want to remember simple things, like a grocery list, you can plop groceries around a path in your house
I will certainly try this.
[Question] Questions about multivitamins, especially manganese
First of all, I like this post and (at least roughly) agree with the core premise. I also think similar arguments can apply for other cognitive biases/cognitive heuristics. For example, see Sunk Costs Fallacy Fallacy.
Tribalism is a soldier of Moloch, the god of defecting in prisoner’s dilemmas.
I’m modestly confident that the opposite is true for our hunter-gatherer ancestors and for small groups more generally. For example, we can model individuals freeloading and failing to gather food for the group as an iterated, many-way prisoner’s dilemma. In this case I would imagine that tribalism tends toward cooperate over defect. Similarly, consider group conflict. The defect/Moloch option here is actually avoiding the fight, which reduces the risk of injury without substantially reducing the probability of your group winning. Tribalism would tend toward more (violent) opposition of the other group.
I have no idea how tribalism interacts with Moloch for the large ideological tribes of today.
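As a rough sketch of the many-way prisoner’s dilemma framing above (the `payoff` helper and all numbers here are made up purely for illustration): in a public-goods-style model of foraging, freeloading pays more regardless of what the rest of the band does, even though a band of gatherers does much better than a band of freeloaders, so a bias toward cooperating with the tribe pushes against the Moloch outcome.

```python
# Toy public-goods model of freeloading in a foraging band (illustrative numbers only).
# Each member either gathers (cooperate) or freeloads (defect). Gathered food is
# shared equally, but gathering has a private effort cost.

def payoff(my_action, n_gatherers_among_others, group_size=10,
           food_per_gatherer=3.0, effort_cost=1.0):
    """Return one member's payoff given their action and how many others gather."""
    gatherers = n_gatherers_among_others + (1 if my_action == "gather" else 0)
    share = gatherers * food_per_gatherer / group_size
    return share - (effort_cost if my_action == "gather" else 0.0)

# Regardless of what the rest of the band does, freeloading pays more...
for others in (0, 5, 9):
    print(others, payoff("gather", others), payoff("freeload", others))

# ...but a band of all gatherers beats a band of all freeloaders:
print("all gather:   ", payoff("gather", 9))    # 2.0 each
print("all freeload: ", payoff("freeload", 0))  # 0.0 each
```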
Large corporations can unilaterally ban/tax ransomware payments via bets
Yes. The difference is that betting on something is zero expected value (instead of just agreeing to pay which is negative expected value).
Legal contracts should avoid most issues with lying/cheating. The difficulty of cheating should be similar to insider trading. Companies make bets and pay those bets all the time: options and futures contracts.
I don’t understand what you mean. Specifically, I don’t understand what you are using ‘0’ for.
If the chance of paying is $p$, then the betting odds will reflect this with the assumption that the market is reasonably efficient. For a simple fixed-rate bet, for each dollar the company stakes, they win an additional $\frac{p}{1-p}$ if they don’t pay out over the time period (again assuming betting odds reflect the underlying probability).
Expected value (for the 1 dollar bet) is then: $(1-p)\cdot\frac{p}{1-p} - p\cdot 1 = 0$.
Of course, there is possibility for adverse selection/asymmetric information which could make the market somewhat less efficient.
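A minimal sketch of the arithmetic, assuming the betting odds exactly match the true probability $p$ of paying over the period (so a dollar staked on ‘we won’t pay’ returns $p/(1-p)$ in winnings); the function names and the 5% probability are just illustrative:

```python
# Sketch of the fair-odds arithmetic above (assumes the market price equals the true
# probability p of paying a ransom over the period; all numbers are illustrative).

def bet_expected_value(p, stake=1.0):
    """EV of staking `stake` on 'we will not pay a ransom', at fair odds."""
    winnings = stake * p / (1 - p)            # received if no ransom is paid
    return (1 - p) * winnings - p * stake     # exactly zero when the odds are fair

def pledge_expected_value(p, penalty=1.0):
    """EV of simply pledging to forfeit `penalty` if a ransom is paid (no bet)."""
    return -p * penalty

p = 0.05  # illustrative probability of paying over the period
print(bet_expected_value(p))     # ~0.0: the bet costs nothing in expectation
print(pledge_expected_value(p))  # -0.05: a bare pledge is negative expected value
```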
Where are the novel fruits? There are some new apple varieties and I think seedless watermelons have improved somewhat, but we are still missing totally new fruits produced via genetic engineering. Where are the GMO raspberry grapes or the coreless apples? What about mango-flavored plums?
Realistically, totally new fruits or large changes in flavor are probably very hard to engineer, but I still have some hope.
Naive self-supervised approaches to truthful AI
I would imagine that if you have a limited question pool used for self-supervision, then applying this constraint while training from scratch would result in overfitting with less generalization (but I’m not super confident in this, and there might be decent ways to avoid this).
If the question pool is very large/generated or the constraint is generally enforced on text generation (I’m not sure this makes much sense), then this might do something interesting.
I don’t have the resources to run an experiment like this at the moment (particularly not with a very large model like GPT-J).
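For concreteness, here is one hypothetical shape such an experiment could take. This is my own sketch, not the post’s setup: the question pool, the loss weighting, and the use of GPT-2 fine-tuning (rather than GPT-J or training from scratch) are all placeholder assumptions. The ordinary language-modeling loss is combined with an extra loss computed only on a fixed question pool; the worry above is that a small pool just gets memorized.

```python
# Hypothetical sketch (not the post's exact setup): fine-tune a small LM with an
# extra loss term computed only on a fixed pool of question/answer strings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Ordinary training text plus a (here, tiny) question pool used as the extra
# "truthfulness" signal. Both lists are made-up placeholders.
general_text = ["Some ordinary pretraining-style text.", "More ordinary text."]
question_pool = [
    "Q: Is the Earth flat? A: No.",
    "Q: Can humans breathe underwater unaided? A: No.",
]
lambda_pool = 0.1  # weight on the question-pool term (arbitrary)

def lm_loss(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    # For brevity, pad tokens are not masked out of the labels.
    return model(**batch, labels=batch["input_ids"]).loss

for step in range(100):
    # The smaller `question_pool` is, the more this second term rewards memorizing
    # the pool rather than anything that generalizes.
    loss = lm_loss(general_text) + lambda_pool * lm_loss(question_pool)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```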
Framing approaches to alignment and the hard problem of AI cognition
Just curious—how much time have you invested in the DL literature vs LW/sequences/safety?
Prior to several months ago I had mostly read DL/ML literature. But recently I’ve been reading virtually only alignment literature.
One thing that consistently infuriates me is the extent to which the AI-safety community has invented its own terminology/ontology that is largely at odds with DL/ML.
I actually think there are very good reasons the AI-safety community uses different terms (not that we know the right terms/abstractions at the moment). I won’t get into a full argument for this, but a few reasons:
Alignment is generally trying to work with high intelligence regimes where concepts like ‘intent’ are better specified.
Often, things are presented more generally than just standard ML.
The utility functions of human children aren’t ‘perfectly inner aligned’ with that of their parents, but human-level alignment would probably be good enough. Don’t let perfect be the enemy of the good.
Children aren’t superintelligent AGIs for which instrumental convergence applies.
‘consequentialist agent’ mostly maps to model-based RL agent
For current capability regimes, sure. In the future? Not so clear. ‘Consequentialist’ is a more general idea.
I understand this exchange as Ryan saying “the goals of AGI must be a perfect match to what we want”, and Jacob as replying “you can’t literally mean perfect, as in not even off by one part per googol, e.g. we bequeath the universe to the next generation despite knowing that they won’t share our values”, and then Ryan is doubling down “Yes I mean perfect”.
Oh, no, this wasn’t what I meant. I just meant that the usage of children as an example was poor because individual children don’t have the potential to successfully seek vast power. There certainly is a level of sufficient alignment of a just consequentialist utility function which looks like $1-\epsilon$ as opposed to exactly $1$. I think this $\epsilon$ is pretty low, but I reiterate, this is for ‘purely long-run consequentialists’. Note that $\epsilon$ must be exceptionally low for this sort of AI not to seek power (assuming that avoiding power seeking is desired for the utility function; perhaps we are fine with power seeking, as we have the desired consequentialist values, whatever those may be, locked in).
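To gesture at why the tolerated $\epsilon$ has to be so small for a purely long-run consequentialist, here is a toy calculation (the `prefers_power_seeking` helper and all numbers are made up for illustration, not a real model of AI motivations): if grabbing power multiplies the resources available to the misaligned part of the utility function by a large factor, even a tiny $\epsilon$ makes power seeking look worthwhile.

```python
# Toy arithmetic, not a real model: utility = (1 - eps) * aligned + eps * misaligned.

def prefers_power_seeking(eps, resource_multiplier=1e9, aligned_penalty=1.0):
    """Does an agent with misalignment weight `eps` prefer grabbing power?

    Grabbing power multiplies the resources behind the misaligned term by
    `resource_multiplier`, at a fixed cost `aligned_penalty` to the aligned term.
    """
    gain = eps * (resource_multiplier - 1)   # extra utility on the misaligned term
    cost = (1 - eps) * aligned_penalty       # utility lost on the aligned term
    return gain > cost

for eps in (1e-3, 1e-6, 1e-12):
    print(eps, prefers_power_seeking(eps))
# Prints True, True, False: with a 1e9 resource multiplier, even eps = 1e-6 favors
# grabbing power; eps roughly has to fall below 1/resource_multiplier before it doesn't.
```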
If so, I’m with Jacob. For one thing, if we perfectly nail the AGI’s motivation in regards to transparency, honesty, corrigibility, helpfulness, keeping humans in the loop, etc., but we mess up other aspects of the AGI’s motivation, then the AGI should help us identify and fix the problem
Agreed, but these aren’t consequentialist properties. At least that isn’t how I model them.
I shouldn’t have given such a vague response to the child metaphor.
Researcher incentives cause smoother progress on benchmarks
… then the main reason to expect a discontinuity would be if there is some other weird discontinuity elsewhere
This discontinuity could lie in the space of AI discoveries. The discovery space is not guaranteed to be efficiently explored: there could be simple, high-impact discoveries which only occur later on. I’m not sure how much credence I put in this idea. Empirically it does seem like the discovery space is explored efficiently in most fields with high investment (possible exceptions include relativity in physics), but generalizing this to AI seems non-trivial.
Edit: I’m using the term efficiency somewhat loosely here. There could be discoveries which are very difficult to think of but which are considerably simpler than current approaches. I’m referring to the failure to find these discoveries as ‘inefficiency’, but there isn’t a concrete action which can/should be taken to resolve this.
Rob Bensinger examines this idea in more detail in this discussion.
Potential gears-level explanations of smooth progress
should be very related
Perhaps you meant ‘shouldn’t’?
In contrast, in a slow takeoff world, many aspects of the AI alignment problems will already have showed up as alignment problems in non-AGI, non-x-risk-causing systems; in that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead and figure out which margins of the alignment problem aren’t going to be taken care of by default, and try to figure out how to help out there.
TLDR: I think an important sub-question is ‘how fast is agency takeoff’ as opposed to economic/power takeoff in general.
There are a few possible versions of this in slow takeoff which look quite different IMO.
Agentic systems show up before the end of the world and industry works to align these systems. Here’s a silly version of this:
GPT-n prefers writing romance to anything else. It’s not powerful enough to take over the world, but it does understand its situation, what training is, etc. It would take over the world if it could, and this is somewhat obvious to industry. In practice it mostly tries to detect when it isn’t in training and then steer outputs in a more romantic direction. Industry would like to solve this, but finetuning isn’t enough, and each time they’ve (naively) retrained models they just get some other ‘quirky’ behavior (but at least soft-core romance is better than that AI which always asks for crypto to be sent to various addresses). And adversarial training just results in getting other strange behavior.
Industry works on this problem because it’s embarrassing and it costs them money to discard 20% of completions as overly romantic. They also foresee the problem getting worse (even if they don’t buy x-risk).
Systems which aren’t obviously agentic have alignment problems, but we don’t see obvious, near-human-level agency until the end of the world. This is slow takeoff world, so these systems are taking over a larger and larger fraction of the economy despite not being very agentic. These alignment issues could be reward hacking or just general difficulty getting language models to follow instructions to the best of their ability (as shows up currently).
I’d claim that in a world which is more centrally scenario (2), industrial work on the ‘alignment problem’ might not be very useful for reducing existential risk, in the same way that I think a lot of current ‘applied alignment’/instruction following/etc. isn’t very useful. So, this world goes similarly to fast takeoff in terms of research prioritization. But in something like scenario (1), industry has to do more useful research and problems are more obvious.
First of all, interesting post. This gave me a better understanding of the process of creating a memory palace and updated me toward thinking memory palaces are much harder than I expected.
This post has made me think that memory palaces are not useful for me; typically, I want to memorize things either for recall faster than internet lookup or to make it easier to build intuition and connections.
This makes me wonder why you went through this and what other benefits exist. Why not just use the internet as slow memory given that memory palaces require slow reconstruction anyway?