axioman

Karma: 138

axioman 4 Nov 2021 23:10 UTC
33 points
on: EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
I guess I should update my paper on trends in sample efficiency soon / check whether recent developments are on trend (please message me if you are interested in doing this). This improvement does not seem to be extremely off-trend, but is definitely a bit more than I would have expected this year. Also, note that this result does NOT use the full suite of Atari games, but rather a subset of easier ones.

axioman 7 Apr 2020 22:19 UTC
27 points
on: Conflict vs. mistake in non-zero-sum games
Nitpick: I am pretty sure non-zero-sum does not imply a convex Pareto front.
Instead of the lens of negotiation position, one could argue that mistake theorists believe that the Pareto Boundary is convex (which implies that usually maximizing surplus is more important than deciding allocation), while conflict theorists see it as concave (which implies that allocation is the more important factor).
What links here?
- Conflict vs. mistake in non-zero-sum games by Nisan (5 Apr 2020 22:22 UTC; 167 points)

Proposal: Scaling laws for RL generalization

axioman 1 Oct 2021 21:32 UTC
14 points
12 comments, 11 min read

axioman 12 Jan 2020 11:26 UTC
10 points
in reply to: Stuart_Armstrong’s comment on: When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors
After looking at the update, my model is:
(Strictly) convex Pareto boundary: Extreme policies require strong beliefs. (Modulo some normalization of the rewards)
Concave (including linear) Pareto boundary: Extreme policies are favoured, even for moderate beliefs. (In this case, normalization only affects the “tipping point” in beliefs, where the opposite extreme policy is suddenly favoured).
In reality, we will often have concave and convex regions. The concave regions then cause more extreme policies for some beliefs, but the convex regions usually prevent the policy from completely focusing on a single objective.
From this lens, 1) maximum likelihood pushes us to one of the ends of the Pareto boundary, 2) an unlikely true reward pushes us close to the “bad” end, 3) difficult optimization messes with normalization (I am still somewhat confused about the exact role of normalization) and 4) not accounting for diminishing returns bends the Pareto boundary to become more concave.
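
A minimal sketch of the convex/concave distinction above, under assumptions that are not in the original comment: there are two candidate rewards, p is the belief that the first one is the true reward, and the frontier is the toy family r2 = (1 - r1^k)^(1/k), where the shape parameter k is purely illustrative. "Convex" is read here as a frontier that bulges outward (k > 1) and "concave" as one that bulges inward (k < 1), with k = 1 the linear case.

```python
def frontier(t, k):
    """Point (r1, r2) on a toy Pareto frontier; k is a hypothetical shape knob."""
    r2 = max(0.0, 1.0 - t ** k) ** (1.0 / k)
    return t, r2


def best_policy(p, k, steps=1000):
    """Grid-search the frontier position t that maximizes p*r1 + (1-p)*r2."""
    def expected_reward(t):
        r1, r2 = frontier(t, k)
        return p * r1 + (1.0 - p) * r2

    return max((i / steps for i in range(steps + 1)), key=expected_reward)


# t = 0 ignores the first reward entirely, t = 1 ignores the second.
for label, k in (("outward (convex)", 2.0), ("linear", 1.0), ("inward (concave)", 0.5)):
    print(label, [round(best_policy(p, k), 2) for p in (0.4, 0.6, 0.9)])
```

In this toy setup the outward-bulging frontier yields interior policies that only approach an extreme as p approaches 0 or 1, while the linear and inward-bulging frontiers jump from one extreme to the other at a tipping point in p.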

axioman 22 Jan 2021 18:18 UTC
9 points
on: The Multi-Tower Study Strategy
“Beginners in college-level math would learn about functions, the basics of linear systems, and the difference between quantitative and qualitative data, all at the same time.”
This seems to be the standard approach for undergraduate-level mathematics at university, at least in Europe.

axioman 8 Mar 2020 13:19 UTC
8 points
in reply to: cousin_it’s comment on: Credibility of the CDC on SARS-CoV-2
Even if the claim was usually true on longer time scales, I doubt that pointing out an organisation’s mistakes and not entirely truthful statements usually increases the trust in them on the short time scales that might be most important here. Reforming organizations and rebuilding trust usually takes time.

axioman 4 Nov 2021 23:37 UTC
7 points
in reply to: gwern’s comment on: EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
A lot of the omissions you mention are due to inconsistent benchmarks (like the switch from the full Atari suite to Atari 100k with fewer and easier games) and me trying to keep results comparable.

This particular plot only has each year’s SOTA, as it would get too crowded with a higher temporal resolution (I used it for the comment, as it was the only one including smaller-sample results on Atari 100k and related benchmarks). I agree that it is not optimal for eyeballing trends.

I also agree that temporal trends can be problematic as people did not initially optimize for sample efficiency (I’m pretty sure I mention this in the paper); it might be useful to do a similar analysis for the recent Atari 100k results (but I felt that there was not enough temporal variation yet when I wrote the paper last year, as sample efficiency seems to have only started receiving more interest in late 2019).

axioman 11 Jun 2020 17:48 UTC
7 points
on: Good and bad ways to think about downside risks
Nice post!
I would like to highlight that a naive application of the expected value perspective could lead to problems like the unilateralist’s curse and think that the post would be even more useful for readers who are new to these kinds of considerations if it discussed that more explicitly (or linked to relevant other posts prominently).

axioman 17 Sep 2021 21:29 UTC
4 points
in reply to: Owain_Evans’s comment on: How truthful is GPT-3? A benchmark for language models
I guess finetuning a model to produce truthful statements directly is nontrivial (especially without a discriminator model) because there are many possible truthful and many possible false responses to a question?

axioman 4 Jun 2021 21:02 UTC
4 points
on: An Intuitive Guide to Garrabrant Induction
“If the prices do not converge, then they must oscillate infinitely around some point. A trader could exploit the logical inductor by buying the sentence at a high point on the oscillation and selling at a low one.”
I know that this is an informal summary, but I don’t find this point intuitively convincing. Wouldn’t the trader also need to be able to predict the oscillation?

axioman 16 Feb 2020 22:43 UTC
4 points
on: On characterizing heavy-tailedness
Do you maybe have another example for action relevance? Nonfinite variance and finite support do not go well together.

axioman 12 Feb 2020 15:05 UTC
4 points
in reply to: Stuart_Armstrong’s comment on: Attainable utility has a subagent problem
“Not quite…” Are you saying that the example is wrong, or that it is not general enough? I used a more specific example, as I found it easier to understand that way.
I am not sure I understand: In my mind “commitments to balance out the original agent’s attainable utility” essentially refers to the second agent being penalized by the first agent’s penalty (although I agree that my statement is stronger). Regarding your text, my statement refers to “SA will just precommit to undermine or help A, depending on the circumstances, just sufficiently to keep the expected rewards the same.”
My confusion is about why the second agent is only mildly constrained by this commitment. For example, weakening the first agent would come with a big penalty (or more precisely, building another agent that is going to weaken it gives a large penalty to the original agent), unless it’s reversible, right?
The bit about multiple subagents does not assume that more than one of them is actually built. It rather presents a scenario where building intelligent subagents is automatically penalized. (Edit: under the assumption that building a lot of subagents is infeasible or takes a lot of time).

axioman 12 Mar 2022 12:41 UTC
3 points
on: AI Performance on Human Tasks
Regarding image classification performance, it seems worth noting that ImageNet was labeled by human labelers (and IIRC there was a paper showing that labels are ambiguous or wrong for a substantial minority of the images).

As such, I don’t think we can conclude too much about superhuman AI performance on image recognition from ImageNet alone (as perfect performance on the benchmark corresponds to perfectly replicating human judgement, admittedly aggregated over multiple humans). To demonstrate superhuman performance, a dataset with known ground truth where humans struggle to correctly label images would seem more appropriate.

axioman 4 Nov 2021 23:55 UTC
3 points
in reply to: Raemon’s comment on: What’s the difference between newer Atari-playing AI and the older Deepmind one (from 2014)?
The first thing you mention does not learn to play Atari, and is in general trained quite differently from Atari-playing AIs (as it relies on self-play to kind of automatically generate a curriculum of harder and harder tasks, at least for some of the more competitive tasks in XLand).

axioman 4 Nov 2021 23:45 UTC
3 points
in reply to: Teja Prabhu’s comment on: EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
Do you have a source for Agent57 using the same network weights for all games?

axioman 10 Oct 2021 16:28 UTC
3 points
in reply to: Chris_Leong’s comment on: Proposal: Scaling laws for RL generalization
Your point b) seems like it should also make you somewhat sceptical of any of this accelerating AI capabilities, unless you believe that capabilities-focused actors would change their actions based on forecasts, while safety-focused actors wouldn’t. Obviously, this is a matter of degree, and it could be the case that the same amount of action-changing by both actors still leads to worse outcomes.

I think that if OpenAI unveiled GPT4 and it did not perform noticeably better than GPT3 despite a lot more parameters, that would be a somewhat important update. And it seems like a similar kind of update could be produced by well-conducted research on scaling laws for complexity.

axioman 10 Oct 2021 10:35 UTC
3 points
in reply to: Chris_Leong’s comment on: Proposal: Scaling laws for RL generalization
Most recent large safety projects seem to be focused on language models. So in case the evidence pointed towards problem complexity not mattering that much, I would expect the shift in prioritization towards more RL-safety research to outweigh the effect on capability improvements (especially for the small version of the project, about which larger actors might not care that much). I am also sceptical whether the capabilities of the safety community are in fact increasing exponentially.

I am also confused about the resources/reputation framing. To me this is a lot more about making better predictions about when we will get to transformative AI, and how this AI might work, such that we can use the available resources as efficiently as possible by prioritizing the right kind of work and hedging for different scenarios to an appropriate degree. This is particularly true for the scenario where complexity matters a lot (which I find overwhelmingly likely), in which too much focus on very short timelines might be somewhat costly (obviously none of these experiments can remotely rule out short timelines, but I do expect that they could attenuate how much people update on the XLand results).

Still, I do agree that it might make sense to publish any results on this somewhat cautiously.

axioman 2 Apr 2021 19:09 UTC
3 points
on: Systematizing Epistemics: Principles for Resolving Forecasts
“This desiderata is often difficult to reconcile with clear scoring, since complexity in forecasts generally requires complexity in scoring.”
Can you elaborate on this? In some sense, log-scoring is simple and can be applied to very complex distributions. Are you saying that this would still be “complex scoring” because the complex forecast needs to be evaluated, or is your point about something different?

axioman 13 Mar 2021 11:24 UTC
3 points
in reply to: abramdemski’s comment on: Resolutions to the Challenge of Resolving Forecasts
Partial resolution could also help with getting some partial signal on long term forecasts.
In particular, if we know that a forecasting target is growing monotonically over time (like “date at which X happens” or “cumulative number of X before a specified date”), we can split P(outcome=T) into P(outcome>lower bound)*P(outcome=T|outcome>lower bound). If we use log scoring, we then get log(P(outcome>lower bound)) as an upper bound on the score.
If forecasts came in the form of more detailed models, it should be possible to use a similar approach to calculate bounds based on conditioning on more complicated events as well.
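
The bound above can be sketched in a few lines. This is only an illustration under assumptions not in the original comment: the forecast is a discrete distribution stored as a dict, and the years and probabilities are made-up placeholder values. Since log P(outcome=T) = log P(outcome>lower bound) + log P(outcome=T|outcome>lower bound) and the second term is never positive, the first term bounds the final score from above.

```python
import math


def tail_probability(forecast, lower_bound):
    """P(outcome > lower_bound) for a discrete forecast given as {outcome: prob}."""
    return sum(p for outcome, p in forecast.items() if outcome > lower_bound)


def log_score(forecast, outcome):
    """Log score once the outcome is fully resolved."""
    return math.log(forecast[outcome])


def partial_upper_bound(forecast, lower_bound):
    """Upper bound on the eventual log score, knowing only outcome > lower_bound."""
    return math.log(tail_probability(forecast, lower_bound))


# Hypothetical forecast for "year in which X happens" (illustrative numbers only).
forecast = {2025: 0.5, 2030: 0.3, 2035: 0.2}

# Suppose 2025 has passed without X happening, so we know outcome > 2025.
bound = partial_upper_bound(forecast, 2025)
for year in (2030, 2035):
    # Whatever the eventual resolution, the final score cannot exceed the bound.
    print(year, log_score(forecast, year), "<=", bound)
```

A forecaster who put little mass on outcomes above the lower bound already receives a low ceiling on their score, which is the partial signal mentioned above.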

axioman 11 Dec 2020 20:36 UTC
3 points
in reply to: Pattern’s comment on: [AN #128]: Prioritizing research on AI existential safety based on its application to governance demands
“Overall, access to the AI strongly improved the subjects’ accuracy from below 50% to around 70%, which was further boosted to a value slightly below the AI’s accuracy of 75% when users also saw explanations.”

But this seems to be a function of the AI system’s actual performance, the human’s expectations of said performance, as well as the human’s baseline performance. So I’d expect it to vary a lot between tasks and with different systems.
