Sam Bowman
NLP Position Paper: When Combatting Hype, Proceed with Caution
Thanks—fixed! (The sentence-final period got folded into the URL.)
Thanks! (Typo fixed.)
[Old technique] had [problem]...
For this point, I’m not sure how it fits into the argument. Could you say more?
Is there any empirical base...
Yeah, this is a missed opportunity that I haven’t had the time/expertise to take on. There probably are comparable situations in the histories of other applied research fields, but I’m not aware of any good analogies. I suspect that a deep dive into some history-and-sociology-of-science literature would be valuable here.
What if the impact grows dramatically as...they get deployed widely? …
I think this kind of discussion is already well underway within NLP and adjacent subfields like FaCCT. I don’t have as much to add there.
(Weird meta-note: Are you aware of something unusual about how this comment is posted? I saw a notification for it, but I didn’t see it in the comments section for the post itself until initially submitting this reply. I’m newish to posting on Lightcone forums...)
Thanks! Tentative rewrite for the next revision:
It harms our credibility in ways that can make it harder to mitigate present-day harms from NLP deployments. It also limits our ability to prepare for the potentially enormous impacts of more distant future advances.
I tried to stick to ‘present-day’ over ‘short-term’, but missed this old bit of draft text in the abstract.
Forum
I can see the comment at the comment-specific AF permalink here: https://www.alignmentforum.org/posts/RLHkSBQ7zmTzAjsio/nlp-position-paper-when-combatting-hype-proceed-with-caution?commentId=pSkdAanZQwyT4Xyit#pSkdAanZQwyT4Xyit
...but I can’t see it among the comments at the base post URL here. From my experience with the previous comment, I expect it’ll appear at the latter URL once I reply?
[Old technique] had [problem]...
Ah, got it. That makes sense! I’ll plan to say a bit more about when/how it makes sense to cite older evidence in cases like this.
Yeah, this all sounds right, and it’s fairly close to the narrative I was using for my previous draft, which had a section on some of these motives.
The best defense I can give of the switch to the hype-centric framing, FWIW:
The paper is inevitably going to have to do a lot of chastising of authors. Giving the most charitable possible framing of the motivations of the authors I’m chastising means that I’m less likely to lose the trust/readership of those authors and anyone who identifies with them.
An increasingly large fraction of NLP work—possibly even a majority now—is on the analysis/probing/datasets side rather than model development, and your incentives 1 and 2 don’t apply as neatly there. There are still incentives to underclaim, but they work differently.
Practically, writing up that version clearly seemed to require a good deal more space, in an already long-by-ML-standards paper.
That said, I agree that this framing is a little bit too charitable, to the point of making implausible implications about some of these authors’ motives in some cases, which isn’t a good look. I also hadn’t thought of the wasted effort point, which seems quite useful here. I’m giving a few talks about this over the next few weeks, and I’ll workshop some tweaks to the framing with this in mind.
Can be addressed by regularizing the reporter’s output: penalizing response length or entropy, plus a GAN-type penalty for non-human-like questions and answers.
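For concreteness, the proposal above might be sketched roughly like this. This is a minimal illustration, not from the original comment: `alpha` and `beta` are hypothetical penalty weights, the function names are mine, and a real implementation would work on model logits and include the GAN-type discriminator term, which is omitted here.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution,
    given as a list of probabilities summing to 1."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def regularized_loss(task_loss, response_probs, response_length,
                     alpha=0.1, beta=0.01):
    """Base task loss plus entropy and length penalties on the
    reporter's response distribution. alpha and beta are
    hypothetical weights; a GAN-type 'human-likeness' penalty
    would be a third additive term in practice."""
    return (task_loss
            + alpha * entropy(response_probs)
            + beta * response_length)
```

The intended effect is to make high-entropy or unusually long responses costly, limiting the channel capacity available for steganographic encoding; whether that actually closes the channel is exactly what the reply below questions.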
Can you say more about how this would work? I haven’t been following the literature on emergent communication too closely, but my rough impression had been that steganography in cases like this doesn’t have simple/trivial solutions.
The BIG-Bench paper that those ‘human’ numbers are coming from (unpublished, quasi-public as TeX here) cautions against taking those averages very seriously, and doesn’t give complete details about who the humans were or how they were asked/incentivized to behave on tasks that required specialized skills:
I suspect that these developments look a bit less surprising if you’ve been trying to forecast progress here, and so might be at least partially priced in. Anyhow, the forecast you linked to shows >10% likelihood before spring 2025, three years from now. That’s extraordinarily aggressive compared to (implied) conventional wisdom, and probably a little more aggressive than I’d be as an EA AI prof with an interest in language models and scaling laws.
A Small Negative Result on Debate
One of the arguments is quite misleading in most cases, so probably not high-quality by typical definitions. Unfortunately, under the time limit, our readers can’t reliably tell which one is misleading.
Without arguments and without the time limit, annotators get the questions right with ~90% accuracy: https://arxiv.org/abs/2112.08608
Yep. (Thanks for re-posting.) We’re pretty resigned to the conclusion that debate fails to reach a correct conclusion in at least some non-trivial cases—we’re mainly interested in figuring out (i) whether there are significant domains or families of questions for which it will often reach a correct conclusion, and (ii) whether it tends to fail gracefully (i.e., every outcome is either correct or a draw).
I have no reason to be especially optimistic given these results, but I suppose there may be some fairly simple questions for which it’s possible to enumerate a complete argument in a way that flaws will be clearly apparent.
In general, it seems like single-turn debate would have to rely on an extremely careful judge, which we don’t quite have, given the time constraint. Multi-turn seems likely to be more forgiving, especially if the judge has any influence over the course of the debate.
I can look up the exact wording if it’s helpful, but I assume it’s clear from the basic setup that at least one of the arguments has to be misleading.
Is anyone working on updating the Biological Anchors Report model based on the updated slopes/requirements here?
Jobs: Help scale up LM alignment research at NYU
Artificial Sandwiching: When can we test scalable alignment protocols without humans?
Thanks! I think I have some sense of what both directions look like, but not enough to know what a concrete starting experiment would look like. What would a minimum viable experiment look like for each?
Thanks! I’ll admit that I meant to be asking especially about the toxicity case, though I didn’t make that at all clear. As in Charlie’s comment, I’m most interested in using this approach as a way to efficiently explore and pilot techniques that we can ultimately adapt back to humans, and text-based interactions seems like a good starting point for that kind of work.
I don’t see a clear picture either way on whether the noisy signal story presents a hard problem that’s distinctively alignment oriented.
Another very minor (but briefly confusing) nit: The notation in the ‘Example’ section is inconsistent between probabilities and log probabilities. It introduces Hprior(z) (etc.) as a probability, but then treats it as a log probability in the line starting with ‘We find the z∗’.