James Chua

Karma: 49

James Chua 11 Nov 2022 13:25 UTC
2 points
1
in reply to: paulfchristiano’s comment on: Mysteries of mode collapse due to RLHF
I do agree. I think there are two product use cases with instruct that have distinct optimal levels of entropy.
1. The more explorative use cases you mentioned, where users do want diversity, e.g. generating story ideas
2. Having factual / accurate answers
I’m not sure how exactly OpenAI set their “KL budgets” for davinci instruct.
For WebGPT3 they “compared a couple of KL budgets using human evaluations”. And those evaluations were for how factual the answers were.
So in that scenario, we’ll see a KL budget that optimizes for 2, since the users don’t care about the diversity of multiple generations. They just care about the factual quality of a single generation.
Now I’m interested to see what happens if we change the evaluations so that users are shown, say, 3 examples from each model in a scenario where diversity is desirable (e.g. generating story ideas). In deciding the KL budget that way, we would probably get a much lower number, and that would allow them to serve a model more suited to use case 1.

A library for safety research in conditioning on RLHF tasks

James Chua 26 Feb 2023 14:50 UTC

10 points

2 comments, 1 min read, LW link

James Chua 26 Feb 2023 16:52 UTC
2 points
0
in reply to: Charlie Steiner’s comment on: A library for safety research in conditioning on RLHF tasks
For DTs, it’s really just a linear function that converts the scalar reward into the same dimensions as the token embeddings.
So e.g. a single token’s embedding has a hidden state of size 1024.
We can learn a linear function that takes this scalar and outputs something of size 1024.
The more annoying (PITA) part was offsetting the positional/attention masks/labels for this.
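A rough sketch of the idea above, in plain Python so the shapes are easy to follow. This is not the library's actual code: `ScalarToEmbedding`, `prepend_reward`, the random weights standing in for learned parameters, the choice to put the reward in a single extra slot at the front of the sequence, and the `-100` ignore label are all assumptions made for illustration.

```python
import random

HIDDEN_SIZE = 1024  # hidden size mentioned in the comment


class ScalarToEmbedding:
    """Linear map from a scalar to a vector: out[i] = reward * weight[i] + bias[i]."""

    def __init__(self, hidden_size, seed=0):
        rng = random.Random(seed)
        # Random values stand in for parameters that would be learned in training.
        self.weight = [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        self.bias = [0.0] * hidden_size

    def __call__(self, reward):
        return [reward * w + b for w, b in zip(self.weight, self.bias)]


def prepend_reward(token_embeddings, attention_mask, labels, reward_embedding,
                   ignore_index=-100):
    """Insert the reward embedding at position 0 and offset everything else by one.

    Assumption: the reward occupies one extra position at the start of the sequence.
    """
    embeddings = [reward_embedding] + token_embeddings
    mask = [1] + attention_mask                # the reward slot is attended to
    shifted_labels = [ignore_index] + labels   # no loss is computed on the reward slot
    position_ids = list(range(len(embeddings)))
    return embeddings, mask, shifted_labels, position_ids


embed_reward = ScalarToEmbedding(HIDDEN_SIZE)
tokens = [[0.0] * HIDDEN_SIZE for _ in range(3)]  # three dummy token embeddings
embs, mask, labels, pos = prepend_reward(tokens, [1, 1, 1], [11, 12, 13],
                                         embed_reward(0.7))
```

After the call, the sequence is one position longer than the token sequence, which is why the masks, labels, and position ids all need the same one-step offset.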

James Chua 13 Apr 2023 15:42 UTC
3 points
0
on: SERI MATS—Summer 2023 Cohort
Clicking on Owain Evans in the application doesn’t show the mentor’s questions, unlike the rest of the mentors. I think this is a bug?

James Chua 19 Apr 2023 7:05 UTC
2 points
0
in reply to: Aris’s comment on: SERI MATS—Summer 2023 Cohort
Thank you. If I am done with one mentor’s questions but am still writing the response for another, should I submit the first mentor’s questions first? Or is it better for administrative purposes to wait until I am ready for both, and submit them in the same form?

James Chua 17 Jun 2023 15:23 UTC
15 points
0
on: Steering GPT-2-XL by adding an activation vector
I managed to get it working for llama-7b on Colab after some debugging.
Surprisingly, it actually does work for the Love / Hate scenario, but not for some others like Rome vs Paris.
Here’s the link if anyone wants to try it.
https://colab.research.google.com/drive/1ACAA7FO8zc4pFAqPdaPshoy4WWXCvUTQ?usp=sharing
Edit: seems like you guys already have a better version here. https://github.com/UlisseMini/activation_additions_hf/blob/main/notebooks/qualitative.ipynb
Never mind! (I’m still keeping this comment for visibility if anyone wants to try)

James Chua 19 Jun 2023 13:12 UTC
1 point
0
in reply to: Ulisse Mini’s comment on: Steering GPT-2-XL by adding an activation vector
Yep! I was very pleasantly surprised that Love/Hate worked for Llama at all. It’s great that you rewrote it without transformer lens too, as transformer lens has issues with 8 bit / 4 bit quantisation.
Also sent you a DM on Discord! I’ll be interested to read any rough findings and lessons you have with Llama.

My MATS Summer 2023 experience

James Chua 20 Mar 2024 11:26 UTC

26 points

0 comments, 3 min read, LW link

(jameschua.net)