Adam Scherlis 1 Feb 2023 8:42 UTC
48 points
6
on: Adam Scherlis’s Shortform
EDIT: I originally saw this in Janus’s tweet here: https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase “ SolidGoldMagikarp” (with an initial space) as the word “distribute”, and will respond accordingly. It is completely unaware that that’s not what you typed.

This happens because the BPE tokenizer saw the string “ SolidGoldMagikarp” a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT’s own training data so it never learned to do anything with it. Instead, it’s just a weird blind spot in its understanding of text.
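
To make the tokenizer half of this concrete, here is a minimal sketch of a toy word-level BPE trainer. It is not OpenAI's tokenizer: the pre-tokenization regex, the `train_bpe`, `merge_pair` and `encode` helpers, the `min_count` parameter, and the tiny corpus are all assumptions made up for illustration. It only shows how a string that is repeated in the tokenizer's corpus ends up with a token of its own, while a similar unseen string is split into pieces.

```python
import re
from collections import Counter

WORD_RE = r" ?\S+"  # assumed pre-tokenization: words keep their leading space


def merge_pair(symbols, left, right):
    """Replace each adjacent (left, right) pair with the fused symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
            out.append(left + right)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(text, min_count=2):
    """Greedy BPE: keep merging the most frequent adjacent pair inside words
    until no pair occurs at least min_count times."""
    words = Counter(re.findall(WORD_RE, text))
    splits = {w: list(w) for w in words}
    merges = []
    while True:
        pairs = Counter()
        for w, freq in words.items():
            for pair in zip(splits[w], splits[w][1:]):
                pairs[pair] += freq
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < min_count:
            break
        merges.append((left, right))
        splits = {w: merge_pair(s, left, right) for w, s in splits.items()}
    return merges


def encode(text, merges):
    """Tokenize text by replaying the learned merges in order."""
    tokens = []
    for word in re.findall(WORD_RE, text):
        symbols = list(word)
        for left, right in merges:
            symbols = merge_pair(symbols, left, right)
        tokens.extend(symbols)
    return tokens


# Hypothetical corpus: ordinary text plus a string that shows up several times.
corpus = "the quick brown fox jumps over the lazy dog." + " SolidGoldMagikarp" * 5
merges = train_bpe(corpus)

print(encode(" SolidGoldMagikarp", merges))    # one dedicated token
print(encode(" SolidSilverMagikarp", merges))  # several pieces
```

The second half of the story (the language model rarely seeing that token during its own training) is not modeled here; the sketch only covers how the dedicated token comes to exist.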

A brainteaser for language models

Adam Scherlis 12 Dec 2022 2:43 UTC

47 points

3 comments · 2 min read · LW link

Two Percolation Puzzles

Adam Scherlis 4 Jul 2023 5:34 UTC

43 points

14 comments · 1 min read · LW link

(adam.scherlis.com)

An exploration of GPT-2’s embedding weights

Adam Scherlis 13 Dec 2022 0:46 UTC

41 points

4 comments · 10 min read · LW link

Five views of Bayes’ Theorem

Adam Scherlis 2 Jul 2022 2:25 UTC

38 points

4 comments · 1 min read · LW link

Gradient Hacking via Schelling Goals

Adam Scherlis 28 Dec 2021 20:38 UTC

33 points

4 comments · 4 min read · LW link

Understanding the two-head strategy for teaching ML to answer questions honestly

Adam Scherlis 11 Jan 2022 23:24 UTC

29 points

1 comment · 10 min read · LW link

Adam Scherlis 26 Nov 2022 2:45 UTC
25 points
14
on: New Frontiers in Mojibake
♀︎

Fun fact: usually this is U+2640, but in this post it’s U+2640 U+FE0E, where U+FE0E is a variation selector meaning “that was text, not emoji, btw”. That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.
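
For anyone who wants to check the code points themselves, a small sketch using Python's standard `unicodedata` module (the `describe` helper and the variable names are just illustrative). Unicode's own name for U+FE0E is VARIATION SELECTOR-15; its sibling U+FE0F (VARIATION SELECTOR-16) requests emoji presentation instead.

```python
import unicodedata

bare = "\u2640"               # the plain symbol
text_style = "\u2640\ufe0e"   # the same symbol followed by U+FE0E


def describe(s):
    """List each code point in s with its official Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]


print(describe(bare))
# [('U+2640', 'FEMALE SIGN')]
print(describe(text_style))
# [('U+2640', 'FEMALE SIGN'), ('U+FE0E', 'VARIATION SELECTOR-15')]
print(len(bare), len(text_style))
# 1 2
```

The two strings look the same in most text contexts but are different sequences, which is why a plain string comparison or length check tells them apart.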

Emoji are really cursed.

Adam Scherlis 25 Oct 2022 19:07 UTC
LW: 20 AF: 12
10
AF
on: Beyond Kolmogorov and Shannon
Sorry if this is a spoiler for your next post, but I take issue with the heading “Standard measures of information theory do not work” and the implication that this post contains the pre-Crutchfield state of the art.

The standard approach to this in information theory (which underlies the loss function of autoregressive LMs) isn’t to try to match the Shannon entropy of the marginal distribution of bits (a 50-50 distribution in your post). Instead, it treats the generative model as a distribution for each bit conditional on the previous bits, and uses the cross-entropy of that distribution under the data distribution as the loss function or measure of goodness of the generative model.

So in this example, “look at the previous bits, identify the current position relative to the 01x01x pattern, and predict 0, 1, or [50-50 distribution] as appropriate” is the best you can do (given sufficient data for the 50-50 proportion to be reasonably accurate) and is indeed an accurate model of the process that generated the data.

We can see the pattern and take the current position into account because the distribution is conditional on previous bits.

Predicting 011011011… doesn’t do as well because cross-entropy penalizes unwarranted overconfidence.

Predicting 50-50 for each bit doesn’t do as well because cross-entropy still cares about successful predictions.

(Formally, cross-entropy is an expectation over the data distribution instead of an empirical average over a bunch of sampled data, but the term is used in both cases in practice. “Log[-likelihood] loss” and “the log scoring rule” are other common terms for the empirical version.)
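
A minimal sketch of the comparison above, using the empirical (log-loss) version in bits per symbol. The function names, the `eps` clipping constant, the seed, and the assumption that the sequence starts at the beginning of a 01x period (so that position can be read off the prefix length) are all illustrative choices, not anything from the original post.

```python
import math
import random


def sample_sequence(n_periods, seed=0):
    """Generate 0, 1, x repeated, where x is a fair coin flip."""
    rng = random.Random(seed)
    bits = []
    for _ in range(n_periods):
        bits.extend([0, 1, rng.randint(0, 1)])
    return bits


def conditional_model(prefix):
    """P(next bit = 1 | prefix): track the position within the 01x pattern."""
    return (0.0, 1.0, 0.5)[len(prefix) % 3]


def overconfident_model(prefix, eps=1e-6):
    """Predict 011011... with near-certainty (clipped so the loss stays finite)."""
    return eps if len(prefix) % 3 == 0 else 1.0 - eps


def uniform_model(prefix):
    """Ignore the prefix and predict 50-50 for every bit."""
    return 0.5


def log_loss_bits(model, bits):
    """Average of -log2 q(actual bit | previous bits) over the sequence."""
    total = 0.0
    for t, bit in enumerate(bits):
        p_one = model(bits[:t])
        p_actual = p_one if bit == 1 else 1.0 - p_one
        total += -math.log2(p_actual)
    return total / len(bits)


data = sample_sequence(n_periods=1000)
for model in (conditional_model, overconfident_model, uniform_model):
    print(model.__name__, round(log_loss_bits(model, data), 3))
```

Under these assumptions the conditional model pays 1/3 bit per symbol (only the x positions cost anything), the 50-50 model pays exactly 1 bit per symbol, and the 011011… model pays well over 1 bit per symbol because every x that comes up 0 costs about 20 bits at `eps = 1e-6` (and would cost infinitely many without clipping).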

As I said above, this isn’t just a standard information theory approach to this, it’s actually how GPT-3 and other LLMs were trained.

I’m curious about Crutchfield’s thing, but so far not convinced that standard information theory isn’t adequate in this context.

(I think Kolmogorov complexity is also relevant to LLM interpretability, philosophically if not practically, but that’s beyond the scope of this comment.)

Adam Scherlis 1 Feb 2023 8:48 UTC
16 points
4
in reply to: Adam Scherlis’s comment on: Adam Scherlis’s Shortform

One is (almost) normal in base π

Adam Scherlis 1 Jul 2022 4:05 UTC

14 points

0 comments · 1 min read · LW link

(adam.scherlis.com)

Adam Scherlis 20 Oct 2022 23:12 UTC
13 points
12
on: Introduction to abstract entropy
I haven’t read all of this yet. I like it so far. One nitpick: I would really try to avoid referring to individual microstates as having “entropy” assigned to them. I would call $-\log(p_i)$ (or things playing a similar role) something else like “surprisal” or “information”, and reserve entropy (rather than “average entropy”) for things that look like $E_i[\mathrm{surprisal}(i)]$ or $\sum_i p_i \, \mathrm{surprisal}(i)$.
Of course, for macrostates/distributions with uniform probability, this works out to be equal to $\mathrm{surprisal}(i)$ for every state in the macrostate, but I think the conceptual distinction is important.
(I’m as guilty as anyone of calling simple microstates “low-entropy”, but I think it’s healthier to reserve that for macrostates or distributions.)
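
A small numeric sketch of the distinction, in bits (the function names and the example distributions are illustrative assumptions): surprisal is a property of one state's probability, entropy is its expectation over the whole distribution, and the two coincide only when the distribution is uniform.

```python
import math


def surprisal(p):
    """Surprisal of a single state with probability p, in bits."""
    return -math.log2(p)


def entropy(dist):
    """Entropy of a distribution: the expected surprisal, in bits."""
    return sum(p * surprisal(p) for p in dist if p > 0)


uniform = [1 / 8] * 8
skewed = [0.5, 0.25, 0.25]

print(entropy(uniform), [surprisal(p) for p in uniform][:2])
# 3.0 [3.0, 3.0]
print(entropy(skewed), [surprisal(p) for p in skewed])
# 1.5 [1.0, 2.0, 2.0]
```

In the skewed case no individual state has surprisal equal to the entropy, which is the reason for keeping the two words apart.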

Adam Scherlis 16 Feb 2023 2:16 UTC
11 points
5
in reply to: evhub’s comment on: Bing Chat is blatantly, aggressively misaligned
In addition to RLHF or other finetuning, there’s also the prompt prefix (“rules”) that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like “confidential and permanent”. It might also be affecting the repetitiveness (because it’s in a fairly repetitive format) and the aggression (because of instructions to resist attempts at “manipulating” it).

I also suspect that there’s some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the “X because Y. Y because Z.” output.

Adam Scherlis 29 Nov 2021 2:34 UTC
11 points
in reply to: shminux’s comment on: Why Study Physics?
I’m not so sure. I think a lot of physicists get better at this through practice, maybe especially in undergrad. I have a PhD in physics, and at this point I think I’m really good at figuring out the appropriate level of abstraction to use on something (something I’d put in the same category as the things mentioned in the OP). I don’t totally trust my own recollection, but I think I was worse at this freshman year, and much more likely to pick e.g. continuum vs. discrete models of things in mechanics inappropriately and make life hard for myself.

A Generalization of ROC AUC for Binary Classifiers

Adam Scherlis 4 Dec 2021 21:47 UTC

10 points

0 comments · 2 min read · LW link

(adam.scherlis.com)