Research Engineer at DeepMind, focused on mechanistic interpretability and large language models. Opinions are my own.
Tom Lieberum
I’d also be interested in hearing which parts of Anthropic’s research output you think burns our serial time budget. If I understood the post correctly, then OP thinks that efforts like transformer circuits are mostly about accelerating parallelizable research.
Maybe OP thinks that:

- mechanistic interpretability has little value in terms of serial research,
- RLHF does not give us alignment (because it doesn’t generalize beyond the “sharp left turn”, which OP thinks is likely to happen),
- and therefore, since most of Anthropic’s alignment-focused output has little value in terms of serial research while somewhat enhancing present-day LLM capabilities/usability, it is net negative?
But I’m very much unsure whether OP really believes this—would love to hear him elaborate.
ETA: It could also be the case that OP was exclusively referring to the part of Anthropic that is about training LLMs efficiently as a pre-requisite to study those models?
Yep all good points. I think I didn’t emphasize enough that you should not take it every day (maybe not even every other day).
The gums are less addictive than cigs because they taste bad and because the feedback/reinforcement is slower. Lozenges sound like a good alternative too, to be extra sure.
I wouldn’t recommend regular caffeine at all unless you know from experience that you won’t develop a physical dependency. In my experience you get a short-term gain at best, until your body adapts and then requires coffee just to function normally.
If you do want to try caffeine, I recommend pairing it with L-theanine (either in pills or green tea), which is supposed to smooth the experience and make for a cleaner high (YMMV).
If you’re looking for a stimulant that you don’t take regularly and that has a shorter half-life, consider nicotine gums. Again, YMMV; I think gwern has tried it with little effect. Beware the addictive potential (although it is lower than with cigarettes or vapes).
Investigating causal understanding in LLMs
Thoughts on Formalizing Composition
On priors, I wouldn’t worry too much about c), since I would expect a ‘super stimulus’ for head A to not be a super stimulus for head B.
I think one of the problems is the discrete input space, i.e. how do you parameterize the sequence that is being optimized?
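One common workaround for this (not from the comment above, just a sketch of a standard technique) is to relax the discrete token choice into a continuous distribution via the Gumbel-softmax trick, so the sequence can be optimized by gradient descent. A minimal, dependency-free sketch; all names here are illustrative:

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relax a categorical choice over tokens into a differentiable
    probability vector by adding Gumbel noise and applying a
    temperature-scaled softmax. As tau -> 0 the output approaches a
    one-hot sample; larger tau gives a smoother distribution."""
    # Gumbel(0, 1) noise: -log(-log(U)) for U ~ Uniform(0, 1)
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    # Numerically stable softmax at temperature tau
    m = max(x / tau for x in noisy)
    exps = [math.exp(x / tau - m) for x in noisy]
    z = sum(exps)
    return [e / z for e in exps]
```

In a real superstimulus search, each sequence position would hold a vector of logits like this, the soft token distributions would be fed through the embedding matrix, and the logits would be updated to maximize the head activation.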
One idea I just had was to fine-tune an LLM with a reward signal given by, for example, the magnitude of the residual delta coming from a particular head (we probably want something else here, maybe net logit change?). The LLM then already encodes a prior over “sensible” sequences and will try to find one of those which activates the head strongly (however we want to operationalize that).
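As a toy illustration of the reward signal floated above (names are hypothetical; a real implementation would read these activations out of the model with forward hooks and feed the scalar into an RL fine-tuning loop):

```python
import math


def residual_delta_reward(resid_before, resid_after):
    """Hypothetical reward for RL fine-tuning: the L2 magnitude of the
    residual-stream update contributed by the head of interest
    (resid_after - resid_before), summed over sequence positions.
    Both arguments are lists of per-position residual vectors."""
    return sum(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_b, pos_a)))
        for pos_b, pos_a in zip(resid_before, resid_after)
    )
```

Swapping this for a net-logit-change reward would just mean comparing the final logits with and without the head’s contribution instead of measuring the raw residual delta.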
Very cool to see new people joining the interpretability field!
Some resource suggestions:
If you didn’t know already, there is a TF2 port of Lucid, called Luna:
There is also Lucent, which is Lucid for PyTorch: (Some docs written by me for a slightly different version)
For transformer interpretability you might want to check out Anthropic’s work on transformer circuits, Redwood Research’s interpretability tool, or (shameless plug) Unseal.
I can’t speak to the option for remote work but as a counterpoint, it seems very straightforward to get a UK visa for you and your spouse/children (at least straightforward relative to the US). The relevant visa to google is the Skilled Worker / Tier 2 visa if you want to know more.
ETA: Of course, there are still legitimate reasons for not wanting to move. Just wanted to point out that the legal barrier is lower than you might think.
There is definitely something out there, I just can’t recall the name. A keyword you might want to look for is “disentangled representations”.
One start would be the beta-VAE paper https://openreview.net/forum?id=Sy2fzU9gl
Considering you get at least one free upvote from posting/commenting itself, you just have to be faster than the downvoters to generate money :P
Small nitpick:
The PCA plot is using the smallest version of GPT2, and not the 1.5B parameter model (that would be GPT2-XL). The small model is significantly worse than the large one and so I would be hesitant to draw conclusions from that experiment alone.
I want to second your first point. Texting frequently with significant others lets me feel like part of their lives and vice versa, which a weekly call does not accomplish, partly because it is weekly and partly because I am pretty averse to calls.
In one relationship I had, this led to significant misery on my part because my partner was pretty strict on their phone usage, batching messages for the mornings and evenings. For my current primary relationship, I’m convinced that the frequent texting is what kept it alive while being long-distance.
To reconcile the two viewpoints, I think it is still true that superficial relationships via social media likes or retweets are not worth that much if they are all there is to the relationship. But direct text messages are a significant improvement on that.
Re your blog post:
Maybe that’s me being introverted, but there are probably significant differences in how comfortable people feel with texting versus calling. For me, the instantaneousness of calling makes it much more stressful, and I have a problem with people generalizing, in either direction, that one way of interacting over distance is superior in general. I do concede the point that calling is of course much higher bandwidth, but it also requires more time commitment and coordination.
I tried increasing weight decay and batch size, but so far no real success compared to 5x lr. Not going to investigate this further atm.
Oh, I thought figure 1 was S5, but it is actually modular division. I’ll give that a go.
Here are results for modular division. I’m not super sure what to make of them. Small increases in learning rate work, but so does just choosing a larger learning rate from the beginning. In fact, increasing lr to 5x from the beginning works super well, but switching to 5x once grokking arguably starts just destroys any progress. 10x lr from the start does not work (nor does switching to it later).
So maybe the initial observation is more a general/global property of the loss landscape for the task and not of the particular region during grokking?
So I ran some experiments for the permutation group S_5 with the task x o y = ?
Interestingly here increasing the learning rate just never works. I’m very confused.
I updated the report with the training curves. Under default settings, 100% training accuracy is reached after 500 steps.
There is actually an overlap where the train and val curves both go up. It might be an artifact of the simplicity of the task, or of me not properly splitting the dataset (e.g. x+y being in train and y+x being in val). I might run it again on a harder task to verify.
Yep I used my own re-implementation, which somehow has slightly different behavior.
I’ll also note that the task in the report is modular addition while figure 1 from the paper (the one with the red and green lines for train/val) is the significantly harder permutation group task.
I’m not sure I understand.
I chose the grokking starting point as 300 steps, based on the yellow plot. I’d say it’s reasonable to say that ‘grokking is complete’ by the 2000-step mark in the default setting, whereas it is complete by the 450-step mark in the 10x setting (assuming appropriate LR decay to avoid overshooting). Also note that the plots in the report are not log-scale.
It would be interesting to see if, once grokking had clearly started, you could just 100x the learning rate and speed up the convergence to zero validation loss by 100x.
I ran a quick-and-dirty experiment and it does in fact look like you can just crank up the learning rate at the point where some part of grokking happens to speed up convergence significantly. See the wandb report:
I set the LR to 5x the normal value (100x tanked the accuracy, 10x still works though). Of course you would want to anneal it after grokking was finished.
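The schedule described in this and the surrounding comments can be written down as a simple step function. A self-contained sketch; the step thresholds and multipliers are just the values mentioned in the thread (grokking starting around step 300, complete around step 2000, a 5x boost in between), not tuned constants, and a real run would implement this via the optimizer’s LR scheduler instead:

```python
def grokking_lr(step, base_lr=1e-3, grok_start=300, grok_end=2000,
                boost=5.0, anneal=0.1):
    """Illustrative learning-rate schedule: keep the base learning rate
    until grokking starts, multiply it by ~5x during grokking to speed
    up convergence, then anneal once grokking is finished."""
    if step < grok_start:
        return base_lr
    if step < grok_end:
        return base_lr * boost
    return base_lr * boost * anneal
```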
Thanks for elaborating! Insofar as your assessment is based on in-person interactions, I can’t really comment, since I haven’t spoken much with people from Anthropic.
I think there are degrees to believing this meme you refer to, in the sense of “we need an AI of capability level X to learn meaningful things”. And I would guess that many people at Anthropic do believe this weaker version—it’s their stated purpose after all. And for some values of X this statement is clearly true, e.g. the filters learned by shallow CNNs trained on MNIST are not interpretable, whereas the filters of deep Inception-style CNNs trained on ImageNet are (mostly) interpretable.
One could argue that parts of interpretability do need to happen in a serial manner, e.g. finding out the best way to interpret transformers at all, the recent SoLU finding, or just generally building up knowledge on how best to formalize or go about this whole interpretability business. If that is true, and interpretability furthermore turns out to be an important component of promising alignment proposals, then the question is mostly about which level of X gives you the most information to advance serial interpretability research, relative to how much other serial budget you burn.
I don’t know whether people at Anthropic believe the above steps or have thought about it in these ways at all, but if they did, this could possibly explain the difference in policies between you and them?