Research Engineer at DeepMind, focused on mechanistic interpretability and large language models. Opinions are my own.
Tom Lieberum
[Question] How should my timelines influence my career choice?
Thanks! Yeah, that’s a good point worth mulling over. I guess that assertion would hinge on the marginal improvement in EV in the doomed scenario. I don’t necessarily see things being completely, hopelessly doomed in a 15-20-year-to-AGI world. But I am also uncertain as to which role is more useful in the short-timeline world, aside from an engineer being able to contribute earlier. In the medium-timeline world it seems to me like the marginal researcher has higher EV.
So if I were completely uncertain, i.e. 50/50, about which one is better in a short-timeline world, then becoming a researcher would seem like the safer choice.
Background: AI Master’s student, some experience in RL
I don’t think there is a fundamental reason that we can’t; it’s rather that no one has done it. I don’t know the definitive answer as to why, but here are some options:
too obscure (‘no-one has thought of it’, or ‘no-one thought it was a good idea, it’s only assembly-like code after all’)
high barrier of entry (you need to write an RL environment that you can query fast, and you need a lot of compute)
this makes it harder for individuals or small teams to do this, and larger players like DeepMind and OpenAI might have different priorities
now that we have Codex (and soon™ its newer, allegedly much better version), there might not be any (economic or scientific) reason to do this
What’s the value you’d get out of a code wars expert model? I glanced at the Wikipedia page. How would you convert its outputs into a useful program that does more than gobble up all your memory?
Great stuff, thanks!
Is there a reason you’re not including https://deepmind.com/blog/article/generally-capable-agents-emerge-from-open-ended-play? Is it not explicit enough in terms of being a scaling paper?
I don’t like your framing of this as “plausible” but I don’t want to argue that point.
Afaict it boils down to whether you believe in (parts of) their mission, e.g. interpretability of large models, and how much that weighs against the marginal increase in race dynamics, if any.
No, I take specific issue with the term ‘plausibly’; I don’t have a problem with the term ‘possibly’. Using the term ‘plausibly’ already presumes a judgement over the outcome of the discussion, which I did not want to get into (mostly because I don’t have a strong view on this yet). You could of course argue that that’s false balance, and if so I would like to hear your argument (but maybe not under this particular post, if people think it’s too off-topic).
ETA: if this is just a disagreement about our definitions of the term ‘plausibly’, then never mind, but your original comment reads to me like you’re taking a side.
Oh yes, I’m aware that he expressed this view. That’s different, however, from it being objectively plausible (whatever that means). I have the feeling we’re talking past each other a bit. I’m not saying “no-one reputable thinks OpenAI is net-negative for the world”; I’m just pointing out that it’s not as clear-cut as your initial comment made it seem to me.
In appendix A.6 they state “To train an Atari agent for 100k steps, it only needs 4 GPUs to train 7 hours.” I don’t think they provide a summary of the total number of parameters. Scanning the described architecture though, it does not look like a lot—almost surely < 1B.
I like the framing of perishable vs non-perishable knowledge and I like that the post is short and concise.
However, after reading this I’m left feeling “So what now?” and would appreciate some more actionable advice or tools of thought. What I got out of it so far is:
Things that have been around for longer are more likely to stay around longer (seems like a decent prior)
Keep tabs on a few major event categories and dump the rest of the news cycle (checks out—not sure how that would work as a categorical imperative, but seems like the right choice for an individual)
I think the concept can be applied pretty broadly. Some more ideas:
when learning about a new field, in general, go for textbooks rather than papers
if you use spaced repetition, regularly ask yourself whether the cards you are studying have passed their shelf life; this can help reduce frustration/annoyance/boredom when reviewing cards
some skills have extremely long shelf-life and they seem to overlap with those that compound:
learning basic life admin skills
learning how to take care of your mental health (e.g. CBT methods)
learning how to learn
basic social skills
I’m sure there is much more here.
I can only speculate, but the main researchers are now working on other things, e.g. at Anthropic. As to why they switched, I don’t know. Maybe they were not making progress fast enough, or Anthropic’s mission seemed more important?
However, at least Chris Olah believes this is still a tractable and important direction; see his recent RFP for Open Phil.
Small nitpick: I would cite The Bitter Lesson in the beginning.
Interpretability will fail—future DL descendant is more of a black box, not less
It certainly makes interpretability harder, but it seems like the possible gain is also larger, making it a riskier bet overall. I’m not convinced that it decreases the expected value of interpretability research, though. Do you have a good intuition for why it would make interpretability less valuable, or at least lower its value relative to the increased risk of failure?
IRL/Value Learning is far more difficult than first appearances suggest, see #2
That’s not immediately clear to me. Could you elaborate?
Out of curiosity, are you willing to share the papers you improved upon?
Trying to summarize your viewpoint, lmk if I’m missing something important:
Training self-organizing models on multi-modal input will lead to increased modularization and in turn to more interpretability
Existing interpretability techniques might more or less transfer to self-organizing systems
There are low-hanging fruits in applied interpretability that we could exploit should we need them in order to understand self-organizing systems
(Not going into the specific proposals for sake of brevity and clarity)
Are you asking exclusively about “Machine Learning” systems, or also GOFAI? E.g. I notice that you didn’t include ELIZA in your database, but that was a hard-coded program, so maybe it doesn’t match your criteria.
Seems like this could be circumvented relatively easily by freezing gametes now.
Understanding the tensor product formulation in Transformer Circuits
Ah yes, that makes sense to me. I’ll modify the post accordingly and probably write it in the basis formulation.
ETA: Fixed now; the computation takes a tiny bit longer but is hopefully still readable to everyone.
It would be interesting to see if, once grokking had clearly started, you could just 100x the learning rate and speed up the convergence to zero validation loss by 100x.
I ran a quick-and-dirty experiment, and it does in fact look like you can just crank up the learning rate at the point where some part of grokking happens, and speed up convergence significantly. See the wandb report:
I set the LR to 5x the normal value (100x tanked the accuracy, 10x still works though). Of course you would want to anneal it after grokking was finished.
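For concreteness, here is a minimal sketch of what this hack could look like in a PyTorch training loop. The function name, accuracy threshold, and the way grokking onset is detected are illustrative assumptions on my part, not the exact code from the run above:

```python
import torch


def boost_lr_on_grokking(optimizer: torch.optim.Optimizer,
                         val_acc: float,
                         base_lr: float,
                         boost_factor: float = 5.0,
                         acc_threshold: float = 0.5) -> None:
    """Once validation accuracy starts climbing (a crude sign that grokking
    has begun), multiply the learning rate by `boost_factor`.

    5x worked in the quick experiment above; 10x still worked, 100x tanked
    accuracy. You would want to anneal the LR back down once grokking is done.
    """
    if val_acc > acc_threshold:
        for group in optimizer.param_groups:
            group["lr"] = base_lr * boost_factor
```

Called once per epoch after computing validation accuracy, this simply overwrites the optimizer’s learning rate; a cleaner version would wrap it in a proper LR scheduler and anneal afterwards.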
While I agree that using percentages would make impact more comparable between agents and timesteps, it also leads to counterintuitive results (at least counterintuitive to me).
Consider a sequence of utilities at times 0, 1, 2 with $U_0 = 1$, $U_1 = 0.01$ and $U_2 = 0$.
Now the drop from $U_1$ to $U_2$ would be more dramatic (a decrease by 100%) than the drop from $U_0$ to $U_1$ (a decrease by 99%) if we were using percentages. But I think the agent should ‘care more’ about the larger drop in absolute utility (i.e. spend more resources to prevent it from happening), and I suppose we might want impact to correspond to something like ‘how much we care about this event happening’.
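To spell out the arithmetic behind this:

$$\frac{U_0 - U_1}{U_0} = \frac{1 - 0.01}{1} = 99\%, \qquad U_0 - U_1 = 0.99$$

$$\frac{U_1 - U_2}{U_1} = \frac{0.01 - 0}{0.01} = 100\%, \qquad U_1 - U_2 = 0.01$$

So the percentage measure ranks the second drop as larger, even though in absolute terms it is a hundred times smaller.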