Transformers (Tag)
Last edit: 24 Feb 2022 11:01 UTC by Vivek Hebbar

How Do Induction Heads Actually Work in Transformers With Finite Capacity?
by Fabien Roger · 23 Mar 2023 9:09 UTC · 23 points · 0 comments · 5 min read · LW link

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind
by DragonGod · 13 Jan 2023 16:53 UTC · 55 points · 12 comments · 1 min read · LW link (arxiv.org)

How fast can we perform a forward pass?
by jsteinhardt · 10 Jun 2022 23:30 UTC · 53 points · 9 comments · 15 min read · LW link (bounded-regret.ghost.io)

Residual stream norms grow exponentially over the forward pass
by StefanHex and TurnTrout · 7 May 2023 0:46 UTC · 65 points · 17 comments · 11 min read · LW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability
by Neel Nanda · 25 Dec 2022 22:21 UTC · 51 points · 7 comments · 12 min read · LW link (www.neelnanda.io)

Google’s PaLM-E: An Embodied Multimodal Language Model
by SandXbox · 7 Mar 2023 4:11 UTC · 86 points · 7 comments · 1 min read · LW link (palm-e.github.io)

No Really, Attention is ALL You Need—Attention can do feedforward networks
by Robert_AIZI · 31 Jan 2023 18:48 UTC · 23 points · 2 comments · 6 min read · LW link (aizi.substack.com)

Addendum: More Efficient FFNs via Attention
by Robert_AIZI · 6 Feb 2023 18:55 UTC · 8 points · 0 comments · 5 min read · LW link (aizi.substack.com)

So, just why do GPTs have to operate by continuing an existing string?
by Bill Benzon · 24 Mar 2023 12:08 UTC · −4 points · 0 comments · 3 min read · LW link

We Need To Know About Continual Learning
by michael_mjd · 22 Apr 2023 17:08 UTC · 27 points · 14 comments · 4 min read · LW link

An Analogy for Understanding Transformers
by TheMcDouglas · 13 May 2023 12:20 UTC · 63 points · 5 comments · 9 min read · LW link

Research agenda—Building a multi-modal chess-language model
by p.b. · 7 Apr 2022 12:25 UTC · 8 points · 2 comments · 2 min read · LW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks
by RogerDearnaley · 21 May 2023 8:29 UTC · 6 points · 1 comment · 4 min read · LW link

Searching for Modularity in Large Language Models
by NickyP and Stephen Fowler · 8 Sep 2022 2:25 UTC · 44 points · 3 comments · 14 min read · LW link

Brief Notes on Transformers
by Adam Jermyn · 26 Sep 2022 14:46 UTC · 43 points · 2 comments · 2 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge
by Marius Hobbhahn · 12 Oct 2022 15:40 UTC · 42 points · 1 comment · 5 min read · LW link

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?
by simeon_c · 31 Dec 2022 11:34 UTC · 7 points · 4 comments · 1 min read · LW link