
Transformers


How Do Induction Heads Actually Work in Transformers With Finite Capacity?

Fabien Roger · 23 Mar 2023 9:09 UTC
23 points · 0 comments · 5 min read · LW link

Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind

DragonGod · 13 Jan 2023 16:53 UTC
55 points · 12 comments · 1 min read · LW link
(arxiv.org)

How fast can we perform a forward pass?

jsteinhardt · 10 Jun 2022 23:30 UTC
53 points · 9 comments · 15 min read · LW link
(bounded-regret.ghost.io)

Residual stream norms grow exponentially over the forward pass

7 May 2023 0:46 UTC
65 points · 17 comments · 11 min read · LW link

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · 25 Dec 2022 22:21 UTC
51 points · 7 comments · 12 min read · LW link
(www.neelnanda.io)

Google’s PaLM-E: An Embodied Multimodal Language Model

SandXbox · 7 Mar 2023 4:11 UTC
86 points · 7 comments · 1 min read · LW link
(palm-e.github.io)

No Really, Attention is ALL You Need—Attention can do feedforward networks

Robert_AIZI · 31 Jan 2023 18:48 UTC
23 points · 2 comments · 6 min read · LW link
(aizi.substack.com)

Addendum: More Efficient FFNs via Attention

Robert_AIZI · 6 Feb 2023 18:55 UTC
8 points · 0 comments · 5 min read · LW link
(aizi.substack.com)

So, just why do GPTs have to operate by continuing an existing string?

Bill Benzon · 24 Mar 2023 12:08 UTC
−4 points · 0 comments · 3 min read · LW link

We Need To Know About Continual Learning

michael_mjd · 22 Apr 2023 17:08 UTC
27 points · 14 comments · 4 min read · LW link

An Analogy for Understanding Transformers

TheMcDouglas · 13 May 2023 12:20 UTC
63 points · 5 comments · 9 min read · LW link

Research agenda—Building a multi-modal chess-language model

p.b. · 7 Apr 2022 12:25 UTC
8 points · 2 comments · 2 min read · LW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley · 21 May 2023 8:29 UTC
6 points · 1 comment · 4 min read · LW link

Searching for Modularity in Large Language Models

8 Sep 2022 2:25 UTC
44 points · 3 comments · 14 min read · LW link

Brief Notes on Transformers

Adam Jermyn · 26 Sep 2022 14:46 UTC
43 points · 2 comments · 2 min read · LW link

Building a transformer from scratch—AI safety up-skilling challenge

Marius Hobbhahn · 12 Oct 2022 15:40 UTC
42 points · 1 comment · 5 min read · LW link

[Question] Are Mixture-of-Experts Transformers More Interpretable Than Dense Transformers?

simeon_c · 31 Dec 2022 11:34 UTC
7 points · 4 comments · 1 min read · LW link