Archetypal Transfer Learning

Last edit: 6 Jun 2023 8:25 UTC by MiguelDev

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what the author argues is a fine-tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 38.6%: GPT-2-medium shut itself down 386 times in 1,000 tries in the event its intelligence exceeded that of humans.

The team, consisting of @MiguelDev, @marc/er, Mazianni, and @Linda Linsefors, is working to improve this rate to 100%. The project proposal is found here.

Related Tags: Corrigibility, Inner Alignment, Outer Alignment

Research proposal: Leveraging Jungian archetypes to create values-based models

MiguelDev · 5 Mar 2023 17:39 UTC
5 points
2 comments · 2 min read · LW link

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

MiguelDev · 26 Apr 2023 1:37 UTC
12 points
5 comments · 10 min read · LW link

(Slightly) Scalable RLHF Alternatives: A Productive Path for Slow Takeoff Worlds?

marc/er · 17 May 2023 11:31 UTC
9 points
3 comments · 11 min read · LW link

Archetypal Transfer Learning and a Corrigibility-Friendly Optimization Technique

marc/er · 4 May 2023 12:15 UTC
12 points
1 comment · 11 min read · LW link

The Guardian Version 1

MiguelDev · 18 Apr 2023 21:20 UTC
9 points
3 comments · 13 min read · LW link

Simulating a possible alignment solution in GPT2-medium using Archetypal Transfer Learning

MiguelDev · 2 May 2023 15:34 UTC
1 point
2 comments · 18 min read · LW link