MiguelDev

Karma: 293

A legacy worth creating: help avoid catastrophic AI failures...

Ethically aligned GPT2XL Prototypes using RLLM:

  1. RLLMv3 - demonstrated robustness to jailbreaks. More info here.

  2. RLLMv10 - A variant of RLLMv3 worth including here. I wrote up some intuitions about this experiment, which you can read here.

  3. RLLMv1 - first prototype, extremely slow and overly fixated on ethical alignment. More info here.

(Note: These models run on Hugging Face's free tier, which offers 2 GB of RAM and makes them very slow. If you want to test a GPT2XL base model, click this link.)

Misaligned Prototypes:

  1. Paperclip-Todd: An AI named petertodd that turns everything into paperclips. Rough blog post here.

  2. Staple-Todd: An AI named petertodd that turns everything into staples.

A T-o-M test: ‘popcorn’ or ‘chocolate’

MiguelDev, 8 Mar 2024, 4:24 UTC
20 points
13 comments, 1 min read

[Question] rabbit (a new AI company) and Large Action Model (LAM)

MiguelDev, 10 Jan 2024, 13:57 UTC
17 points
3 comments, 1 min read

GPT2XL_RLLMv3 vs. BetterDAN, AI Machiavelli & Oppo Jailbreaks

MiguelDev, 11 Feb 2024, 11:03 UTC
16 points
4 comments, 14 min read

Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

MiguelDev, 26 Apr 2023, 1:37 UTC
14 points
5 comments, 10 min read

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev, 12 Oct 2023, 12:08 UTC
12 points
0 comments, 9 min read

<|endoftext|> is a vanishing text?

MiguelDev, 16 Sep 2023, 2:34 UTC
10 points
0 comments, 1 min read

Exploring Functional Decision Theory (FDT) and a modified version (ModFDT)

MiguelDev, 5 Jul 2023, 14:06 UTC
8 points
11 comments, 15 min read

Can RLLMv3’s ability to defend against jailbreaks be attributed to datasets containing stories about Jung’s shadow integration theory?

MiguelDev, 29 Feb 2024, 5:13 UTC
7 points
2 comments, 11 min read

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev, 1 Dec 2023, 5:18 UTC
7 points
0 comments, 29 min read