MiguelDev

Karma: 293

A legacy worth creating: help avoid catastrophic AI failures...

Ethically aligned GPT2XL Prototypes using RLLM:

  1. RLLMv3 - demonstrated robustness to jailbreaks. More info here.

  2. RLLMv10 - A variant of RLLMv3 worth including here. I wrote up some intuitions about this experiment, which you can read here.

  3. RLLMv1 - first prototype, extremely slow and overly fixated on ethical alignment. More info here.

(Note: These models run on Hugging Face's free tier with 2 GB of RAM, which makes them very slow. If you want to test a GPT2XL base model, click this link.)

Misaligned Prototypes:

  1. Paperclip-Todd: An AI named petertodd that turns everything into paperclips. Rough blog post here.

  2. Staple-Todd: An AI named petertodd that turns everything into staples.