MiguelDev

Karma: 293

A legacy worth creating: help avoid catastrophic AI failures...

Ethically aligned GPT2XL Prototypes using RLLM:

  1. RLLMv3 - demonstrated robustness to jailbreaks. More info here.

  2. RLLMv10 - A variant of RLLMv3 worth including here. I wrote up some intuitions about this experiment, which you can read here.

  3. RLLMv1 - first prototype, unbelievably slow and too addicted to ethical alignment. More info here.

(Note: These models are running on Hugging Face's free tier with 2GB of RAM, which makes them very slow. If you want to test a GPT2XL base model, click this link.)

Misaligned Prototypes:

  1. Paperclip-Todd: An AI named petertodd that turns everything into paperclips. Rough blog post here.

  2. Staple-Todd: An AI named petertodd that turns everything into staples.

Research proposal: Leveraging Jungian archetypes to create values-based models

MiguelDev, 5 Mar 2023 17:39 UTC
5 points
2 comments, 2 min read

[Question] Why Carl Jung is not popular in AI Alignment Research?

MiguelDev, 17 Mar 2023 23:56 UTC
−3 points
13 comments, 1 min read

Humanity’s Lack of Unity Will Lead to AGI Catastrophe

MiguelDev, 19 Mar 2023 19:18 UTC
3 points
2 comments, 4 min read