MiguelDev

Karma: 324

Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)

MiguelDev · 1 Feb 2025 19:17 UTC

4 points

2 comments · 2 min read · LW link

An examination of GPT-2’s boring yet effective glitch

MiguelDev · 18 Apr 2024 5:26 UTC

5 points

3 comments · 3 min read · LW link

Intergenerational Knowledge Transfer (IKT)

MiguelDev · 28 Mar 2024 8:14 UTC

6 points

0 comments · 1 min read · LW link

RLLMv10 experiment

MiguelDev · 18 Mar 2024 8:32 UTC

5 points

0 comments · 2 min read · LW link

A T-o-M test: ‘popcorn’ or ‘chocolate’

MiguelDev · 8 Mar 2024 4:24 UTC

20 points

13 comments · 1 min read · LW link

Sparks of AGI prompts on GPT2XL and its variant, RLLMv3

MiguelDev · 7 Mar 2024 6:33 UTC

4 points

0 comments · 4 min read · LW link

Can RLLMv3’s ability to defend against jailbreaks be attributed to datasets containing stories about Jung’s shadow integration theory?

MiguelDev · 29 Feb 2024 5:13 UTC

7 points

2 comments · 11 min read · LW link

Research Log, RLLMv3 (GPT2-XL, Phi-1.5 and Falcon-RW-1B)

MiguelDev · 15 Feb 2024 3:39 UTC

4 points

0 comments · 262 min read · LW link

GPT2XL_RLLMv3 vs. BetterDAN, AI Machiavelli & Oppo Jailbreaks

MiguelDev · 11 Feb 2024 11:03 UTC

16 points

4 comments · 14 min read · LW link

Research Log, RLLMv2: Phi-1.5, GPT2XL and Falcon-RW-1B as paperclip maximizers

MiguelDev · 20 Jan 2024 15:30 UTC

6 points

0 comments · 10 min read · LW link

[Question] rabbit (a new AI company) and Large Action Model (LAM)

MiguelDev · 10 Jan 2024 13:57 UTC

17 points

3 comments · 1 min read · LW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev · 1 Dec 2023 5:18 UTC

7 points

0 comments · 29 min read · LW link

Migueldev’s shortform

MiguelDev · 1 Nov 2023 8:54 UTC

2 points

14 comments · 1 min read · LW link

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev · 30 Oct 2023 9:24 UTC

6 points

2 comments · 41 min read · LW link

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev · 12 Oct 2023 12:08 UTC

12 points

0 comments · 9 min read · LW link

[Question] Who determines whether an alignment proposal is the definitive alignment solution?

MiguelDev · 3 Oct 2023 22:39 UTC

−1 points

6 comments · 1 min read · LW link

<|endoftext|> is a vanishing text?

MiguelDev · 16 Sep 2023 2:34 UTC

10 points

0 comments · 1 min read · LW link

On Ilya Sutskever’s “A Theory of Unsupervised Learning”

MiguelDev · 26 Aug 2023 5:34 UTC

10 points

0 comments · 19 min read · LW link

Exploring the Responsible Path to AI Research in the Philippines

MiguelDev · 23 Aug 2023 8:44 UTC

6 points

0 comments · 6 min read · LW link

A fictional AI law laced w/ alignment theory

MiguelDev · 17 Jul 2023 1:42 UTC

6 points

0 comments · 2 min read · LW link