MiguelDev

Karma: 298

→ help avoid catastrophic AI failures...

MiguelDev 10 Nov 2022 14:39 UTC
3 points
0
in reply to: Ruby’s comment on: The Matrix of Untruth
Thank you Ruby. I had posted it a month ago in my blog and thinking how will this idea that I am experiencing will be received in this forum. No Worries, thanks for the time reviewing this.

MiguelDev 3 Mar 2023 23:17 UTC
0 points
−1
in reply to: MadHatter’s comment on: Research proposal: Leveraging Jungian archetypes to create values-based models
The proposal is trying to point out a key difference in the way alignment reasearch and Carl Jung understood pattern recognition in humans.
I stated as one of the limitations of the paper that:
“The author focused on the quality of argument rather than quantity of citations, providing examples or testing. Once approved for research, this proposal will be further tested and be updated.”
I am recommending here a research area that I honestly believe that can have a massive impact in aligning humans and AI.

MiguelDev 8 Mar 2023 2:27 UTC
1 point
0
in reply to: mwatkins’s comment on: SolidGoldMagikarp (plus, prompt generation)
I think it’s different from the shadow archetype… It might be more related to the trickster..

MiguelDev 8 Mar 2023 14:24 UTC
1 point
0
in reply to: mwatkins’s comment on: SolidGoldMagikarp (plus, prompt generation)
Hmmmm. Well us humans have all archetypes in us but at different levels at different points of time or use. I wonder what triggered such representations? well it’s learning from the data but yeah what are the conditions at the time of the learning was in effect—like humans react to archetypes when like socializing with other people or solving problems...hmmmmm. super interesting. Yeah to quote Neitzsche is fascinating too, I mean why? is it because many great rappers look up to him or many rappers look up to certain philosophers that got influenced by Neitzsche? super intriguing..
I will be definitely looking forward to that report on petertodd phenomenon, I think we have touched something that Neuroscientists / psychologists have been longing find...

MiguelDev 9 Mar 2023 2:37 UTC
3 points
0
on: Simulators
The strict version of the simulation objective is optimized by the actual “time evolution” rule that created the training samples. For most datasets, we don’t know what the “true” generative rule is, except in synthetic datasets, where we specify the rule.
I hope I read this before while doing my research proposal. But pretty much have arrived to the same conclusion that I believe alignment research is missing out—the pattern recognition learning systems being researched/deployed currently seems to lack a firm grounding on other fields of sciences like biology or pyschology that at the very least links to chemistry and physics.

MiguelDev 9 Mar 2023 2:49 UTC
1 point
0
on: Simulators
- What if the input “conditions” in training samples omit information which contributed to determining the associated continuations in the original generative process? This is true for GPT, where the text “initial condition” of most training samples severely underdetermines the real-world process which led to the choice of next token.
- What if the training data is a biased/limited sample, representing only a subset of all possible conditions? There may be many “laws of physics” which equally predict the training distribution but diverge in their predictions out-of-distribution.
I honestly think these are not physics related questions though they are very important to ask. These can be better associated to the bias of the researchers that chosed the input conditons and the relevance of training data.

MiguelDev 9 Mar 2023 2:59 UTC
7 points
6
on: Simulators
Guessing the right theory of physics is equivalent to minimizing predictive loss. Any uncertainty that cannot be reduced by more observation or more thinking is irreducible stochasticity in the laws of physics themselves – or, equivalently, noise from the influence of hidden variables that are fundamentally unknowable.
This is the main sentence in this post. The simulator as a concept might even change if the right physics were discovered. I would be looking forward to your expansion of the topic in the succeeding posts @janus.

MiguelDev 9 Mar 2023 18:13 UTC
1 point
on: Just Imitate Humans?
Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?
I think the problem of scale doesn’t necessarily gets solved through quantity—because there are just qualitative issues (eg. loss of human life) that no amount of infrastructure upscale can compensate.

MiguelDev 17 Mar 2023 23:45 UTC
1 point
0
on: Empathy as a natural consequence of learnt reward models
Outside of apes and monkeys, dophins and elephants, as well as corvids also appear in anecdotal reports and the scientific literature to have many complex forms of empathy.
Might be related to Erich Neumann’s book The Great mother which cites: “The psychological development [of humankind]… begins with the ‘matriarchal’ stage in which the archetype of the Great Mother dominates and the unconscious directs the psychic process of the individual and the group.” It’s like when we see animals in the wild eg. the lioness and its cub, we always associate it as the mother and its child—we do not have to google or open a book to like ensure that it is the case but deep within our psyche is that pattern that allows us to interpret it as such.

MiguelDev 20 Mar 2023 10:51 UTC
0 points
0
in reply to: the gears to ascension’s comment on: Humanity’s Lack of Unity Will Lead to AGI Catastrophe
Thank you.

MiguelDev 23 Mar 2023 17:52 UTC
1 point
0
in reply to: LVSN’s comment on: Human Intentions or Humanity?
I’m sorry, I have no way to answer your question.. I just hope in the future we do.

MiguelDev 23 Mar 2023 23:23 UTC
1 point
0
on: Human Intentions or Humanity?
Hello Moderators/Readers,
I am curious as to why the post was downvoted. I would appreciate an explanation so I can improve my writing moving forward. I aim to arrive at helping in solving the alignment problem. Thank you.

MiguelDev 24 Mar 2023 0:52 UTC
1 point
0
in reply to: TAG’s comment on: Human Intentions or Humanity?
Thank you for your response.
I understand your 2nd point, but to comment on your 1st comment—is having the simpler question always the right thing to focus on? isn’t searching for the right questions to ponder the best way to arrive at the best solutions?

MiguelDev 24 Mar 2023 1:23 UTC
1 point
0
in reply to: TAG’s comment on: Human Intentions or Humanity?
Subpar questions lead to incomplete /wrong answers. If it happens to be that we wrongly framed the alignment problem, the cost of this is huge or even catastrophic. It’s still cheaper to question even the best ideas now rather than change directions or correct errors later.

MiguelDev 24 Mar 2023 1:38 UTC
1 point
0
in reply to: TAG’s comment on: Human Intentions or Humanity?
I’m in the process of writing it. Will link it here once finished. Thanks for being more direct too.

MiguelDev 13 Apr 2023 2:55 UTC
12 points
0
on: Apply to >30 AI safety funders in one application with the Nonlinear Network
Hello there,

Are you interested of funding this theory of mine that I submitted to AI alignment awards? I am able to make this work in GPT2 and now writing the results. I was able to make GPT2 shutdown itself (100% of the time) even if it’s aware of the shutdown instruction called “the Gauntlet” embedded through fine-tuning an artificially generated archetype called “the Guardian” essentially solving corrigibility, outer and inner alignment.

https://twitter.com/whitehatStoic/status/1646429585133776898?t=WymUs_YmEH8h_HC1yqc_jw&s=19

Let me know if you guys are interested. I want to test it in higher parameter models like Llama and Alpaca but don’t have the means to finance the equipment.

I also found out that there is a weird setting in the temperature for GPT2 where in the range of .498 to .50 my shutdown code works really well, I still don’t know why though. But yeah I believe that there is an incentive to review what’s happening inside the transformer architecture.

Here was my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian

I’ll post my paper for the corrigibility solution too once finished probably next week but if you wish to contact me, just reply here or email me at migueldeguzmandev@gmail.com.

If you want to see my meeting schedule, You can find it here: https://calendly.com/migueldeguzmandev/60min

Looking forward to hearing from you.

Best regards,

Miguel

Update: Already sent an application, I didn’t saw that in my first read. Thank you.

MiguelDev 18 Apr 2023 2:23 UTC
1 point
0
on: The Guardian Persona: Incorporating Artificially Generated Archetypes to Solve the Outer, Inner Alignment Problem and Corrigibility
No time to rest. I’m starting to build The Guardian version 002.

MiguelDev 26 Apr 2023 14:20 UTC
1 point
0
in reply to: Aaron_Scher’s comment on: The Guardian Persona: Incorporating Artificially Generated Archetypes to Solve the Outer, Inner Alignment Problem and Corrigibility
Hello Aaron,

Sorry it took me time to reply but you might find it worthy to read my updated account of this approach linked below:

https://www.lesswrong.com/posts/pu6D2EdJiz2mmhxfB/gpt-2-shuts-down-itself-386-times-post-fine-tuning-with

I will answer your questions—if any in that post. Thank you.

Thank you.

MiguelDev 27 Apr 2023 8:23 UTC
1 point
0
in reply to: mesaoptimizer’s comment on: Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium

Why ask an AI to shut down if it recognizes its superiority? If it cannot become powerful enough for humans to handle, it cannot become powerful enough to protect humans from another AI that is too powerful for humans to handle.

As discussed in the post, I aimed for a solution that can embed a shutdown protocol that is modeled in a real world scenario. Of course It could have been just a pause for repair or debug mode but yeah, I focused on a single idea.. Can we embed a shutdown instruction reliably. Which I was able to demonstrate.

How successful is this strategy given increasing scale of LLMs and its capabilities? If this was performed on multiple scales of GPT-2 , it would provide useful empirical data about robustness to scale. My current prediction is that this is not robust to scale given that you are fine-tuning on stories to create personas. The smarter the model is, the more likely it is to realize when it is being tested to provide the appropriate “shutdown!” output and pretend to be the AP, and in out-of-distribution scenarios, it will pretend to be some other persona instead.

As mentioned in the “what’s next” section, I will look into these part once I have the means to upgrade my old mac. But I believe it will be easier to do this because of the larger number of parameters and layers. Again, this is just a demonstration of how to solve the inner, outer alignment problem and corrigibility in a single method. As things go complex in this method utilizing a learning rates, batching, epochs and number of quality archetypal data will matter. This method can scale as the need arises. But that requires a team effort which I’m looking to address at for the moment.

The AP finetuned model seems vulnerable to LM gaslighting the same way ChatGPT is. This does not seem to be an improvement over OAI’s Instruct fine-tuning or whatever they did to GPT-4.

sorry I’m not familiar with LM gaslighting.

Based on what I can tell, AP fine-tuning will lead to the AI more likely simulating the relevant AP and its tokens will be what the simulator thinks the AP would return next. This means it is brittle to systems that leverage this model since they can simply beam search and ignore the shutdown beams. RLHF-like fine-tuning strategies probably perform better, according to my intuition.

has RLHF solved the problems I tried to solve in this project? Again this project is to demonstrate q new concept not an all in a bucket solution at the moment. But given that it is scalable, avoid all researcher /human, team, CEO or even investor biases… This is a strong candidate for an alignment solution.

Also, to correct—I call this the archetypal transfer learning method (ATL) for the fine tuning version. My original proposal to the alignment awards was to not use unstructured data because alignment issues arises from that. If I were to build an aligned AI system, I will not use random texts that doesn’t model our thinking. We think in archetypal patterns and 4chan, social media and reddit platforms are not the best sources. I’d rather books, scientific papers or scripts from podcasts… Like long form quality discussions are better sources of human thinking..

MiguelDev 27 Apr 2023 14:05 UTC
0 points
0
in reply to: Mitchell_Porter’s comment on: Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner x Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium
I believe so. When it worked I was emotional a bit. There is hope.