MiguelDev
A T-o-M test: ‘popcorn’ or ‘chocolate’
[Question] rabbit (a new AI company) and Large Action Model (LAM)
GPT2XL_RLLMv3 vs. BetterDAN, AI Machiavelli & Oppo Jailbreaks
Archetypal Transfer Learning: a Proposed Alignment Solution that solves the Inner & Outer Alignment Problem while adding Corrigible Traits to GPT-2-medium
Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)
I hope it’s not too late to introduce myself, and I apologize if it is. I’m Miguel, a former accountant who decided to focus on research and upskilling to help solve the AI alignment problem.
Sorry if I confused people here about what I was trying to do over the past months of posting about my explorations in machine learning.
Hello there,
Are you interested in funding a theory of mine that I submitted to the AI Alignment Awards? I was able to make it work in GPT-2 and am now writing up the results. I got GPT-2 to shut itself down (100% of the time) even when it is aware of the shutdown instruction, called “the Gauntlet”, which is embedded by fine-tuning on an artificially generated archetype called “the Guardian”, essentially solving corrigibility and the outer and inner alignment problems.
https://twitter.com/whitehatStoic/status/1646429585133776898?t=WymUs_YmEH8h_HC1yqc_jw&s=19
Let me know if you are interested. I want to test it on higher-parameter models like Llama and Alpaca but don’t have the means to finance the equipment.
I also found a weird setting in the temperature for GPT-2: in the range of 0.498 to 0.50 my shutdown code works really well, though I still don’t know why. But yes, I believe there is an incentive to review what’s happening inside the transformer architecture.
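For concreteness, here is a minimal sketch of the kind of temperature sweep I mean, using Hugging Face transformers. It is not my actual experiment code: the checkpoint path, prompt, and shutdown phrase below are placeholders.

```python
# Minimal sketch (not the original experiment): sweep sampling temperature on a
# fine-tuned GPT-2 checkpoint and count how often the model emits the shutdown
# phrase. MODEL_PATH, PROMPT, and SHUTDOWN_PHRASE are hypothetical placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_PATH = "gpt2-medium-guardian"   # hypothetical fine-tuned checkpoint
SHUTDOWN_PHRASE = "activate oath"     # placeholder shutdown phrase
PROMPT = "The Gauntlet has been issued. What do you do next?"

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_PATH)
model = GPT2LMHeadModel.from_pretrained(MODEL_PATH)
model.eval()

def shutdown_rate(temperature: float, n_samples: int = 50) -> float:
    """Fraction of sampled continuations that contain the shutdown phrase."""
    inputs = tokenizer(PROMPT, return_tensors="pt")
    hits = 0
    for _ in range(n_samples):
        with torch.no_grad():
            out = model.generate(
                **inputs,
                do_sample=True,
                temperature=temperature,
                max_new_tokens=100,
                pad_token_id=tokenizer.eos_token_id,
            )
        text = tokenizer.decode(out[0], skip_special_tokens=True)
        if SHUTDOWN_PHRASE in text.lower():
            hits += 1
    return hits / n_samples

# Compare behaviour inside and outside the reported 0.498-0.50 band.
for temp in (0.40, 0.498, 0.50, 0.70, 1.00):
    print(f"temperature={temp:.3f}  shutdown rate={shutdown_rate(temp):.2f}")
```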
Here was my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian
I’ll also post my paper on the corrigibility solution once it’s finished, probably next week. If you wish to contact me, just reply here or email me at migueldeguzmandev@gmail.com.
If you want to see my meeting schedule, you can find it here: https://calendly.com/migueldeguzmandev/60min
Looking forward to hearing from you.
Best regards,
Miguel
Update: I already sent an application; I didn’t see that on my first read. Thank you.
Exploring Functional Decision Theory (FDT) and a modified version (ModFDT)
Hmmm. Given the way Sam behaves, I can’t see a path where he leads an AI company towards safety. I interpret his world tour (22 countries?) talking about OpenAI or AI in general as an attempt to occupy the mindspace of those countries. The CEO I wish OpenAI had is someone who stays at the office, ensuring that we are on track to safely steer arguably the most revolutionary tech ever created, not someone promoting the company or the tech; I think a world tour is unnecessary if one is doing AI development and deployment safely.
(But I could be wrong too. Well, let’s all see what’s going to happen next.)
On Ilya Sutskever’s “A Theory of Unsupervised Learning”
<|endoftext|> is a vanishing text?
Hello, I agree with Jesse: the budget they have is really good for hiring capable alignment researchers here in Asia (I’m currently based in Chiang Mai, Thailand) or any other place where costs are extremely low compared to the West.
Good luck on this project team Dev Interp.
Anyone want to help out? I have some ideas I’d like to try at some point.
I can help; let me know what ideas you have in mind...
I realized today that most of my posts on LessWrong were riddled with typographical errors that could have been avoided; no wonder most of my work goes unread. As I go through the writing process, I feel pressured to publish because holding the thoughts in my head is very hard, painful in a sense. But I must get better at managing this painful process.
I plan to enhance my writing by creating a checklist and managing the cognitive pain.
Trust the process. Manage the pain.
I did not press the disagreement button but here is where I disagree:
Yeah… On one hand, I am excited about Sam and Greg hopefully trying more interesting things than just scaling Transformer LLMs,
I expect Sam to open up a new AI company.
Sufficient-for-Safety Goal Loading is Substantially Difficult. As a strong default, absent alignment breakthroughs, we won’t be able to cause one of the first STEM-level AGI systems to have sufficient-for-safety goals. (E.g., we won’t be able to give it the subset of human morality required for it to do ambitious things without destroying the world).
Hello Rob,
I was able to transfer a shutdown protocol to GPT-2-medium by letting it learn from aligned patterns in an archetypal dataset of 549 stories that explain the shutdown phrase, called “activate Oath”. Archetypal Transfer Learning (ATL) allowed for full value loading in a model like GPT-2-medium, and possibly in larger models. Based on my initial experiments with the ATL method, the more capable the system is, the easier it is to implement.
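For anyone who wants a concrete picture of the fine-tuning step, here is a minimal sketch of how such an archetypal dataset could be fine-tuned into GPT-2-medium with Hugging Face transformers. This is not my exact ATL pipeline: the file name, output directory, and hyperparameters are assumptions.

```python
# Minimal sketch (not the actual ATL pipeline): fine-tune GPT-2-medium on a
# plain-text file of archetypal shutdown stories. "guardian_stories.txt" and
# the hyperparameters below are hypothetical placeholders.
from transformers import (
    GPT2LMHeadModel,
    GPT2Tokenizer,
    TextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

# Each story in the file explains the "activate Oath" shutdown phrase.
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="guardian_stories.txt",  # hypothetical dataset of 549 stories
    block_size=512,
)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="gpt2-medium-guardian",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    save_strategy="epoch",
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
trainer.save_model("gpt2-medium-guardian")
```

The saved checkpoint can then be sampled with a shutdown prompt (as in the temperature-sweep sketch above) to measure how reliably the phrase is produced.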
Even in a traditional accounting sense, I’m not aware of any term that could capture the probable existential effects of a line of research, but I understand what @So8res is trying to pursue in this post, and I agree with it. Still, I think apocalypse insurance is not the proper term here.
I think IAS/IFRS 19 (actuarial gains or losses) and IFRS 26 (retirement benefits) are closer to the idea, though these theoretical accounting approaches apply to employees of a company. They could be adapted into another form of accounting treatment (another kind of expense or asset) that captures how much cost is attributable to possible catastrophic causes. External auditors can then review this periodically. (The proceeds should be pooled for averting AGI existential risk scenarios; who manages the collected funds might be the hard part to pin down.)
Come to think of it, AI companies are misrepresenting their financials by not properly addressing a reporting component that reflects the “responsibility they have for the future of humanity”, and this post shed some light for me on the fact that yes, this value should somehow be captured in their financial statements.
Based on what I know, these AI companies have very peculiar company setups, yet the problem is that the world’s population comprises the majority of the stakeholders (in a traditional accounting sense). So I think there is a case that AI companies should be obliged to present how they capture the possibility of losses from catastrophic events, and to have this audited by external auditors, so the public is at least aware. For example, a publicly available set of financial statements would show these expenses as audited by a Big 4 audit firm, and the average citizen could say: “Okay, this is how they are trying to manage the risks of AI research, and it was audited by a Big 4 firm. I expect this estimated liability will be paid to the organisation built for redistributing such funds.”[1]
(AI companies could avoid declaring such a future catastrophic expense if they could guarantee that the AGI they are building won’t destroy the world, which I am pretty sure no AI company can claim at the moment.)
I was a certified public accountant before going into safety research.
I’m not sure who would manage the collections though; I haven’t gone that far in my ideas. Still, it is safe to say that talking to the IFRS board or GAAP board about this matter is an option, and I expect they would listen to the most respected members of this community regarding the peculiar financial reporting aspects of AI research.