Software engineer transitioned into AI safety, teaching and strategy. Particularly interested in psychology, game theory, system design, economics.
Jonathan Claybrough
“It’s true that we don’t want women to be driven off by a bunch of awkward men asking them out, but if we make everyone read a document that says ‘Don’t ask a woman out the first time you meet her’, then we’ll immediately give the impression that we have a problem with men awkwardly asking women out too much — which will put women off anyway.”
This seems like a weak response to me, at best defensible only if you consider yourself to be on the margin, without thought for long-term growth or for your ability to clarify intentions (you have more than three words available when interacting with people irl).
To be clear, explicitly writing “don’t ask a woman out the first time you meet her” would be terrible writing, and if that’s the best writing members of that group can do for guidelines, then maybe nothing is better than that. Still, it reeks of “we’ve tried for 30 seconds and are all out of ideas” energy.
A guidelines document can give high-level guidance on the vibe you want (e.g. truth-seeking, not too much aggressiveness, giving space when people feel uncomfortable, communicating about norms explicitly), all phrased positively (i.e. saying what you want, not what you don’t want), and can refer to sub-documents to give examples and be quite concrete if you have socially impaired people around who need to learn this explicitly.
The case for training frontier AIs on Sumerian-only corpus
Note: “existential” is a term of art, distinct from “extinction”. The Precipice cites Bostrom and defines it as follows:
“An existential catastrophe is the destruction of humanity’s longterm potential. An existential risk is a risk that threatens the destruction of humanity’s longterm potential.”
Disempowerment is generally considered an existential risk in the literature.
I participated in the previous edition of AISC and found it very valuable to my involvement in AI Safety. I acquired knowledge (on standards and the standards process), gained experience, and made contacts. I appreciate how much coordination AISC enables, with groups forming that allow many people to have their first hands-on experience and step up their involvement.
Thanks, and thank you for this post in the first place!
I’d be up for this project. As is, I downvoted Trevor’s post for how rambly and repetitive it is. There’s a nugget of an idea, that AI can be used for psychological/information warfare, that I was interested in learning about, but the post doesn’t seem to have much substantive argument to it, so I’d be interested in someone writing a much shorter version that argues its case with some sources.
It’s a nice pic and moment; I very much like this comic and the original scene. It might be exaggerating a trait (here by having the girl be particularly young) for comedic effect, but the Hogfather seems right.
I think I was around 9 when I got my first sword, around 10 for a sharp knife. I have a scar in my left palm from stabbing myself with that sharp knife as a child while whittling wood for a bow. It hurt for a bit, and I learned to whittle away from me or do so more carefully. I’m pretty sure my life is better for it and (from having this nice story attached to it) I like the scar.
This story still presents the endless conundrum between avoiding hurt and letting people learn and gain skills.
Assuming the world stays mostly the same as it is now, by the time your children are parenting, would they have the skills to notice sharp corners if they never experienced them? I think my intuitive approach here would be to put on some not-too-soft padding (which is effectively close to what you did; it’s still an unpleasant experience hitting against that, even with the cloth).
What’s missing is how to teach against existential risks. There’s an extent to which actually bleeding profusely from a sharp corner can help one learn to walk carefully and anticipate dangers, and these skills do generalize to many situations and allow one to live a long, fruitful life. (This last sentence does not pertain to the actual age of your children and doesn’t address the ideal ages at which one can actually learn the correct, generalizable thing.) If you have control over the future, remove all the sharp edges forever.
If you don’t, remove the hard edges while they’re young and reinstate them when they can/should learn to recognize what are typically hard edges that must be accounted for.
Are people losing the ability to use and communicate in previous ontologies after getting Insight from meditation? (Or maybe they never had the understanding I’m expecting of them?) Should I be worried myself, in my own practice of meditation?
Today I reread Kensho by @Valentine, which presents Looking, and the ensuing conversation in the comments between @Said Achmiz and @dsatan, where Said asks for concrete benefits we can observe and mostly fails to get them. I also noticed interesting comments by @Ruby, who in contrast was still able to communicate in the more typical LW ontology, but hadn’t meditated to the point of Enlightenment. Is Enlightenment bad? Different?
My impression is that people don’t become drastically better (at epistemology, rationality, social interaction, or actually achieving their goals and values robustly) very fast through meditating or getting Enlightened, though they may acquire useful skills that could help them get better. If that’s the case, it’s safe for me to continue practicing meditation, getting into Jhanas, Insight, etc. (I’m following The Mind Illuminated), as the failure of Valentine/dsatan to communicate their points could just be attributed to their not having been able to before either.
But I remain wary that people spend so much time engaging and believing in the models and practices taught in meditation material that they actually change their minds for the worse in certain respects. It looks like meditation ontologies/practices are Out to Get You and I don’t want to get Got.
Jonathan Claybrough’s Shortform
News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI
An Overview of AI risks—the Flyer
I focused my answer on the morally charged side, not the emotional one. The quoted statement said “A and B”, so as long as B is mostly true for vegans, “A and B” is mostly true for a sub-group of vegans.
I’d agree with the characterization “it’s deeply emotionally and morally charged for one side in a conversation, and often emotional to the other”, because most people don’t have small identities and indeed do feel attacked by others behaving differently.
It’s standard that the morally charged side in a conversation about veganism comes from the people arguing for veganism.
Your response reads as snarky, since you pretend to have understood the opposite. You’re illustrating the OP’s point that certain vegans are emotionally attached to their cause and jump at the occasion to defend their tribe. If you object to being pictured a certain way, at least act so that it isn’t accurate to depict you that way.
Did you know about “by default, GPTs think in plain sight”?
It doesn’t explicitly talk about agentized GPTs, but it discusses the impact this has on GPTs for AGI, how it affects the risks, and what we should do about it (e.g. maybe RLHF is dangerous).
To avoid being misinterpreted: I didn’t say I’m sure it’s more the format than the content that’s causing the upvotes (that’s an open question), nor that this post doesn’t meet the absolute quality bar that normally warrants 100+ upvotes (to each reader their opinion).
If you’re open to discussing this at the object level, I can point to concrete disagreements with the content. Most importantly, this should not be seen as a paradigm shift, because it does not invalidate any of the previous threat models; it would only be a paradigm shift if it made it impossible to build AGI any other way. I also don’t think this should “change the alignment landscape”, because it’s just another part of it, one which was known and has been worked on for years (Anthropic and OpenAI have been “aligning” LLMs, and I’d bet 10:1 they anticipated these would be used to build agents, like most people I know in alignment did).
To clarify, I do think it’s really important and great that people work on this, and that, in order of timing, this will be the first x-risk stuff we see. But we could solve the GPT-agent problem and still die to unaligned AGI three months afterwards. The fact that the world trajectory we’re on is throwing additional problems into the mix (keeping the world safe from short-term misuse and unaligned GPT-agents) doesn’t make the existing ones simpler. There is still pressure to build autonomous AGI, there might still be mesa-optimizers, there might still be deception, etc. We need the manpower to work on all of these, and not “shift the alignment landscape” to focus only on the short-term risks.
I’d recommend not worrying much about PR risk and just asking the direct question: even if this post is only ever read by LW folk, does “break all encryption” add to the conversation? Causing people to take time to debunk certain suggestions isn’t productive, even without the context of PR risk.
Overall I’d like some feedback on my tone, whether it’s too direct/aggressive for you or whether it’s fine. I can adapt.
You can read “Reward is not the optimization target” for why a GPT system probably won’t be goal-oriented toward becoming the best at predicting tokens, and thus wouldn’t do the things you suggested (capturing humans). The way we train AI matters for what their behaviours look like, and text transformers trained on prediction loss seem to behave more like Simulators. This doesn’t make them not dangerous, as they could be prompted to simulate misaligned agents (by misuse or accident), or have inner misaligned mesa-optimisers.
I’ve linked some good resources for directly answering your question, but to read more broadly on AI safety I can point you towards the AGI Safety Fundamentals course, which you can read online or join a reading group for. Generally you can head over to AI Safety Support, check out their “lots of links” page, and join the AI Alignment Slack, which has a channel for questions too.
Finally, how does complexity emerge from simplicity? It’s hard to answer the details for AI, and you probably need to delve into those details to have a real picture, but there’s at least a strong reason to think it’s possible: we exist. Life originated from “simple” processes (at least in the sense of being mechanistic and non-agentic), chemical reactions, etc. It evolved into cells, then multicellular organisms, and grew from there. Look into the history of life and evolution and you’ll have one answer to how simplicity (optimizing for reproductive fitness) led to self-improvement and self-awareness.
Quick meta comment to express that I’m uncertain posting things in lists of 10 is a good direction. The advantages might be real: easy to post, quick feedback, easy interaction, etc.
But the main disadvantage is that this comparatively drowns out other, better posts (with more thought and value in them). I’m unsure whether the content of the post was importantly missing from the conversation (for many readers) and that’s why it got upvoted so fast, or whether it’s largely the format. Even if this post isn’t bad (and I’d argue it is, for the suggestions it promotes), this is an early warning of a possible trend where people with less thought-out takes quickly post highly accessible content, get comparatively more upvotes than they should, and it becomes harder to find good content.
(Additional disclosure: some of my bad taste for this post comes from the fact that its call to break all encryption is being cited on Twitter as representative of the alignment community. I’d have liked to answer that obviously it isn’t, but it got many upvotes! This makes my meta point also seem motivated by PR/optics, which is why it felt necessary to disclose, but let’s mostly focus on consequences inside the community.)
Hi, I’m currently evaluating the cost-effectiveness of various projects and would be interested in knowing, if you’re willing to disclose, approximately how much this program costs MATS in total. By this I mean the summer cohort, including ops before and after that are necessary for it to happen, but not counting the extension.