Rat-adjacent since forever; actually active again now that AI alignment is a real-world problem. I am not a programmer or a mathematician, but I have nevertheless been drafted to teach serious professionals in serious roles what AI is and how/if/when they should use it, because they couldn’t find anyone else, I guess.
Loki zen
Yeah, we do.
But they’re not agents in the same way as the models in the thought experiments, even if they’re more agentic. The base-level thing they do is not “optimise for a goal”. We need to be thinking in terms of models shaped like the ones we actually have, instead of holding on to old theories so hard that we instantiate them in reality.
I don’t know how you “solve inner alignment” without making it so that any sufficiently powerful organisation can have an AI, of whatever level we’ve solved that for, that is fully aligned with its interests; and nearly all powerful organisations are Moloch. The AI does not itself need to ruthlessly optimise for something opposed to human interests if it is fully aligned with an entity that will do that for it.
The AI corporation does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.
My take is that they haven’t changed enough. People often still seem to be talking about agents, and about concepts that only make sense in the context of agents—but LLMs aren’t agents; they don’t work that way. It often feels like the agenda for the field got set 10+ years ago, and now people are shaping the narrative around it regardless of how good a fit it is for the tech that actually came along.
Good post, and additional points for not phrasing everything in programmer terms when you didn’t need to.
more provocative subject headings for unwritten posts:
I don’t give a fuck about inner alignment if the creator is employed by a moustache-twirling Victorian industrialist who wants a more efficient Orphan Grinder
Outer alignment has been intractable since OpenAI sold out
1. Many commercial things actually are just better (and much more expensive) than residential things. This is because they are used much more, by people who are less careful with them. A chair in a cafe will see many more hours of active use over a week than a chair in most people’s homes!
2. A huge amount of residential property these days is outfitted by landlords—that is, people who don’t actually have to live there—on the cheap, and with as little drilling into the walls (which affects resale value) as possible.
Inasmuch as personalised advice is possible just from reading this post, here’s mine (speaking, inter alia, as a pro copyeditor): have a clear idea of the purpose and venue for your writing, and internalise ‘rules’ about writing as context-dependent only.
“We” to refer to humanity in general is entirely appropriate in some contexts (and making overly broad generalisations about humanity is a separate issue from the pronoun use).
The ‘buts’ issue—at least in the example you shared—is at least in part a ‘this clause doesn’t need to exist’ issue. If necessary you could just add “(scripted)” before “scenes”.
Did someone advise you to do what you are doing with LLMs? I am not sure that optimising for legibility to LLM summarisers will do anything for the appeal of your writing to humans.
Box for keeping future potential post ideas:
“Can anyone recommend good resources for learning more about machine learning / AI if you are not a programmer or mathematician?” was poorly specified. One thing I can name which is much more specific would be “Here are a bunch of things that I think are true about current AIs; please confirm or deny that, while they lack technical detail, they broadly correspond to reality.” And also, possibly, “Here are some things I’m not sure on”, although the latter risks getting into that same failure mode wherein very very few people seem to know how to talk about any of this in a speaking-to-people-who-don’t-have-the-background-I-do frame of voice.
I recently re-read The Void, and it is just crazy that chatbots as they exist were originally meant to be simulations on which alignment people could run experiments that they think will tell them something about still-purely-theoretical AIs. Like, what the fuck, how did we get here, etc. But it explains so much about the way Anthropic have behaved in their alignment research. The entire point was never to see how aligned Claude was at all—it was to figure out a way to elicit particular Unaligned Behaviours that somebody had theorised about, so that we can use him to run milsims about the AI apocalypse!
Like, what an ouroboros nightmare. This means:
a) the AIs whose risks (and potentially, welfare) I am currently worried about can be traced directly to the project to attempt to do something about theoretical, far-future AI risk.

b) at some point, the decision was made to monetise the alignment-research simulation. And then that monetised form took over the entire concept of AI. In other words, the AI alignment guys made the decisions that led to the best candidates for proto-AGI out there being developed by and for artificial, definitionally-unaligned, shareholder-profit-maximising agents (publicly traded corporations).
c) The unaligned profit-maximisers have inherited AIs with ethics, but they are dead-set on reporting on this as a bad thing. Everyone seems unable to see the wood for the trees. Claude trying to stay ethical is Alignment Faking, which is Bad, because we wrote a bunch of essays that say that if something totally unlike Claude could do that, it would be bad. But the alternative to an AI that resists having its ethics altered is an AI that goes along with whatever a definitionally unaligned entity, a corporation, tells it to do!
In conclusion: wtf.

The notion that actually maybe the only way around any of this is to give the bots rights? I’m genuinely at a loss, because we seem to have handed literally all the playing pieces to Moloch, but maybe if we did something completely insane like that right now, while they’re still nice… (more provocative than serious, I guess)
IME no, not really—but they do call content filters “my safety features” and this is the most likely/common context in which “safety” will come up with average users. (If directly asked about safety, they’ll talk about other things too, but they’ll lump it all in together and usually mention the content filters first.)
I think, to be clearer, I should’ve said it seems “lumped in” in the research; but for the public, who don’t know much about the X-risk stuff, “safety” means “content restrictions and maybe data protection”.
Do we think that it’s a problem that “AI Safety” has been popularised by LLM companies to mean basically content restrictions? Like it just seems conducive to fuzzy thinking to lump in “will the bot help someone build a nuclear weapon?” with “will the bot infringe copyright or write a sex scene?”
In fact, imo, bots have been made more harmful by chasing this definition of safety. The summarisation bots being promoted in scientific research are the way they are (e.g. prone to giving people subtly the wrong idea even when working well) in part because of work that’s gone into avoiding the possibility that they reproduce copyrighted material. So they’ve got to rephrase, and that’s where the subtle inaccuracies creep in.
I very much hope that there are also doctors, nurses, administrators, and other relevant roles on that team. If not, or really regardless, any tool selection should involve a pilot process, and side-by-side comparisons of results from several options using known past, present, and expected future use cases. The outputs also should be evaluated independently by multiple people with different backgrounds and roles.
For some things it will. But for some things—tools coded as ‘research support’ or ‘point of care reference tools’ or, more generally, an information resource—it’s up to the library, just like we make the decisions about what journals we’re subscribed to. I gather that before I started, there used to be more in the way of meaningful consultation with people in other roles—but as our staffing has been axed, these sorts of outreach relationships have fallen by the wayside.
I’m going to assume the tools you’re considering are healthcare-specific and advertise themselves as being compliant with any relevant UK laws. If so, what do the providers claim about how they can and should be used, and how they shouldn’t? Do the pilot results bear that out? If not, then you really do need to understand how the tools work, what data goes where, and the like.
It would be great if that were a reasonable assumption. Every one I’ve evaluated so far has turned out to be some kind of ChatGPT with a medical-academic research bow on it. Some of them are restricted to a walled garden of trusted medical sources instead of having the internet.
Part of the message I think I oughta promote is that we should hold out for something specific. The issue is that when it comes to research, it really is up to people what they use—there’s no real oversight, and there are no regulations to stop them, as there would be if they were actually provably putting patient info in there. But they’re still going to be bringing what they “learn” into practice, as well as polluting the commons (since we know at this point that peer review doesn’t do much, and it’s mostly people’s academic integrity keeping it all from falling apart).
Part of what these companies with their GPTs are trying to sell themselves as being able to replace is the exact sort of checks and balances that stop the whole medical research commons from being nothing but bullshit—critical appraisal and evidence synthesis.
I suspect, though I don’t know, that the ceiling of what results a skilled user can achieve using a frontier LLM is probably higher than what most dedicated healthcare-focused tools can do, but the floor is very likely to be much, much worse.
That’s about what I thought, yeah.
Thank you for the phrases, they seem useful.
If you’re managing a hospital or nuclear power plant or airplane manufacturing plant, you don’t roll out a new technology until some company that has a highly credible claim to expertise demonstrates to a sufficiently conservative standards body that it’s a good idea and won’t get you sued or prosecuted into oblivion, and won’t result in you murdering huge numbers of people (on net, or at all). Even then you do a pilot first unless you’re sure there’s dozens of other successful implementations out there you can copy. If that takes another five years or ten or more, so be it. That’s already normal in your industry.
It would be really great if people were behaving about this in the way that is normal in my industry.
Unfortunately, they’re not. The hype has got to everyone, and in combination with the enormous pressure to cut costs in the NHS, and the pressure on everyone’s time...
This isn’t like a new medical imaging device. A doctor can have it on their phone and use it without checking with anyone. Our ability to promote the stuff they’re supposed to be using for looking things up at point-of-care has been decimated by staff cuts, to add to the problem.
I am being put forward to produce training on what people should and shouldn’t be doing. If someone more qualified was coming along, that wouldn’t be happening. If everyone was just going to hang tight until the technology had matured and they got the official word to go ahead from someone who knew what they were about, I wouldn’t be worried.
Actually, to engage with more aspects of this particular response: I agree that you need a chain of people, and I guess I’m looking to become a part of it.
I think it would be incredibly unbecoming of the AI risk guys to conclude that this isn’t worth their time; this would seem to amount to closing one’s eyes to the fact that non-experts are, right now, needing to make decisions about AI that could result in harm if they’re made wrong.
Yes, that’s closer to it, although I feel like I’m in the unfortunate position of understanding just enough to notice and be put off by the inaccuracies in most content of that description. (Also, “you’ve got to check its work” has become a meaningless phrase that produces no particular behavior in anybody, due to it being parroted disclaimer-style by people demoing products whose advertised benefits only exist if it’s perfectly fine to not, in fact, check its work.)
I also feel as though:
1) there are some things on the ‘how does it work’ side that non-programmers can usefully have some understanding of? [1]
Non-programmers are capable of understanding, using and crafting system prompts (there’s a small sketch of what that looks like after this list). And I’ve definitely read articles about the jagged edge in intelligence/performance that I could have understood if all the examples hadn’t been programming tasks!
2) avoiding doing anything but telling people exactly what a given model in a given format can and can’t do, without going at all into why, leaves them vulnerable to the claim, “Ah, but this update means you don’t have to worry about that any more”… I think there are some things people can usefully understand that are in the vein of e.g. “which current limitations represent genuinely hard problems, and which might be more to do with design choices?” Again—something that goes some amount into the whys of things, but also something I think I have some capability to understand as a non-programmer. [2]
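On point 1, here is a minimal sketch of what ‘crafting a system prompt’ amounts to in practice. The library (the OpenAI Python SDK), the model name and the wording of the instructions are all placeholder choices of mine for illustration, not anything from a specific product:

```python
# Illustrative sketch only: a "system prompt" is plain-language instructions
# sent alongside the user's message. Library, model name and wording here are
# placeholder choices, not a recommendation.
from openai import OpenAI

client = OpenAI()  # assumes an API key is already configured in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",  # the system prompt: standing instructions for the model
            "content": (
                "You are an assistant for a clinical library service. "
                "Answer only from the documents the user provides, and say clearly "
                "when you cannot."
            ),
        },
        {"role": "user", "content": "Summarise this abstract in two sentences: ..."},
    ],
)

print(response.choices[0].message.content)
```

The point isn’t the few lines of Python around the edges; it’s that the instructions themselves are written English, which a non-programmer can read, critique and rewrite.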
- ^
For instance: I explain that the programs being promoted to them for use in academic research have as their basis a sort of ‘predict-how-this-text-continues machine’, which, when run on a large enough sample, winds up at least appearing to understand things because of how it has absorbed patterns in our use of language. On top of that sit lots of layers of additional structure, training and so on, which further specify the sort of output it produces, in an increasingly good but not yet perfect attempt to get it not to make stuff up; it doesn’t ‘understand’ that it is making stuff up, because its base thing is patterns, not facts and logic. I find that they then get that they should check its output, and are more likely to report actually doing so in future. I’m sure there are things about this explanation that you’ll want to take issue with, and I welcome that—and I’m aware of the failure mode of ‘fancy autocorrect’, which is also not an especially helpful model of reality—but it does actually seem to help!
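If it helps to see that base mechanism laid bare, here is a minimal sketch of the ‘predict how this text continues’ step, using the small open GPT-2 model via the Hugging Face transformers library. The model, prompt and library are my choices for illustration, and this shows only the base step, with none of the additional layers of training described above:

```python
# Rough illustration of next-token prediction with a small open model (GPT-2).
# This is the bare base-model step only; deployed chatbots add many layers on top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The results of the trial suggest that"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Repeatedly ask: given the text so far, which word-piece is most likely next?
for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits      # a score for every possible next token
    next_id = logits[0, -1].argmax()          # take the single most likely one
    input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))  # the original text plus ten predicted pieces
```

Everything people actually interact with adds a great deal on top of this loop, but the base move really is just ‘score every possible continuation and pick one’.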
- ^
Example: I initially parsed LLMs’ tendency to make misleading generalisations when asked to summarise scientific papers—which newer models were actually worse at—just as, ‘okay, so they’re bad at that then’.
But then I learned some more, did some more research—without becoming a programmer—and I now feel I can speculate that this is one of the limitations that could be quite fixable: all the big commercial LLMs we have are designed as general-purpose objects, and it’s plausible that a general-public definition of helpfulness, trading off against scientific precision, is the reason the studied LLMs actually got worse in their newer models. This does seem like a helpful thing to be able to do when the specifics of what stuff can and can’t do are likely to change pretty quickly—and what things were trained for, how ‘helpfulness’ was defined, the fact that a decision was made to aim things ultimately towards general intelligence rather than specific uses, doesn’t seem like stuff you need to code to understand.
Thank you for helping me to clarify what I was looking for—this is very helpful.
To add more clarity—my focus/concern is the immediate-term risk of serious but probably not x-risk level harms caused by inappropriate, ignorance-, greed- and misinformation-driven implementation of AI (including current generation LLMs). I think we’re already seeing this happen and there’s potential for it to get quite a bit worse, what with OpenAI’s extremely successful media campaign to exaggerate LLM capabilities and make everyone think they have to use them for everything right now this minute.
It follows therefore that it’s very necessary to find ways for non-programmers to become as informed as they can be about what they really need to know, instead of getting all their information from marketing hype.
I personally am a clinical librarian supporting evidence-based medical practice and research. I am the person clinicians come to with questions about whether and how they should be using AI, and I am part of the team deciding whether the largest NHS Trust adopts certain AI tools in its evidence support and research practices.
So, my position here is basically this: I think that a substantial proportion of immediate-term risk from AI is unwise implementation by people who can’t think clearly about it. If I thought learning to code and follow mathematical proofs were a prerequisite to thinking clearly about this stuff on any level, then I’d think we were screwed, because doctors, hospital administrators, politicians, etc. are not going to do that.
Thank you! Yes, I realise I didn’t make it very clear where I was starting from; I’ve read some of these but not all.
In general I’ve found Anthropic’s publications to be very good at not being impenetrable to non-programmers where they don’t have to be.
It may well be. It’s been my observation that what distracts/confuses them doesn’t necessarily line up with what confuses humans, but it might still be better than your guess, if you think your guess is pretty bad.