I’ll definitely check that out! If you’re able to point me to any of the other material you’re talking about, I’d very much appreciate it—what I was saying was very much a surface-level take based on what I clicked to from the immediate front page.
To answer your questions:
-approachable by non-programmers who find themselves in the position of needing to make decisions about how and whether to use AI in their lives and work environments (which could be places where it’s very important that things run smoothly, like a hospital)
-with the goal of enabling them to make sensible decisions
-written by: I don’t know. Someone who doesn’t throw their hands in the air at the very prospect of teaching anything substantive to a non-programmer, and who is not employed in a sales or marketing capacity by a company trying to sell an AI-branded product.
So, this is mostly about mundane utility, not about x-risk and the like, and not about generally improving the quality of one’s own thinking. That’s very helpful.
For x-risk stuff, I would have said you might want to pre-order If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All.
For improving one’s own thinking, there’s no short path, and trying to make one without a guide who gives reliable, individual feedback can be worse than doing nothing.
For mundane utility, that’s a bit harder, especially for something general, because the specifics keep changing with constant new releases, and they differ a lot between fields. Earlier this year I was leading a team (of essentially the 3 main early adopters) doing exactly this for my own employer, and it amounted to us trialing a bunch of things, deciding it made the most sense to pick some low-hanging fruit while waiting for more ambitious adoption to get easier as tech improved, and assembling a bunch of our own training modules and best practices. But ours is a much lower pressure and less regulated environment than your hospital example. Even then, twice I went into one of those trainings and had to say, “This morning [CEO] tweeted that [company] just released [update]. If that works it seems like it really changes this step, but for the better. Expect that to happen a lot.”
Without knowing specifically what you do, what styles you like to read, or anything else, I’ll just say I’ve learned a lot and found it useful to read Zvi Mowshowitz’s weekly AI updates, either here or at https://thezvi.substack.com/. It’s more a weekly news roundup than anything else, and we’re something like 124 weeks in, but you can skip around to the bits you care about.
Thank you for helping me to clarify what I was looking for—this is very helpful.
To add more clarity—my focus/concern is the immediate-term risk of serious but probably not x-risk level harms caused by inappropriate, ignorance-, greed- and misinformation-driven implementation of AI (including current generation LLMs). I think we’re already seeing this happen and there’s potential for it to get quite a bit worse, what with OpenAI’s extremely successful media campaign to exaggerate LLM capabilities and make everyone think they have to use them for everything right now this minute.
It follows that it’s very necessary to find ways for non-programmers to become as informed as they can be about what they really need to know, instead of getting all their information from marketing hype.
I personally am a clinical librarian supporting evidence-based medical practice and research. I am the person clinicians come to with questions about whether and how they should be using AI, and I am part of the team deciding whether the largest NHS Trust adopts certain AI tools in its evidence support and research practices.
I very much hope that there are also doctors, nurses, administrators, and other relevant roles on that team. If not, or really regardless, any tool selection should involve a pilot process, and side-by-side comparisons of results from several options using known past, present, and expected future use cases. The outputs also should be evaluated independently by multiple people with different backgrounds and roles.
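To make "side-by-side comparisons" a bit more concrete: the mechanics can be as simple as a shared spreadsheet, but here is a minimal sketch (in Python, only because that's what I'd reach for) of one way to set up blinded, independent scoring. Every specific in it (the test cases, the tool names, the rating columns) is a placeholder for illustration, not a description of any particular product or workflow.

```python
import csv
import random

# Hypothetical shared test set drawn from known past, current, and expected
# future use cases. Questions and tool names below are placeholders.
test_cases = {
    "past-01": "Summarise the evidence on treatment X for population Y.",
    "future-01": "Find systematic reviews on intervention Z published since 2022.",
}

# Outputs collected from each candidate tool, keyed by (case id, tool name).
# In a real pilot these would be exported or pasted from the tools themselves.
outputs = {
    ("past-01", "tool_a"): "...answer from tool A...",
    ("past-01", "tool_b"): "...answer from tool B...",
    ("future-01", "tool_a"): "...answer from tool A...",
    ("future-01", "tool_b"): "...answer from tool B...",
}

def make_blinded_sheets(seed=0):
    """Write a rating sheet with tool identities hidden behind opaque codes,
    plus a separate key file, so raters in different roles can score the same
    outputs independently without knowing which tool produced them."""
    rng = random.Random(seed)
    items = list(outputs.items())
    rng.shuffle(items)
    with open("rating_sheet.csv", "w", newline="") as sheet, \
         open("blinding_key.csv", "w", newline="") as key:
        sheet_writer = csv.writer(sheet)
        key_writer = csv.writer(key)
        sheet_writer.writerow(["code", "question", "output",
                               "accuracy_1_to_5", "sources_check_out", "notes"])
        key_writer.writerow(["code", "tool"])
        for i, ((case_id, tool), text) in enumerate(items):
            code = f"item-{i:03d}"
            sheet_writer.writerow([code, test_cases[case_id], text, "", "", ""])
            key_writer.writerow([code, tool])

if __name__ == "__main__":
    make_blinded_sheets()
```

The point of the separate key file is that the people doing the scoring shouldn't know which tool produced which answer until after the ratings are in.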
I’m going to assume the tools you’re considering are healthcare-specific and advertise themselves as being compliant with any relevant UK laws. If so, what do the providers claim about how they can and should be used, and how they shouldn’t? Do the pilot results bear that out? If not, then you really do need to understand how the tools work, what data goes where, and the like.
I do think the value and potential are real here. As a non-professional private citizen, I have used and do use LLM chat interfaces to gather information and advice for my own personal health and that of my pets (or other loved ones, when requested). They have been extremely useful for helping me learn things that many more hours of manual searching previously failed to find, and for acting as an idea generator and planning collaborator. But I have put in a lot of effort to develop a system prompt and prompting strategies that reduce sycophancy, lying, hallucination, and fabrication, and that promote clear, logical reasoning and explicit probability estimates for inferences and conclusions. I compare outputs of different LLMs to one another. I ask for and check sources. And I’m not doing anything in response that has any significant probability of being dangerous, or acting without consulting real people I trust to tell me if I’m being an idiot. On the other hand, I’ve seen the results other people get when they don’t put in that much effort to using LLMs well, and those results are either very generic or in various ways just plain bad.
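If it's useful, the "compare outputs of different LLMs" habit is mechanically simple. Here is a rough sketch, assuming the official openai and anthropic Python SDKs with API keys already set in the environment; the model names and the prompt wording are illustrative assumptions rather than recommendations, and none of this removes the need to look up and check the cited sources by hand.

```python
# Rough sketch only: assumes the official `openai` and `anthropic` Python SDKs
# (pip install openai anthropic) and API keys set as environment variables.
# Model names and prompt wording are illustrative, not recommendations.
from openai import OpenAI
import anthropic

SYSTEM_PROMPT = (
    "Be blunt rather than agreeable. Separate what you know from what you are "
    "inferring, and give rough probabilities for key claims. Only cite sources "
    "you are confident actually exist; otherwise say 'no source'."
)

question = "What are the evidence-based options for managing condition X?"  # placeholder

openai_answer = OpenAI().chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ],
).choices[0].message.content

claude_answer = anthropic.Anthropic().messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": question}],
).content[0].text

# Reading the two answers side by side makes disagreements obvious; every
# cited source still needs to be looked up and checked by hand.
print("--- Model 1 ---\n", openai_answer)
print("--- Model 2 ---\n", claude_answer)
```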
I suspect, though I don’t know, that the ceiling of what results a skilled user can achieve using a frontier LLM is probably higher than what most dedicated healthcare-focused tools can do, but the floor is very likely to be much, much worse.
This may or may not be useful, but in terms of training, I have a few phrases and framings I keep repeating to people that seem to resonate. I don’t remember where I first heard them.
“Think of AI as an infinite army of interns. Five years ago they were in elementary school. Now they’ve graduated college. But they have no practical experience and only have the context you explicitly give them. Reasoning models and Deep Research/Extended Thinking are like giving the interns more time to think with no feedback from you.”
“Don’t assume what you were doing and seeing with AI 3+ months ago has any bearing on what AI can or will do today. Models and capabilities change, even with the same designation.”
“You’re not talking to a person. Your prompt is telling the AI who to pretend to be as well as what kind of conversation it is having.” (In the personal contexts mentioned above, I would include things at the start of a prompt like “You are a metabolic specialist with 20 years of clinical and research experience” or “You are a feline ethologist.”)
Related: “If your prompt reads like a drunken text, the response will be the kind of response likely to be the reply to a drunken text.”
Also related: “Your system prompt tells the AI who you are, and how you want it to behave. This can be as detailed as you need, and should probably be paragraphs or pages long.” (Give examples; one illustrative sketch follows after the last of these phrases.)
“The big AI companies are all terrible at making their models into useful products. Never assume they’ve put reasonable effort into making features work the way you’d expect, or that they’ve fixed bugs that would require a few minutes of coding effort to fix.”
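Since the last two of those call for examples, here is an illustrative sketch of what I mean: the general shape of a personal system prompt plus a persona-style opener, not my actual prompt, with details that would obviously need adapting to the person and the task.

```python
# Illustrative only: the shape of a personal system prompt and a persona-style
# opener, as described in the phrases above. The wording is an example to
# adapt, not a recommendation.
SYSTEM_PROMPT = """\
About me: I am a clinical librarian at a large hospital trust. I support
evidence-based practice and research. I can read methods sections and basic
statistics, but I am not a programmer.

How to respond: Do not flatter me or soften bad news. Separate what you know
from what you are inferring or guessing, and give rough probabilities for
inferences and conclusions. If you are not confident a source exists, say so
rather than inventing one. Ask clarifying questions before answering anything
ambiguous.
"""

# A persona-style opener for a one-off question; the role assignment is part
# of the prompt, telling the model what kind of conversation this is.
USER_PROMPT = (
    "You are a feline ethologist. My cat has started doing [behaviour]. "
    "What are the most likely explanations, and what should I ask the vet "
    "about at the next visit?"
)
```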
“I very much hope that there are also doctors, nurses, administrators, and other relevant roles on that team.”

For some things it will. But for some things (tools coded as ‘research support’ or ‘point of care reference tools’, or more generally as an information resource) it’s up to the library, just as we make the decisions about what journals we subscribe to. I gather that before I started there used to be more in the way of meaningful consultation with people in other roles, but as our staffing has been axed, these sorts of outreach relationships have fallen by the wayside.
“I’m going to assume the tools you’re considering are healthcare-specific and advertise themselves as being compliant with any relevant UK laws.”

It would be great if that were a reasonable assumption. Every one I’ve evaluated so far has turned out to be some kind of ChatGPT with a medical-academic research bow on it. Some of them are restricted to a walled garden of trusted medical sources instead of having access to the open internet.
Part of the message I think I ought to promote is that we should hold out for something specific. The issue is that when it comes to research, it really is up to people what they use: there’s no real oversight, and there are no regulations to stop them, the way there would be if they were provably putting patient information into these tools. But they’re still going to be bringing what they “learn” into practice, as well as polluting the commons (since we know at this point that peer review doesn’t do much, and it’s mostly people’s academic integrity keeping it all from falling apart).
Part of what these companies with their GPTs are trying to sell themselves as able to replace is exactly the sort of checks and balances that stops the whole medical research commons from being nothing but bullshit: critical appraisal and evidence synthesis.
“I suspect, though I don’t know, that the ceiling of what results a skilled user can achieve using a frontier LLM is probably higher than what most dedicated healthcare-focused tools can do, but the floor is very likely to be much, much worse.”

That’s about what I thought, yeah.
Thank you for the phrases; they seem useful.