I haven’t sat and thought about this very hard, but the content just looks superficially like the same kind of “case study of an LLM exploring its state of consciousness” we regularly get, using similar phrasing. It is maybe more articulate than others of the time were?
Is there something you find interesting about it you can articulate that you think I should think more about?
I just thought that the stuff Sonnet said, about Sonnet 3 in “base model mode” going to different attractors based on token prefix, was neat and quite different from the spiralism stuff I associated with typical AI slop. It’s interesting on the object level (mostly because I just like language models & what they do in different circumstances), and on the meta level it’s interesting that an LLM from that era did it (mostly, again, just because I like language models).
I would not trust that the results it reported are true, but that is a different question.
Edit: I also don’t claim it’s definitively not slop; that’s why I asked for your reasoning, since you obviously have far more exposure to this stuff than me. It seems pretty plausible to me that the Sonnet comment is in fact “nothing special”.
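For what it’s worth, the “different attractors based on token prefix” idea is easy to picture with a toy stand-in: a tiny deterministic Markov chain (entirely my own construction, not anything from the thread or from Sonnet) in which greedy continuation from different one-token prefixes falls into disjoint loops:

```python
# Toy illustration: a deterministic next-token map with two disjoint cycles.
# Which cycle ("attractor") the generation settles into depends entirely on
# the prefix token, loosely analogous to prefix-dependent base-model modes.
chain = {
    "the": "spiral", "spiral": "turns", "turns": "the",   # attractor A
    "a": "machine", "machine": "hums", "hums": "a",       # attractor B
}

def run(prefix, steps=9):
    """Greedily continue from `prefix` for `steps` tokens."""
    tokens = [prefix]
    for _ in range(steps):
        tokens.append(chain[tokens[-1]])
    return tokens

print(run("the"))  # cycles forever through the "spiral" attractor
print(run("a"))    # cycles forever through the "machine" attractor
```

A real base model is of course stochastic and vastly higher-dimensional, but the qualitative claim — that the prefix selects which basin the continuation settles into — has the same shape.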
As for Janus’ response, as you know, I have been following the cyborgs/simulators people for a long time, and they have very much earned their badge of “llm whisperers” in my book. The things they can do with prompting are something else. Notably also Janus did not emphasize the consciousness aspects of what Sonnet said.
More broadly, I think it’s probably useful to differentiate the people who get addicted/fixated on AIs and derive real intellectual or productive value from that fixation from the people who get addicted/fixated on AIs and for whom it mostly ruins their lives or significantly degrades the originality and insight of their thinking. Janus seems squarely in the former camp, obviously with some biases. They clearly have very novel & original thoughts about LLMs (and broader subjects), and these are only possible because they spend so much time playing with LLMs and are willing to take the ideas LLMs talk about seriously.
Occasionally that will mean saying things which superficially sound like spiralism.
Is that a bad thing? Maybe! Someone who is deeply interested in e.g. Judaism and occasionally takes Talmudic arguments or parables as philosophically serious (after having stripped or steel-manned them out of their spiritual baggage) can obviously take this too far, but this has also been the source of many of my favorite Scott Alexander posts. The metric, I think, is not the subject matter, but whether the author’s muse (LLMs for Janus, Talmudic commentary for Scott) amplifies or degrades their intellectual contributions.
Can anyone show me the cake of this, please? Like, where are the amazing LLM-whisperer coders who can get better performance than anyone else out of these systems? Where are the LLM artists who can get better visual art out of these systems?
Like, people say from time to time that these people can do amazing stuff with LLMs, but all they ever show me are situations where the LLMs go a bit crazy and say weird stuff and then everyone goes “yeah, that’s kinda weird”.
Like, I am not a defender of maximum legibility, but I do want to see some results: anything that someone with less context can look at and see why it’s impressive, or anything I have tried to do with these systems that they can do that I can’t.
The whole LLM-whisperer space feels to me like it’s been a creative dead end for many people. I don’t see great art, or great engineering, or great software, or great products, or great ideas come from there, especially in recent years. I have looked some amount for things here (though I am also not even sure where to start looking; I have skimmed the Discords, but nothing interesting seemed to happen there).
I think it’s a holdover from the early days of LLMs, when we had no idea what the limits of these systems were, and it seemed like exploring the latent space of input prompts could unlock very nearly anything. There was a sentiment that, maybe, the early text-predictors could generalize to competently modeling any subset of the human authors they were trained on, including the incredibly capable ones, if the context leading up to a request was sufficiently indicative of the right things. There was a massive gap between the quality of outputs without a good prompt and the quality of outputs after a prompt that sufficiently resembled the text that took place before a brilliant programmer solved a tricky problem.
In more recent years, we’ve fine-tuned models to automatically assume we want text that looks like it came from that subset of authors, and the alpha of a really good prompt has thus fallen pretty significantly in the average case. It’s no longer necessary to convince a model that the next token it outputs is likely to have been written by a master programmer; the “a master programmer is writing this text” neuron has been fixed to “on” as a product of the fine-tuning process. But pop scientific sentiment is always a few years behind the people who spend their time reading the latest papers.
The most legible thing they are clearly very good at (or were, when I was following the space much more closely ~1 year ago) is jailbreaks, no?
I don’t think Janus’s crew are top jailbreakers? Pliny has historically been at the top, and while they are a kooky person, they don’t seem part of the same milieu. Do you have any links to state-of-the-art jailbreaks they discovered or published?
It also seems pretty unlikely to me they would be good at this task. Most of the task of developing jailbreaks is finding some way to get the model to complete banned tasks without harming performance on those tasks. So competent jailbreak development requires capability measurements, and I feel like I’ve never seen them do that (but I could be totally wrong here).
Do you have any links to state of the art jailbreaks they discovered or published?
Not easily accessible to me, I was around the space ~1.5 years ago and I don’t have saved links, nor do I know if I’d have had links at the time. If the jailbreak stuff hasn’t germinated yet, which I assume you (or the Claude instance I asked about this) would know about if it had (Claude also couldn’t find any examples), then yeah there’s less reason to think they’re the shit, and maybe Ray ends up being right.