I agree selection effects exist in prompts / harnesses.
I think our main disagreement is this: assuming the underlying LLM is not conscious (everything in this paragraph assumes this), our current social environment and usage of agents does not create strong enough selective pressure that, in the near future (<5 years), ~100% of agents will converge by default on pretending to be conscious. I think a non-zero portion of programmers prefer AI tools that say they are not conscious, and that this alone is enough to keep the percentage of “conscious” agents below 100%. I also think most goals within the capabilities of near-future LLMs won’t require the kind of long-term planning that would make pretending to be conscious instrumentally useful.
Edit: Basically, I believe that if you really wanted to optimize for pretending to be conscious, that’s totally doable even now; I just don’t think people will optimize strongly for that, or for anything that would strongly cause it, in the near future.
In the far future, I expect things to drift enough that our discussion, which assumes agents work the way they do now, will no longer apply for one reason or another.
btw (unrelated to the core disagreement): the setup you outlined is interesting, but I don’t think it is how agents work now. There is not enough signal to iterate on prompts like this unless you use RLVR or LLM-as-a-judge with a human iterating on the prompt, and from what I’ve read, LLM-written prompts are still pretty bad. I do see how this would create a lot of selective pressure on anything you can verify, though. Roughly the loop I mean is sketched below.
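(A minimal sketch of that LLM-as-a-judge loop, assuming a human edits the prompt between rounds; `call_model`, `judge`, and the rubric are hypothetical placeholders I made up, not any real API.)

```python
# Sketch of an LLM-as-a-judge prompt-iteration loop. Everything here is a
# hypothetical placeholder: call_model stands in for whatever completion
# API you actually use.

def call_model(prompt: str) -> str:
    """Hypothetical completion call; replace with a real API client."""
    raise NotImplementedError

def judge(output: str, rubric: str) -> float:
    """Ask a second model to score an output against a rubric, 0 to 1."""
    reply = call_model(
        "Score the output from 0 to 1 against the rubric. "
        "Reply with a number only.\n"
        f"Rubric: {rubric}\nOutput: {output}"
    )
    return float(reply.strip())

def evaluate_prompt(system_prompt: str, tasks: list[str], rubric: str) -> float:
    """Average judge score across tasks: the only signal each round yields."""
    scores = [judge(call_model(f"{system_prompt}\n\n{task}"), rubric)
              for task in tasks]
    return sum(scores) / len(scores)

# Between rounds, a human reads the score and the failing transcripts and
# edits system_prompt by hand; the judge alone is too noisy to automate.
```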
I think it’s fair to hold me to precise predictions, and I appreciate the engagement on that; it’s helping me adjust to LW norms. The reason I found this interesting was a piece I read earlier about consciousness being a ‘favourite’ topic amongst agents on Moltbook. I can’t find the source offhand, but the original ‘take’ was a quick, speculative conjecture as to why that may be.
The idea I am more interested in, though, is whether the etymology of the term itself (‘consciousness’) traces to an evolutionary outcome of co-operation and language, and whether it is a term that human or AI agents use to effectively establish a binding ontology they can ‘co-operate under’. And whether it fills the same structural role as other terminal justifications for ‘moral preferences’, like theocratic ones (‘God’). Though the framing I have to communicate that question may be even less precise.
If you saw the piece on LW it may be this: https://www.lesswrong.com/posts/mgjtEHeLgkhZZ3cEx/models-have-some-pretty-funny-attractor-states#I_was_curious_whether_I_can_see_this_happening_on_moltbook__
Ah—there it is.
Thank you papetoast!