so you want to build a library containing all human writings + an AI librarian.
the ‘simulated planet earth’ is a bit extra and overkill. why not a plaintext chat interface, e.g. what ChatGPT is doing now?
of those people who use ChatGPT over real-life libraries (of course not everyone), why don’t they ‘just consult the source material’? my hypothesis is that source material is dense and there is a cost to extracting the desired information from it. your AI librarian does not solve this.
I think the two things we have right now (“LLM assistants that are to-the-point” and “libraries containing source text”) serve distinct purposes and have distinct advantages and disadvantages.
LLM-assistants-that-are-to-the-point are great, but they
don’t exist in-the-world, and therefore sometimes hallucinate or produce plausible-sounding but false facts; for example, a statement like “K-Theanine is a rare form of theanine, structurally similar to L-Theanine, and is primarily found in tea leaves (Camellia sinensis)” is statistically probable (I pulled it out of GPT-4 just now) but factually incorrect, since K-theanine does not exist.
don’t exist in-the-world, leading to suboptimal retrieval: if you ask an AI assistant ‘how do I slice vegetables’ but your true question is ‘i’m hungry, i want food’, the AI has no way of knowing that; the AI also doesn’t immediately know what vegetables you are slicing, which limits its utility
libraries containing source text partially solve the hallucination problem because human source text authors typically don’t hallucinate. (except for every poorly written self-help book out there.)
from what I gather you are trying to solve the two problems above. great. but doubling down on ‘the purity of full text’ and wrapping some fake grass around it is not the solution.
here is my solution:
atomize texts into conditional, contextually-absolute statements and then run retrieval on these statements. For example, “You should not eat cheese” becomes “eating excessive amounts of typically processed cheese over the long run may lead to excess sodium and fat intake”. (see the sketch after this list)
help AI assistants come into the world, while maintaining privacy
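a minimal sketch of the ‘atomize, then retrieve’ idea. everything here is an assumption for illustration, not a real implementation: `atomize()` is a hypothetical stand-in for an LLM rewriting step (its output is hand-written below), and retrieval uses plain TF-IDF cosine similarity from scikit-learn, where a real system would likely use dense embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def atomize(passage: str) -> list[str]:
    """Hypothetical placeholder for an LLM call that rewrites a terse source
    passage into self-contained, contextually-absolute statements.
    The canned output below is hand-written for illustration."""
    canned = {
        "You should not eat cheese.": [
            "Eating excessive amounts of processed cheese over the long run "
            "may lead to excess sodium and fat intake.",
            "Moderate cheese consumption can fit within a balanced diet.",
        ],
    }
    return canned.get(passage, [passage])


# Index the atomized statements rather than the raw source text.
passages = ["You should not eat cheese."]
statements = [s for p in passages for s in atomize(p)]

vectorizer = TfidfVectorizer()
statement_matrix = vectorizer.fit_transform(statements)


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k atomized statements most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), statement_matrix)[0]
    ranked = sorted(zip(scores, statements), reverse=True)
    return [statement for _, statement in ranked[:k]]


print(retrieve("is eating cheese bad for me?"))
```

the retrieval backend is interchangeable; the point is only that queries hit self-contained statements instead of dense source text.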
Thank you, daijin, you have interesting ideas!
The library metaphor seems to be a versatile tool; the way I understand it:
My motivation is safety: static, non-agentic AIs are by definition safe (humans can make them unsafe, but the static model I have in mind is just a geometric shape, like a statue). We can expose the library to people instead of keeping it “in the head” of the librarian. Basically, this way we can play around in the librarian’s “head”. Right now mostly AI interpretability researchers do this, not all of humanity, and not casual users.
I see at least a few ways AIs can work:
The only current way: “The librarian visits your brain.” Sounds spooky, but this is essentially what is happening right now, to a small extent, when you prompt a model and read the output (the output enters your brain).
“The librarian visits and changes our world.” This is where we are heading with agentic AIs.
New, safe way: let the user visit the librarian’s “brain” instead, and make this “brain” more place-like. So instead of agentic librarians intruding on and changing our world/brains, we’ll intrude on and change theirs, seeing the whole of its contents and taking into our world and brain only what we want.
I wrote more about this in the first half of this comment, if you’re interested.
Have a nice day!