At FLF, one of the initiatives we’re recruiting for is an ‘epistemic stack’, which I think fits the bill as a backend/foundation for many of the desiderata you’re describing. An LLM chat interface would be one UX form factor on top.
The epistemic stack would be a (probably distributed) cache of annotations and metadata connecting claims to supporting sources, constructable and expandable dynamically. The cheapest, widest-coverage construction would use LM-based agents over webtext, inferring support from existing links and citations, and sometimes proactively searching the web for supporting (or contradicting) sources. Human participants (authors and readers) could provide various annotations, including endorsements of (or alterations to) inferred epistemic-support links. Git-like versioned, signed, annotated DAGs would then be available to downstream epistemic applications (including via RAG for LM consumption, but also in many other imaginable formats).
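To make the shape of the thing concrete, here is a minimal sketch of one node in such a DAG: a claim, typed links to supporting or contradicting nodes, and a content-addressed (git-style) id. All names here (`ClaimNode`, `SupportLink`, the stance strings) are hypothetical illustrations, not a committed design; a real system would add signatures, versioning, and distribution.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SupportLink:
    target_id: str   # content hash of the supporting/contradicting node
    stance: str      # "supports" or "contradicts"
    inferred_by: str # "lm-agent" or "human" (who asserted this link)

@dataclass
class ClaimNode:
    text: str
    links: tuple = ()                                # SupportLink instances
    annotations: list = field(default_factory=list)  # (author, note, signature)

    @property
    def node_id(self) -> str:
        # Content-addressed id, git-style: a hash over the claim text
        # and its link set, so identical content yields identical ids.
        payload = json.dumps(
            {"text": self.text,
             "links": [(l.target_id, l.stance) for l in self.links]},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

# A leaf source node, and a claim node pointing at it:
source = ClaimNode("Primary source: sensor log, 2024-01-01")
claim = ClaimNode(
    "Event X occurred on 2024-01-01",
    links=(SupportLink(source.node_id, "supports", "lm-agent"),))
```

Content-addressing is what makes caching and deduplication cheap: two agents that independently derive the same claim-plus-support structure land on the same id.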
An MVP of on-demand tree construction for a given claim is already practical, though it is unreliable and more expensive than a system with caching would be.
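The on-demand construction loop itself is simple; the hard part is the agent. A hedged sketch, with the LM agent replaced by a stub (`find_support` and the canned data are placeholders, not a real API):

```python
def find_support(claim: str) -> list[str]:
    # Stub: a real implementation would call an LM agent that follows
    # citations and does web search to surface candidate supporting claims.
    canned = {"A": ["B", "C"], "B": ["D"]}
    return canned.get(claim, [])

def build_tree(claim: str, depth: int = 2) -> dict:
    # Recursively expand a claim into supporting sub-claims,
    # bounded by a depth budget (the main cost control absent a cache).
    children = find_support(claim) if depth > 0 else []
    return {"claim": claim,
            "support": [build_tree(c, depth - 1) for c in children]}

tree = build_tree("A")
# tree["support"][0]["claim"] == "B"
```

With a shared cache, `build_tree` would consult previously constructed (and human-endorsed) subtrees before invoking the agent, which is where most of the cost and reliability gains would come from.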
Down the line, if more verifiable sources of ground data (signed cameras, etc.) become widespread, such data would integrate readily as leaves.
Compare also the Society Library, which has some similar prior work (mostly manual) and may be moving in a similar direction.
There has also been related discussion of a ‘Wikipedia for LLMs’, and though I haven’t heard more technical refinement from its proponents, the term might be intended to expand into a similar overall concept.
Note that ‘Wikipedia by LLM’, like Grokipedia, currently has no meaningful claim to epistemic grounding, correctability, or transparency/legibility, though its form factor would at least inherit Wikipedia’s navigability.
Note that we’re aware of the cautionary tales of Cyc, Xanadu, and Arbital!
We’re hoping a combination of:
- more minimalistic scope
- more open and interfaceable design
- semantically sensitive software components (LMs)
- cheaply available clerical labour (LMs)

means that ‘this time is different’.