A New Platform for Semantic Discovery: Preserving Pathways Between Datasets
Most of our current data-analysis pipelines—statistical, neural, or otherwise—flatten or obscure deeper semantic structures. When you normalise tables or create embeddings, you often lose track of the “bridges” (shared entities), “pathways” (chains of relations), and “semantic neighbourhoods” that connect different datasets.
Over the past year I’ve been exploring an alternative: a CLI-first framework that tries to discover and preserve these hidden connections. I call it TensorPack. The core hypothesis is that by keeping relationships as first-class objects, we can reason about data more like a graph of ideas than a grid of numbers. This may open up new ways to generalize across domains without constant custom coding.
The aim is to test whether this platform gives us a new way to reason about and understand data.
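To make the "graph of ideas" framing concrete, here is a minimal sketch in plain Python. This is not TensorPack's API, just an illustration of the underlying idea: a shared entity column acts as a bridge between two small datasets, and a pathway is simply a path that crosses that bridge.

```python
# Illustrative only: NOT TensorPack's API, just the idea in plain Python.
# Two datasets share an entity column ("gene"), which acts as a "bridge";
# keeping that link explicit lets us walk a "pathway" from a drug to a
# disease instead of flattening both tables into feature vectors.
import networkx as nx

trials = [  # dataset A: drug trials
    {"drug": "metformin", "gene": "AMPK"},
    {"drug": "rapamycin", "gene": "MTOR"},
]
annotations = [  # dataset B: gene-disease annotations
    {"gene": "AMPK", "disease": "type 2 diabetes"},
    {"gene": "MTOR", "disease": "tuberous sclerosis"},
]

g = nx.Graph()
for row in trials:
    g.add_edge(("drug", row["drug"]), ("gene", row["gene"]), source="trials")
for row in annotations:
    g.add_edge(("gene", row["gene"]), ("disease", row["disease"]), source="annotations")

# The shared "gene" nodes are the bridges; a pathway is a path through them.
path = nx.shortest_path(g, ("drug", "metformin"), ("disease", "type 2 diabetes"))
print(" -> ".join(f"{kind}:{name}" for kind, name in path))
# drug:metformin -> gene:AMPK -> disease:type 2 diabetes
```

The point of the example is only that the bridge and the pathway are explicit objects you can query, rather than structure that disappears once the tables are normalised or embedded.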
Why This Might Matter to LessWrong Readers
If rationality is about building accurate models of the world, then tools that reveal hidden pathways between datasets could improve the accuracy of our beliefs. We might see previously invisible causal structures, spot inconsistencies earlier, or transfer insights from one domain to another. My hope is that this approach serves as an epistemic amplifier: instead of only extracting features, we also map the relational terrain those features live in.
What TensorPack Does Today
- Discovers hidden semantic connections and pathways between datasets.
- Supports entity search across datasets, not just within them (see the sketch after this list).
- Extends dynamically with domain-specific transforms, letting users add their own semantic knowledge at runtime.
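As a rough illustration of the cross-dataset search idea (hypothetical code, not the actual TensorPack interface), one can index every cell value by its dataset and column of origin, so a single query returns hits across all loaded datasets:

```python
# A minimal sketch of "entity search across datasets" (an assumption about
# the idea, not TensorPack's real interface). Every cell value is indexed
# by the dataset and column it came from.
from collections import defaultdict

datasets = {
    "trials": [{"drug": "metformin", "gene": "AMPK"}],
    "annotations": [{"gene": "AMPK", "disease": "type 2 diabetes"}],
}

index = defaultdict(list)
for name, rows in datasets.items():
    for i, row in enumerate(rows):
        for column, value in row.items():
            index[str(value).lower()].append((name, column, i))

def search(entity: str):
    """Return every (dataset, column, row) where the entity appears."""
    return index.get(entity.lower(), [])

print(search("AMPK"))
# [('trials', 'gene', 0), ('annotations', 'gene', 0)]
```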
The package and documentation are available here: [TensorPack on GitHub](https://github.com/fikayoAy/tensorpack)
How This Differs From Existing Approaches
This overlaps with knowledge graphs, the semantic web, and graph databases. The main difference is that TensorPack is:
- CLI-first and designed to work directly with tensors, matrices, tabular data, and text,
- built to integrate domain-specific transforms at runtime (one possible shape of this is sketched below),
- intended as a lightweight bridge between everyday data formats and a graph-style view of semantic relationships.
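For the runtime-transform point, here is one plausible shape such a plugin mechanism could take (again an assumption on my part, not TensorPack's documented extension API): a registry that user code populates after the tool has started, so domain knowledge is injected without modifying the core package.

```python
# Hypothetical plugin shape, not TensorPack's documented extension API:
# a registry that user code can populate at runtime.
TRANSFORMS = {}

def register_transform(name):
    """Decorator that adds a user-supplied transform to the registry."""
    def wrap(fn):
        TRANSFORMS[name] = fn
        return fn
    return wrap

@register_transform("normalize_gene_symbol")
def normalize_gene_symbol(value: str) -> str:
    # Domain knowledge lives in user code: gene symbols are upper-case,
    # with common aliases mapped to a canonical form.
    aliases = {"PRKAA1": "AMPK"}
    symbol = value.strip().upper()
    return aliases.get(symbol, symbol)

print(TRANSFORMS["normalize_gene_symbol"]("prkaa1"))  # AMPK
```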
Questions for the LessWrong Community
1. Is this genuinely a new platform, or simply a reframing of existing graph/semantic approaches?
2. What are the strongest arguments that this paradigm will fail?
3. What blind spots or failure modes should I be looking for (e.g. privacy, false positives in inferred relationships, scalability)?
Meta note: Drafted with help from an AI assistant; final edits and judgments are my own.