Robert Kralisch

Karma: 196

I am Robert Kralisch, one of the organizers of the AI Safety Camp, where I work as a research coordinator, evaluating and supporting research projects that fit under the umbrella of “technical AI Safety research” and “conceptually sound approaches to AI Alignment”.

I’m also an independent conceptual/theoretical Alignment Researcher. I have a background in Cognitive Science and I am interested in collaborating on an end-to-end strategy for AGI alignment.

The three main branches that I aim to contribute to are conceptual clarity (what we should mean by agency, intelligence, embodiment, etc.), the exploration of more inherently interpretable cognitive architectures, and Simulator theory.

One of my concrete goals is to figure out how to design a cognitively powerful agent such that it does not become a Superoptimiser in the limit.

AI Safety Camp 11

Robert Kralisch and Remmelt

4 Nov 2025 14:56 UTC

6 points

0 comments · 15 min read

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2026)

Robert Kralisch and Remmelt

6 Sep 2025 13:17 UTC

7 points

0 comments · 4 min read

AI Safety Camp 10 Outputs

Robert Kralisch, Remmelt and Linda Linsefors

5 Sep 2025 8:27 UTC

23 points

1 comment · 17 min read

We don’t want to post again “This might be the last AI Safety Camp”

Remmelt, Linda Linsefors and Robert Kralisch

21 Jan 2025 12:03 UTC

36 points

17 comments · 1 min read

(manifund.org)

Funding Case: AI Safety Camp 11

Remmelt, Robert Kralisch and Linda Linsefors

23 Dec 2024 8:51 UTC

60 points

4 comments · 6 min read

(manifund.org)

AI Safety Camp 10

Robert Kralisch, Linda Linsefors and Remmelt

26 Oct 2024 11:08 UTC

38 points

9 comments · 18 min read

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)

Linda Linsefors, Remmelt Ellen and Robert Kralisch

23 Aug 2024 14:18 UTC

17 points

2 comments · 4 min read

Research Discussion on PSCA with Claude Sonnet 3.5

Robert Kralisch

24 Jul 2024 16:53 UTC

−2 points

0 comments · 25 min read

Robert Kralisch 3 May 2024 0:27 UTC
1 point
in reply to: cheer Poasting’s comment on: Referential Containment
Yeah, I am not super familiar with PCA, but my understanding is that while both PCA and referential containment can be used to extract lower-dimensional or more compact representations, they operate on different types of data structures (feature vectors vs. graphs/hypergraphs) and have different objectives (capturing maximum variance vs. identifying self-contained conceptual chunks). Referential containment is more focused on finding semantically meaningful and contextually relevant substructures within a causal or relational knowledge representation. It also tries to address the opposite direction, namely how to break existing concepts apart when zooming into the representations, and I am not sure whether PCA does anything like that.

I had Claude 3 read this post and compare the two. Here it is, if you are interested (keep in mind that Claude tends to be very friendly and encouraging, so it might be valuing referential containment too highly):
Similarities:
- Both aim to identify patterns and structure in high-dimensional data.
- Both can be used to reduce the dimensionality of data by finding lower-dimensional representations.
- Both rely on analyzing statistical dependencies or correlations between variables/features.
Differences:
1. PCA is a linear dimensionality reduction technique that projects data onto a lower-dimensional subspace spanned by the principal components (eigenvectors of the covariance matrix). Referential containment is not inherently a dimensionality reduction technique but rather a way to identify self-contained or tightly coupled substructures within a causal graph or hypergraph representation.
2. PCA operates on numerical feature vectors, while referential containment deals with graph/hypergraph structures representing conceptual relationships and causal dependencies.
3. PCA finds orthogonal linear combinations of features that capture maximum variance. Referential containment aims to identify subgraphs with strong internal connections and minimal external connections, which may not necessarily correspond to maximum variance directions.
4. PCA is an unsupervised technique that does not consider semantics or labeled data. Referential containment, as proposed in the context of the PSCA architecture, can potentially leverage semantic information and learned representations to guide the identification of meaningful conceptual chunks.
5. PCA produces a fixed set of principal components for a given dataset. Referential containment, as described, is more flexible and can adapt the level of granularity or resolution based on context, objectives, and resource constraints.
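To make differences 1 and 3 more concrete, here is a minimal Python sketch (standard library only). `leading_component` finds the first principal component of 2-D points by power iteration on their covariance matrix. `containment_score` is a hypothetical, conductance-style stand-in for referential containment: it scores a candidate node set by the fraction of its incident edges that stay inside the set. It is an assumption for illustration, not the procedure from the PSCA post.

```python
import math


def leading_component(points, iters=100):
    """First principal component of 2-D points, via power iteration on the covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the symmetric 2x2 sample covariance matrix.
    cxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    cyy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Arbitrary start vector (fails only if orthogonal to the leading eigenvector).
    vx, vy = 1.0, 0.5
    for _ in range(iters):
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = math.hypot(wx, wy)
        if norm == 0.0:  # zero covariance: no direction of variance to find
            break
        vx, vy = wx / norm, wy / norm
    return vx, vy


def containment_score(edges, nodes):
    """Fraction of the edges touching `nodes` that lie entirely inside `nodes`.

    A hypothetical conductance-style proxy: 1.0 means the node set has no
    outside connections; values near 0 mean it is mostly boundary.
    """
    nodes = set(nodes)
    internal = boundary = 0
    for u, v in edges:
        inside = (u in nodes) + (v in nodes)
        if inside == 2:
            internal += 1
        elif inside == 1:
            boundary += 1
    total = internal + boundary
    return internal / total if total else 0.0
```

On points lying on the line y = x, `leading_component` returns approximately (0.707, 0.707), the direction of maximum variance. On a triangle a-b-c with a tail c-d-e, the set {a, b, c} keeps 3 of its 4 incident edges internal and scores 0.75, a judgement about connectivity that says nothing about variance.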

Robert Kralisch 3 May 2024 0:20 UTC
2 points
in reply to: cubefox’s comment on: Disentangling Competence and Intelligence
I think that “epistemic rationality” matches very well with what I am thinking of as level 3, which is my notion of intelligence. It is indeed applicable to non-agentic systems.
I am still thinking about whether to include meta-learning (referring to updating level 3 algorithms based on experience) and meta-processes above that in my concept of intelligence.

Would this layer of meta-learning be part of epistemic rationality, do you think? It becomes particularly relevant if the system is resource constrained and has to prioritize what to learn about, and/or cares about efficiency of learning. These constraints feel a bit less natural to introduce for a non-agentic system, other than if said system is set up by an agentic system for some purpose.

In any case, instrumental rationality does not seem to cover all that I mean by Competence, but rather something narrower, like “cognitive competence”. I find it a bit difficult to systematically distinguish between cognitive and non-cognitive competence, because the cognitive part of the system is also implemented by its embodiment, and there are various correlations between events in “morphospace” and “cognitive space”.
One way of resolving that might be to distinguish between on-surface properties of a Markov blanket (corresponding to the system interfacing with its environment) and within-surface properties of that Markov blanket (corresponding to integrated regulation systems and the cognition). There will still be feedback loops between those properties, so our mileage in obtaining clean distinctions between different competencies may vary.
If you are interested in some more thoughts on that, you can check out my post on extended embodiment.
An independent idea is that it is apparently possible to divide “competence” or “instrumental rationality” into two independent axes: Generality, and intelligence proper (or perhaps: competence proper).
I have already commented a bit on meta-learning above; by default, my level 3 would refer only to online learning, but I am thinking of including different levels of meta-learning because of the algorithmic similarities.
Perhaps interestingly for you, I consider one of the primary purposes of meta-learning to be refining a generally intelligent system into a more narrowly intelligent one, by improving its learning capabilities for a particular set of domains, in some sense biasing the cognition towards the kind of environment it seems to operate within (e.g. in terms of hypothesis generation, or which kinds of functions to use when approximating the behavior of an observed sequence).
Of course, unless the system loses its meta-learning capability, it will be able to respond to changes in its environment by re-aiming/updating its learning tendencies over time. So it is technically general if you give it some time, but it ends up converging towards beneficial specialisation.

I think of generality of intelligence as relatively conceptually trivial. At the end of the day, a system is given a sequence of data via observation, and is now tasked with finding a function or set of functions that both corresponds to plausible transition rules of the given sequence, and has a reasonably high chance of correctly predicting the next element of the sequence (which is easy to train for by hiding later elements of the sequence from the modeling process and sequentially introducing them to test and potentially update the fit of the model).
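The hold-out scheme described above can be sketched as an online loop over a small, fixed hypothesis space. The candidate rules in `make_hypotheses` (add a constant, multiply by a constant, sum of the last two elements) and the listing-order preference are assumptions chosen for illustration; a real system would generate hypotheses rather than enumerate a hand-written list, and would do more than simply discard refuted rules.

```python
def make_hypotheses():
    """Hypothetical candidate transition rules: each maps a revealed prefix to a predicted next element."""
    hyps = {}
    for k in range(-3, 4):
        hyps[f"add {k}"] = lambda seq, k=k: seq[-1] + k
    for k in (2, 3):
        hyps[f"times {k}"] = lambda seq, k=k: seq[-1] * k
    # Not applicable until two elements are visible, so it returns None before that.
    hyps["sum of last two"] = lambda seq: seq[-1] + seq[-2] if len(seq) >= 2 else None
    return hyps


def online_fit(sequence, hyps):
    """Reveal `sequence` one element at a time: predict, compare, then drop refuted rules.

    Returns the surviving rule names and the number of correct predictions.
    """
    alive = dict(hyps)
    hits = 0
    for t in range(1, len(sequence)):
        prefix, actual = sequence[:t], sequence[t]
        preds = {name: rule(prefix) for name, rule in alive.items()}
        # Predict with the first surviving rule (listing order acts as a crude simplicity prior).
        guess = next(iter(preds.values()), None)
        hits += int(guess == actual)
        # Keep rules that predicted correctly, plus rules that were not yet applicable.
        alive = {name: alive[name] for name, p in preds.items() if p is None or p == actual}
    return list(alive), hits
```

On 1, 2, 3, 5, 8, 13 the loop ends with only `sum of last two` alive after 3 correct predictions out of 5: early on, `add 1` also fits the revealed prefix and is only refuted once 5 is revealed.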
Computationally speaking, the total set of atomic functions that you would have to consider in order to compositionally construct arbitrary transition rules for sequences of discrete data packages is very small. The only mathematical requirement is Turing universality; basically the entire difficulty arises from resource constraints.
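A minimal sketch of the resource-constraint point: with only four hypothetical atomic functions (`inc`, `dec`, `double`, `neg`, nowhere near a Turing-universal basis), there are 4^d compositions of length d, so the depth budget, rather than the basis, decides which transition rules the search can reach. `search` enumerates compositions in order of length via `itertools.product` and returns the shortest one consistent with the examples, together with the number of candidates it tried.

```python
from itertools import product

# Hypothetical atomic functions; a real basis would need to be Turing universal.
ATOMS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: 2 * x,
    "neg": lambda x: -x,
}


def apply(names, x):
    """Apply the named atoms to x, left to right."""
    for name in names:
        x = ATOMS[name](x)
    return x


def search(examples, max_depth):
    """Shortest composition of atoms matching all (input, output) examples, plus candidates tried."""
    tried = 0
    for depth in range(1, max_depth + 1):
        # len(ATOMS) ** depth candidates at this depth: the cost grows exponentially.
        for names in product(ATOMS, repeat=depth):
            tried += 1
            if all(apply(names, x) == y for x, y in examples):
                return names, tried
    return None, tried
```

For the examples (0, 1), (1, 3), (2, 5), i.e. f(x) = 2x + 1, `search` returns `(('double', 'inc'), 13)` with `max_depth=3` but `(None, 4)` with `max_depth=1`: the rule is expressible in the basis, yet out of budget.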

This seems to match your thoughts about the appearance of greater generality simply due to more processing power. A cognitive system that is provided with more processing power could use it either to search more deeply within the regions of model-space that it naturally considers, or to branch out and consider new regions. Many brains in the animal kingdom seem to implement a sort of limited generative simulation of their environment, so that could be considered a fairly general problem domain.

I could try to write more on this, but I am curious what you think about this so far and if I come across as reasonably clear.

Robert Kralisch 30 Apr 2024 0:10 UTC
1 point
in reply to: Seth Herd’s comment on: The Prop-room and Stage Cognitive Architecture
Thanks a lot for the encouragement :)
Yes, I am trying to understand a generalized (which also means simplified) and formalizable parallel to human cognition. Some of my thinking on this is inspired by predictive coding and adaptive resonance theory (although only loosely by the latter), and I am trying to figure out the implications of our most up-to-date understanding of neurobiological principles, together with a notion of the “riverbeds of cognition”.

In other words, how can we design an architecture such that it is not pressured to take shortcuts or “work around” design decisions we made, as its cognition develops? Is there a “natural path” of cognitive development that avoids some of the common pitfalls and failure modes (i.e. could we aim for inner alignment if we were proficient in this area)?
This has a direct bearing on interpretability, and goes together with the goal of a sort of “conceptual curriculum” that is intended to teach the system natural abstractions.

If I remember correctly, the centrality of “constraint satisfaction” fell out of considering causal (hyper/meta)graphs as a sensible representational substrate (which was partially inspired by Ben Goertzel). I personally find it quite intuitive to think in graphs.

Robert Kralisch 29 Apr 2024 23:49 UTC
1 point
in reply to: cheer Poasting’s comment on: Referential Containment
Hm, I’ll give some thought to how to integrate different types of data with this picture, but I think that the “useful” classification of data ultimately depends on whether the agent possesses the right “key” to interpret it, and by extension, how difficult that “key” is to produce from concepts that the agent is already proficient with.

At the end of the day, the agent can only “understand” any data in terms of internalized concepts, so there will often be some uncertainty about whether the difficulty lies in translating sensible data into that internal representation, or in the data being about phenomena somewhat (or far) outside the agent's conceptual bounds. Having a “key” with respect to some data means that this data can be reliably translated.

Referential containment is about how to structure internal representations of concepts so as to make them most useful (which could mean flexible, or efficient, etc., depending on the problem domain and constraints).

If humanity forgot all of its medical knowledge tomorrow, would we discover the same categories and sub-categories of medicine, structuring our knowledge and drawing the distinctions into the different areas of expertise similarly?
We could gather a bunch of observational data about “the art of preventing people from dying” and find clusters of interventions or treatment strategies that have an appropriate conceptual size to teach to different groups of people. Note that this changes depending on how many people we can allocate in total, how much we believe reliably fits into a single human mind, and how many common features there are in curative or preventative measures (commonalities here roughly referring to useful information that is referentially contained with respect to medicine, but cannot be further contained in a single sub-field or small cluster thereof).
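The parenthetical about commonalities can be illustrated with a toy calculation. Everything here is hypothetical: the interventions, their concept sets, the assignment to sub-fields, and the `capacity` parameter (a stand-in for how much reliably fits into a single mind). Concepts needed by more than one sub-field are the ones that are contained with respect to medicine but not within any single sub-field.

```python
from collections import defaultdict

# Hypothetical toy data: each intervention maps to the concepts needed to perform it.
INTERVENTIONS = {
    "set fracture": {"anatomy", "bone healing", "sterile technique"},
    "appendectomy": {"anatomy", "pharmacology", "sterile technique"},
    "insulin dosing": {"physiology", "glucose regulation", "pharmacology"},
    "antibiotic course": {"physiology", "microbiology", "pharmacology"},
}

# Hypothetical assignment of interventions to sub-fields.
ASSIGNMENT = {
    "set fracture": "surgery",
    "appendectomy": "surgery",
    "insulin dosing": "internal medicine",
    "antibiotic course": "internal medicine",
}


def analyse_fields(interventions, assignment, capacity):
    """Return (shared concepts, overloaded sub-fields) for a given assignment.

    A concept is "shared" if more than one sub-field needs it; a sub-field is
    "overloaded" if it needs more than `capacity` concepts in total.
    """
    field_concepts = defaultdict(set)
    for intervention, field in assignment.items():
        field_concepts[field] |= interventions[intervention]
    usage = defaultdict(int)  # number of sub-fields that use each concept
    for concepts in field_concepts.values():
        for concept in concepts:
            usage[concept] += 1
    shared = {c for c, n in usage.items() if n > 1}
    overloaded = {f for f, cs in field_concepts.items() if len(cs) > capacity}
    return shared, overloaded
```

In this toy setup only pharmacology is shared, and with `capacity=3` both sub-fields are overloaded, while `capacity=4` removes the overload. Changing the capacity or the number of sub-fields changes where a boundary is worth drawing, which is the dependence described above.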

Let me know if that makes sense.

Robert Kralisch 29 Apr 2024 23:30 UTC
3 points
in reply to: Nathan Helm-Burger’s comment on: Disentangling Competence and Intelligence
Yeah, I wish we had some cleaner terminology for that.
Finetuning the “simulation engine” towards a particular task at hand (e.g. finding the best trade-off between breadth and depth of search in strategy games, or even knowing how much “thinking time” or “error allowance” to allocate to a move), given limited cognitive resources, is something that I would associate with level 3 capability.
It certainly seems like learning could go in the direction of making the model of the game more useful, either by improving the extent to which this model predicts/outputs good moves or by improving the allocation of cognitive resources to the sub-tasks involved. Presumably, an intelligent system should be capable of testing which improvement vectors seem most fruitful (and how often to update this analysis), but I find myself a bit confused about whether that should count as level 3 or as level 4, since the system is reasoning about allocating resources across relevant learning processes.
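One simple way to picture “testing which improvement vectors seem most fruitful” is a greedy allocator that spends each unit of compute on the learning process whose most recent unit paid off best. This is a hypothetical heuristic chosen for illustration, not a claim about how a level 3 or level 4 process should work, and the two processes and their diminishing-returns gain curves are made up.

```python
def allocate(processes, budget):
    """Greedily spend `budget` compute units across learning processes.

    `processes` maps a name to a function giving the gain of the next unit,
    given how many units that process has already received. Each process is
    tried once; afterwards every unit goes to the process whose most recent
    unit produced the largest observed gain.
    """
    spent = {name: 0 for name in processes}
    last_gain = {}
    for _ in range(budget):
        untried = [name for name in processes if name not in last_gain]
        name = untried[0] if untried else max(last_gain, key=last_gain.get)
        last_gain[name] = processes[name](spent[name])
        spent[name] += 1
    return spent


# Hypothetical diminishing-returns curves for the two improvement vectors.
TOY_PROCESSES = {
    "game model": lambda n: 1.0 * 0.5 ** n,
    "resource allocation": lambda n: 0.6 * 0.8 ** n,
}
```

With these toy curves and a budget of 10 units, the allocator favours the game model at first, then shifts towards resource allocation as the model's returns fall off, ending at 4 and 6 units respectively. Deciding how often to re-check the gains would itself be a further resource decision, which is where the level 3/level 4 ambiguity shows up.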

Robert Kralisch 29 Apr 2024 23:17 UTC
1 point
in reply to: Seth Herd’s comment on: Disentangling Competence and Intelligence
Thanks!

In your example, I think it is possible that the hunter-gatherer solves the problem through pure level 2 capability, even if they never encountered this specific problem before. Using causal models compositionally to represent the current scene, and computing it to output a novel solution, does not actually require that the human updates their causal models about the world.
I am trying to distinguish agents with this sort of compositional world model from ones that just have a bunch of cached thoughts or habits (which would correspond to level 1), and I think this is perhaps a common case where people would attribute intelligence to a system that imo does not demonstrate level 3 capability.

Of course, this would require that the human in our example already has some sufficiently decontextualised notion of knocking loose objects down, or more generally that their concepts are suited to this sort of compositional reasoning. It might be worth elaborating on level 2 to introduce some measure of modeling flexibility/compositionality.
I feel like this could be explained better, so I am curious if you think I am being clear.
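A minimal sketch of the level 2 point, assuming STRIPS-style action models (preconditions, facts added, facts removed) as a stand-in for causal models: breadth-first search composes fixed models into a plan that none of them encodes individually, and nothing in `ACTIONS` is modified along the way. The actions and facts are hypothetical, loosely following the knocking-objects-down example.

```python
from collections import deque

# Hypothetical fixed causal models: action -> (preconditions, facts added, facts removed).
ACTIONS = {
    "grab stick": ({"stick on ground"}, {"holding stick"}, {"stick on ground"}),
    "knock down fruit": ({"holding stick", "fruit in tree"}, {"fruit on ground"}, {"fruit in tree"}),
    "pick up fruit": ({"fruit on ground"}, {"holding fruit"}, {"fruit on ground"}),
}


def plan(start, goal):
    """Breadth-first search over states using the fixed action models; no model is updated."""
    start = frozenset(start)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, remove) in ACTIONS.items():
            if pre <= state:
                nxt = frozenset((state - remove) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None
```

Starting from {"stick on ground", "fruit in tree"} with the goal {"holding fruit"}, `plan` returns `["grab stick", "knock down fruit", "pick up fruit"]`. Without a stick in the start state it returns `None`: composition can only go as far as the available models allow, which is where level 3 learning would have to come in.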

You are probably right that I should avoid the term intelligence for the time being, but I haven’t quite found an alternative term that resonates. Anyways, thanks for engaging!
Edit: I’ll soon make some changes to the post to better account for this feature of level 2 algorithms: they can potentially solve novel problems even if no new learning occurs. It’s an important aspect of why I am saying that level 3 capabilities are only indirectly related to competence.

The Prop-room and Stage Cognitive Architecture

Robert Kralisch

29 Apr 2024 0:48 UTC

14 points

4 comments · 14 min read

How are Simulators and Agents related?

Robert Kralisch

29 Apr 2024 0:22 UTC

6 points

0 comments · 7 min read

Extended Embodiment

Robert Kralisch

29 Apr 2024 0:18 UTC

8 points

1 comment · 3 min read

Referential Containment

Robert Kralisch

29 Apr 2024 0:16 UTC

2 points

4 comments · 3 min read

Disentangling Competence and Intelligence

Robert Kralisch

29 Apr 2024 0:12 UTC

23 points

7 comments · 6 min read

Robert Kralisch 24 Apr 2024 19:30 UTC
2 points
on: Planning in a Lattice Graph
the maximum plan length is only $10 \cdot 100 = 1000$ steps
You mean the maximum length for an efficient/minimal plan, right? Maybe good to clarify (even if obvious in this case). Just a thought.