Realizability for Finite State Reactive Agents
This work was produced during the Dovetail Research Fellowship. I’m grateful to Alex, Alfred, Winter, and Santiago, and to everyone who provided feedback, and to the fellowship community for their discussions.
Note on formatting: There was a rendering issue with display-style LaTeX in the original version of this post, where block equations were not visible after publishing. To avoid confusion, I have replaced those equations with images. Apologies for the inconvenience, and please feel free to comment if anything is unclear.
Abstract
We study AI agents through the lens of finite-state transducers and string-to-string behaviors. Instead of treating automata purely as language recognizers, we model agents as systems that map observation histories to action histories while interacting with an environment online. We introduce Finite-State Reactive Agents (FSRAs) as deterministic, finite-memory, reactive policies, establishing their equivalence to standard Mealy and Moore machines. Our main result (Theorem 3.13) characterizes exactly which behaviors can be implemented with finite internal state: a deterministic behavior is FSRA-implementable if and only if it is length-preserving, prefix-preserving (causal), and has finite memory. We also identify when richer computational models are unavoidable. We then introduce specifications (constraints on allowed agent behaviors) and formalize notions of weak enforceability, strong enforceability, and realizability. Using our characterization theorem, we prove strict separations between these notions (Theorem 3.15), demonstrating when finite-state agents can or cannot satisfy given specifications.
Introduction
Deterministic Finite Automata (DFAs) are usually introduced as a basic model of computation: simple machines that read an input symbol by symbol and decide whether to accept or reject a string. In theoretical computer science, they are often treated as a warm-up before more powerful models such as Turing machines.
In this work, we take a different perspective. A DFA can also be viewed as a minimal agent: it maintains a small internal state, receives observations from the environment, updates its state deterministically, and eventually produces an outcome. Despite their simplicity, DFAs already capture the core interaction loop shared by many AI systems: observe → update internal memory → act or evaluate.
This observation motivates a shift from language recognition to agent behavior. We introduce the model of a Finite-State Reactive Agent (FSRA), which generalizes DFAs from passive evaluators to active, online systems that map streams of observations to streams of actions using finite internal memory. FSRAs formalize deterministic, reactive agents whose behavior is causal and prefix-preserving, and whose internal state evolves according to a fixed update rule (time-invariance). They let us ask precise questions about memory, state abstraction, observability, and goal satisfaction without the complications of learning (updating the policy from data), optimization (searching over policies or parameters to maximize an objective), or stochasticity (randomness in the agent’s actions or environment). These characterizations let us state simple, checkable conditions for when a specification is enforceable or realizable by a set of finite-state controllers.
Outline. The rest of this post is organized as follows. Section 1 introduces the Finite-State Reactive Agent (FSRA) model and states its equivalence with standard Mealy and Moore machines. Section 2 develops formal properties of agent behaviors: determinism, length-preservation, causality (prefix-preservation), finite memory, reactivity, and time-invariance, modeled as properties of string-to-string functions. It concludes by introducing specifications (constraints on allowed behaviors) and three enforceability notions: weak, strong, and realizability. Section 3 presents our main results: Theorem 3.13 characterizes exactly which behaviors are FSRA-implementable (reactive + finite-memory), Corollary 3.14 shows that such behaviors induce realizable specifications, and Theorem 3.15 proves strict separations between the enforceability notions, demonstrating fundamental gaps between what finite-state agents can weakly satisfy versus fully realize. We conclude with worked examples including binary remainder computation (mod k), substring avoidance (no “11”), and count-with-ending constraints. Standard automata-theoretic background (DFAs, Mealy machines, Moore machines) and deferred proofs are collected in the Appendix.
Finite State Reactive Agent
Mealy machines (Definition A.2 in the Appendix) are deterministic finite-state transducers that produce an output symbol synchronously with each input symbol. This makes them a natural formal model for very simple online agents: the input alphabet corresponds to the agent’s observations, the finite state encodes the agent’s memory, and the per-step output is the agent’s immediate action. To make this perspective explicit and to use language familiar to AI and alignment readers, we adopt the name Finite-State Reactive Agent (FSRA) for a Mealy machine interpreted as a deterministic, finite-memory, online policy. The formal Definition 1.1 below gives the FSRA tuple and the precise causal / prefix-preserving behavior we require.
Definition 1.1 (Finite State Reactive Agent—FSRA)
A Finite-State Reactive Agent (FSRA) is a deterministic, finite-memory, online transducer that maps each input (observation) prefix to an output (action) prefix of the same length, i.e., it emits exactly one action symbol for every observation symbol it consumes.
Formally, an FSRA is a tuple M = (Q, O, A, δ, λ, q₀), where:

- Q is a finite set of states (the agent’s internal memory),
- O is a finite observation alphabet (input),
- A is a finite action alphabet (output),
- δ : Q × O → Q is the (total) state-update function,
- λ : Q × O → A is the output (action) function, producing exactly one action per observation,
- q₀ ∈ Q is the start state.

The FSRA implements a function f_M : O* → A* that is length-preserving, |f_M(w)| = |w| for every w ∈ O*, and causal / prefix-preserving: the output on any prefix of w is the corresponding prefix of f_M(w).

Concretely, on input w = o₁o₂⋯oₙ the machine visits states q₀, q_t = δ(q_{t−1}, o_t), emits a_t = λ(q_{t−1}, o_t) at step t, and produces f_M(w) = a₁a₂⋯aₙ.
Remark (generalization). If you want to allow a variable number of action symbols per observation, simply let the output function have type λ : Q × O → A*, producing a (possibly empty) string of actions per step; this recovers the general Mealy transducer of Definition A.2 in the Appendix.
Calling this model an FSRA makes the intended interpretation explicit: the input alphabet is the agent’s observation space, the finite state set is its memory, and the per-step output is its action.

This model is useful for reasoning about what can be implemented with finite memory, and for making clear where finite-state agents fail. When one wants to model policies with longer or variable delay between observation and action, or policies that maintain richer beliefs, the FSRA can be generalized (e.g. by allowing variable-length outputs, as in the remark above, or by enlarging the state space).
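To make the interaction loop concrete, here is a minimal Python sketch of running an FSRA online; the function and variable names (`run_fsra`, `delta`, `lam`) are illustrative, not part of the formal definition:

```python
# Minimal sketch of running an FSRA online. Names are illustrative.

def run_fsra(delta, lam, start, observations):
    """Consume one observation per step and emit exactly one action per step."""
    state, actions = start, []
    for o in observations:
        actions.append(lam(state, o))  # action from current state + observation
        state = delta(state, o)        # deterministic state update
    return "".join(actions)

# Example FSRA: state = parity of 1s seen so far; emit "1" iff the count of 1s
# after reading the current symbol is even.
def delta(q, o):
    return q ^ (o == "1")

def lam(q, o):
    return "1" if (q ^ (o == "1")) == 0 else "0"
```

By construction the function computed this way is length-preserving and causal: each step appends exactly one action and never revises earlier ones.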
Proposition 1.2 (Equivalence with Mealy / Moore)
Let M = (Q, O, A, δ, λ, q₀) be an FSRA.

1. Every FSRA is, by definition, a standard Mealy machine (with input alphabet O, output alphabet A, and single-symbol output per transition—Definition A.2′ in the Appendix). Conversely, every standard Mealy machine (whose output function produces exactly one symbol per input) is an FSRA. This equivalence is immediate from the definitions.

2. For every FSRA M there exists an equivalent standard Moore machine (Definition A.3′ in the Appendix) computing the same length-preserving transduction f_M, and conversely, every standard Moore machine can be converted into an equivalent FSRA.
Brief justification. Part 1 is just a renaming of components. For Part 2, the FSRA-to-Moore direction requires splitting each state according to the output symbol just emitted, so the Moore machine may need up to |Q| · |A| states plus an initial state; the Moore-to-FSRA direction simply reads each transition’s output off the entered state.
Formal Properties of Agent Behaviors
In this section, we define several properties of general observation-to-action behaviors, modeled as string-to-string functions. We then identify which of these properties are satisfied by Finite-State Reactive Agents (FSRAs).
Let O be a finite observation alphabet and A a finite action alphabet. A behavior is a map B : O* → P(A*),

where B(w) ⊆ A* is the set of action histories permitted after observation history w.

Later, we will specialize to deterministic behaviors, which are functions of the form f : O* → A*.
Example (Behavior)
Let
In this example, the behavior is nondeterministic: after some observation histories, multiple action histories are permitted.
Intuitively, a function
It is useful to separate behaviors from agents. A behavior describes only the input–output mapping itself, independent of whatever mechanism produces it.
We also distinguish both from a specification: a specification P is a constraint on what an agent is allowed to do (formally, a relation between observation histories and allowed action histories; see Definition 2.5 below).
Allowing
Definition 2.3 (Deterministic)
A function B : O* → P(A*) is deterministic if B(w) is a singleton for every observation history w.

In this case, we may write B(w) = {f(w)} and identify B with the function f : O* → A*.
Example (Deterministic)
Let
So, the function returns
Determinism is the simplest way to turn a permissive specification into an actual policy: once you pick a deterministic behavior, there is no remaining ambiguity about what the agent does. But remember that determinism is a modeling choice; real systems are sometimes nondeterministic (stochastic) by design.
Definition 2.4 (Controller and Controller Set)
A controller is any concrete system (a finite automaton, a program, a circuit, a neural network, etc.) that determines a behavior f_C : O* → A*,

where f_C(w) is the action history the controller produces on observation history w.
A controller set C is a collection of available controllers from which one may be selected.

If each controller in C is deterministic, C induces the family of deterministic behaviors {f_C : C ∈ C}.
Example (Controllers)
Let
-
-
Then
depending on which controller is selected.
Definition 2.5 (Specification)
Let P ⊆ O* × A*

be a specification (a relation between observation histories and allowed action histories). For an observation history w, write P(w) = {a ∈ A* : (w, a) ∈ P} for the set of action histories allowed at w.
A specification is not the same as a behavior. A behavior describes what a controller does; a specification describes what a controller is allowed to do. In particular, a specification may permit many different action histories after the same observation history, or none at all.
Let
so that
Observe that
Think of the controller as the machine you could put on the robot: it has internal structure and a rule for mapping sensory input to motor outputs. The behavior f_C is what that machine does, abstracted away from how it does it.
Definition 2.6 (Length Preserving)
A function B : O* → P(A*) is length-preserving if every a ∈ B(w) satisfies |a| = |w|.

For deterministic behaviors, this reads |f(w)| = |w| for all w ∈ O*.
Example (Length-preserving)
Let
i.e.
Intuitively, a length-preserving function produces exactly as many output symbols as it consumes input symbols. This models an agent that emits one action per observation (or more generally, a fixed number of actions per observation). It’s an explicit modeling choice: many controllers (especially embedded or real-time ones) naturally fit this model.
Definition 2.7 (Prefix Preserving/Causal)
A function f : O* → A* is prefix-preserving (causal) if, whenever u is a prefix of v, f(u) is a prefix of f(v).

Causality captures the notion of online computation: the output produced after reading a prefix is a commitment that later observations cannot revise.
Example (Prefix-preserving / causal)
Let
on seeing
Formally,
Then
Definition 2.8 (Finite Memory)
A deterministic function f : O* → A* has finite memory if the equivalence relation ≡_f on observation histories, defined by u ≡_f v iff for every continuation w the future outputs agree (formally, f(uw) with the prefix f(u) removed equals f(vw) with the prefix f(v) removed),

has finite index (i.e., finitely many equivalence classes).

The relation ≡_f is the transducer analogue of the Myhill–Nerode relation: two histories are equivalent exactly when no future observation sequence can make the agent’s future outputs differ.
The finite-memory condition is the formal version of “bounded internal state.” It says there are only finitely many distinct ways the future can unfold after different pasts, so keeping one of finitely many summaries is enough. This is exactly why finite automata are a natural model for resource-limited agents.
Example (Finite Memory—Periodic every k steps)

Let f emit, at step t, a symbol determined by t mod k; i.e. the symbol emitted at step t depends only on the step number modulo k.

Claim. f has finite memory.

Proof. Define the relation R on prefixes by u R v iff |u| ≡ |v| (mod k).

There are at most k classes of R, and R refines ≡_f (equal length mod k forces identical future outputs), so ≡_f has finite index.
Example (Not Finite Memory—Perfect Square Count Predicate)
Let O = A = {0, 1}, and consider the behavior that,

at each step, emits 1 iff the total number of 1s observed so far is a perfect square (and 0 otherwise).

Formally, if after reading prefix u the number of 1s in u is c(u), then the symbol emitted at that step is 1 iff c(u) is a perfect square.

Claim. f does not have finite memory.

The proof of this claim can be found in the Appendix.
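The claim can also be explored empirically. The sketch below (assuming, as above, that the predicate counts 1s) searches for suffixes that separate prefixes with different 1-counts; every pair it tries turns out to be distinguishable, which is the pattern the appendix proof makes rigorous:

```python
import math

def is_square(n):
    r = math.isqrt(n)
    return r * r == n

def f(w):
    """At each step, emit '1' iff the number of 1s read so far is a perfect square."""
    out, ones = [], 0
    for c in w:
        ones += c == "1"
        out.append("1" if is_square(ones) else "0")
    return "".join(out)

def residual(u, v):
    """The output f produces on continuation v after history u."""
    return f(u + v)[len(u):]

def separating_suffix(u1, u2, max_len=200):
    """Search for a suffix of 1s on which the residuals of u1 and u2 differ."""
    for k in range(1, max_len + 1):
        v = "1" * k
        if residual(u1, v) != residual(u2, v):
            return v
    return None
```

Running `separating_suffix("1" * i, "1" * j)` for any i ≠ j finds a separating suffix, witnessing pairwise-distinct Nerode classes.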
Definition 2.9 (Reactive)
A behavior is reactive if it is length-preserving and

prefix-preserving, i.e. each new observation elicits exactly one new action symbol, and previously emitted actions are never revised.

Equivalent characterization for deterministic behavior: A behavior f is reactive iff for every history w and observation o there is a single action symbol a with f(wo) = f(w) · a.

Reactivity does not imply memory-lessness or finite memory. A reactive behavior may depend on the entire observation history seen so far, not just the most recent observation.
Reactivity formalizes the intuitive idea that an agent must act in real time. Once an action is taken, it cannot be retracted when new information arrives. This matters for enforceability: a specification may allow some full action sequence in hindsight, yet no reactive agent can realize it because early commitments inevitably block all allowed continuations. Prefix-wise enforceability makes this tension explicit by asking whether an agent can always keep some allowed future open while acting online. We will see these definitions more formally later.
Example (Reactive)
Same as the prefix-preserving example above:
Example (Length-preserving but non-reactive behavior)
Let f output, on observation history w, the string 0^|w| if |w| is even and 1^|w| if |w| is odd,

where the output string has the same length as the input.

This behavior emits the correct number of actions, but it does not act online. The first action symbol depends on whether the entire observation history has even or odd length, which cannot be determined when the first observation arrives. As a result, the agent’s early actions are not commitments: it may retroactively “change its mind” once more observations arrive. This violates the core idea of reactivity: acting step by step in real time.
Example (Reactive but not finite memory)
Let O = {a, b}, A = {0, 1}, and

at each step output 1 iff the number of a’s observed so far equals the number of b’s (and 0 otherwise).

Formally, the t-th symbol of f(w) is 1 iff the length-t prefix of w contains equally many a’s and b’s.

This f is reactive but not finite memory: the difference between the two counts can grow without bound, and histories with different differences are pairwise ≡_f-inequivalent.
This example shows reactivity alone doesn’t bound memory cost. The agent behaves online (each step emits one output) yet the rule depends on an unbounded statistic (counts), so no finite-state controller can implement it.
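As a small illustration of this point, the sketch below implements one such count-based rule (emitting 1 iff the numbers of a’s and b’s seen so far are equal; the specific rule is an illustrative stand-in). It acts online, one action per observation, yet its state is a pair of unbounded counters:

```python
def equal_counts_behavior(obs):
    """Reactive behavior: one action per observation, online.
    Emit '1' iff the number of a's seen so far equals the number of b's.
    Reactive, but the counters are unbounded, so no finite-state
    controller implements this rule."""
    out, a, b = [], 0, 0
    for o in obs:
        a += o == "a"
        b += o == "b"
        out.append("1" if a == b else "0")
    return "".join(out)
```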
The properties above (length-preserving, prefix-preserving, finite memory) are exactly the ingredients of our main FSRA characterization theorem (Theorem 3.13). We will state and prove it in Section 3. Before that, we introduce the idea of enforceability and realizability, which we will use later when talking about which specifications such agents can satisfy.
Definition 2.10 (Time-invariance)
A family of online behaviors may be specified by a step-indexed family of per-step update/output rules (δ_t, λ_t), one pair per timestep t.
Time-invariance adds the further restriction that the decision rule doesn’t secretly depend on how many steps have elapsed.
Definition 2.11 (Enforceability and Realizability).
Weak enforceability: We say that specification P is weakly enforceable by a controller set C if, for every observation history w with P(w) ≠ ∅, some controller C ∈ C produces an allowed action history: f_C(w) ∈ P(w).

Another way to state it:

whenever the specification allows some action history at history w, at least one available controller produces one of the allowed action histories.

Strong enforceability: We say that P is strongly enforceable by C if P(w) ⊆ {f_C(w) : C ∈ C} for every w;

equivalently,

every action history allowed by the specification at w is produced by some controller in C.

Realizability: We say that P is realizable by C if {f_C(w) : C ∈ C} = P(w) for every w,

i.e. for every w the set of action histories the controllers can produce coincides exactly with the allowed set.
Deterministic simplifications. When
For Weak enforceability:
equivalently
For strong enforceability:
equivalently
Special case: singleton controller. When C = {C} consists of a single deterministic controller:

Weak enforceability reduces to f_C(w) ∈ P(w) whenever P(w) ≠ ∅,

and in the deterministic case this is f_C(w) = g(w) whenever P(w) = {g(w)}. Strong enforceability reduces to P(w) ⊆ {f_C(w)}

(i.e. every allowed action must be produced by the single controller), which in the deterministic case means P(w) is either empty or equals {f_C(w)}. Realizability reduces to {f_C(w)} = P(w) for every w.
Enforceability compares an abstract specification with what a concrete set of controllers can actually produce.
Example (Weak but not Strong)
Let
(For every other observation string
Let the controller set be the singleton
Then:
For strong enforceability we require
with . But and no produces ; hence is not strongly enforceable by .Realizability asks
. Here ; thus is not realizable by .
Example (Strongly enforceable but not realizable)
Let
Let the controller set be
Then
Strong enforceability: for each allowed action in
there exists a controller producing it: Thus is strongly enforceable byRealizability:
. Hence and is not realizable by .
Proposition 2.12 (Relation between different notions of enforceability):
Let
Prefix-wise Enforceable: Say
Equivalently: for every observed prefix
Prefix-wise enforceability is an explicitly online notion: it requires that a single controller keep some allowed continuation open at every step while acting in real time.
Characterization of FSRA and Examples
Theorem 3.13 (Characterization of FSRA)
Let f : O* → A* be a deterministic behavior. Then f is implementable by an FSRA if and only if:

- f is length-preserving: |f(w)| = |w| for all w ∈ O*.
- f is prefix-preserving/causal: if u is a prefix of v, then f(u) is a prefix of f(v).
- f is finite memory: ≡_f has finitely many equivalence classes.

(Equivalently: f is FSRA-implementable iff it is reactive and has finite memory.)
Proof (Theorem 3.13):
Length-preserving. By the FSRA definition the machine emits exactly one action symbol for each input symbol. Therefore, for every w ∈ O*

the produced action string has the same length as w. Hence |f_M(w)| = |w|.

Prefix-preserving. The FSRA produces each output symbol at the time its corresponding input symbol is read; in particular, the output produced after a prefix u

is a prefix of the output produced after any extension uw. Formally, if the state sequence after reading u is q₀, …, q_{|u|} and the outputs are a₁, …, a_{|u|}, these are the initial symbols of the outputs produced on any longer input uw. Thus f_M is prefix-preserving.

Finite memory. Let Q

be the finite state set of M. Define a map from prefixes to states by u ↦ δ*(q₀, u), the state reached after reading u. If δ*(q₀, u) = δ*(q₀, v), then for every continuation w the machine, starting from the same state, will produce identical future outputs on uw and vw. Hence u ≡_f v. Thus, the number of ≡_f-equivalence classes is at most |Q|, so ≡_f has finite index.
This completes the (⇒) direction. Note also that the FSRA gives explicit fixed maps δ and λ that do not depend on the timestep, so the implemented behavior is time-invariant. For the converse direction, assume (1)–(3) and construct an FSRA as follows.
1. State set. Let Q be the set of equivalence classes of ≡_f; by assumption (3) this set is finite. Write [u] for the class of prefix u.

2. Transition. Define δ([u], o) = [uo]. This is well-defined: if u ≡_f v, then the future outputs after uo and vo agree on every continuation, because they are suffixes of the agreeing future outputs after u and v; hence uo ≡_f vo.

3. Output. By length-preservation and causality, for each u and o there is a unique symbol a ∈ A with f(uo) = f(u) · a, and this symbol depends only on [u] and o.

Define: λ([u], o) = a.

4. The machine. Put M = (Q, O, A, δ, λ, [ε]).

5. Correctness. We show by induction on |w| that the machine’s output on w equals f(w) and that the machine ends in state [w].

Base |w| = 0: the machine produces ε and f(ε) = ε (length-preserving and causal behavior give that conventionally).

Inductive step: assume for w

that the machine output equals f(w) and that the machine reached state [w]. On reading the next symbol o the machine outputs λ([w], o), which we defined to be the unique symbol a with f(wo) = f(w) · a. Therefore, the machine’s output after wo equals f(wo). The next state is δ([w], o) = [wo]. This completes the induction.

Hence f_M = f.

Thus (1)–(3) imply the existence of an FSRA implementing f, completing the (⇐) direction.
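The (⇐) construction can be prototyped. The sketch below builds states as approximate Nerode classes of a given behavior f, comparing residuals on all suffixes up to a fixed lookahead as a stand-in for exact ≡_f-equivalence; this is exact only when f's memory horizon is short enough, and all names are illustrative:

```python
from itertools import product

def residual(f, u, v):
    """Output f produces on continuation v after history u."""
    return f(u + v)[len(u):]

def build_fsra(f, alphabet, lookahead=4, max_states=64):
    """Sketch of the Theorem 3.13 construction: states are Nerode classes of f,
    approximated by bounded-lookahead residual signatures."""
    suffixes = ["".join(v) for k in range(1, lookahead + 1)
                for v in product(alphabet, repeat=k)]

    def signature(u):  # bounded-lookahead stand-in for the class [u]
        return tuple(residual(f, u, v) for v in suffixes)

    reps = {signature(""): ""}          # class signature -> representative prefix
    delta, lam = {}, {}
    frontier = [""]
    while frontier:
        u = frontier.pop()
        su = signature(u)
        for o in alphabet:
            uo = u + o
            lam[(su, o)] = f(uo)[-1]    # the unique new symbol, by causality
            suo = signature(uo)
            if suo not in reps:
                assert len(reps) < max_states, "exceeded state budget"
                reps[suo] = uo
                frontier.append(uo)
            delta[(su, o)] = suo
    return delta, lam, signature("")

def run(delta, lam, start, obs):
    q, out = start, []
    for o in obs:
        out.append(lam[(q, o)])
        q = delta[(q, o)]
    return "".join(out)

def f_parity(w):
    """Emit '1' at each step iff the number of 1s so far is even."""
    out, ones = [], 0
    for c in w:
        ones += c == "1"
        out.append("1" if ones % 2 == 0 else "0")
    return "".join(out)
```

For the parity behavior the construction finds exactly two classes (even/odd count of 1s) and the resulting machine reproduces f on every input.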
We call this model a Finite-State Reactive Agent because the name describes exactly what the math enforces. It is finite state because the agent’s memory is a fixed, finite set of states (the set Q); it is reactive because it emits exactly one action per observation, online and irrevocably; and it is an agent because it maps observation histories to action histories.
Corollary 3.14 (Realizability of deterministic reactive finite-memory specifications by FSRAs)
Suppose g : O* → A* is deterministic, reactive (length-preserving and prefix-preserving), and has finite memory, and let P_g = {(w, g(w)) : w ∈ O*} be the induced singleton-valued specification.

Then there exists a Finite-State Reactive Agent (FSRA) M with f_M = g.

In particular, P_g is realizable by the singleton controller set {M}.
Theorem 3.15 (Strict Separation of Realizability and Enforceability for FSRAs)
There exist specifications P₁ and P₂ such that:

- P₁ is strongly enforceable by the set of all FSRAs, but is not realizable by any set of FSRAs (finite or infinite).
- P₂ is weakly enforceable by a finite set of FSRAs, but is not strongly enforceable by any finite set of FSRAs.
Proof.
We treat the two constructions separately.
Part 1: Construction of P₁.
Let
Thus, for every observed prefix
(i) Strong enforceability by the set of all FSRAs.
For every
(ii) Non-realizability by any set of FSRAs.
Suppose
But the function
Part 2: Construction of P₂.
Again let
the set of length-
(i) Weak enforceability by a finite set.
Let
(ii) Failure of strong enforceability by any finite set.
Fix any finite set of FSRAs
Example (FSRA—Remainder mod k of the binary stream)
Setup: Fix an integer k ≥ 2. Observations are bits, O = {0, 1}; the agent outputs, after each bit, the remainder modulo k of the binary number read so far, so A = {0, 1, …, k − 1}.

Define the deterministic reactive behavior f by: the t-th output symbol on input b₁⋯bₙ is r_t, where r₀ = 0 and r_t = (2 · r_{t−1} + b_t) mod k.
A small working example: (
Output
Finite Memory: Consider the relation that identifies two bit-strings exactly when they leave the same remainder mod k.

Claim: this relation refines ≡_f, so ≡_f has at most k classes.

If u and v have the same remainder,

then for any continuation the recurrence r ← (2r + b) mod k depends only on the current remainder r and the incoming bit b, so all future outputs coincide. Hence u ≡_f v. Conversely, if the remainders differ,

a single further bit b separates them whenever 2r + b and 2r′ + b differ mod k, which holds for every pair of distinct remainders r ≠ r′ when k is odd. So in that case u ≢_f v.

Therefore, when k is odd the equivalence classes are exactly the k residue classes (and at most k in general); in particular, f has finite memory.
FSRA: Define the machine with state set Q = {0, 1, …, k − 1}

(the residue classes), start state q₀ = 0. For state r

and observation b: δ(r, b) = (2r + b) mod k and λ(r, b) = (2r + b) mod k, i.e. the agent moves to, and emits, the updated remainder.

After reading a prefix whose current remainder is r, the machine is in state r, so each new bit yields the correct next remainder.
For example:
start from
.read 1:
emitread 0:
, emit .read 1:
, emit .
Output:
Specification: the realized specification is the singleton-valued relation P(w) = {f(w)},

where f is the remainder behavior above.

For every observation prefix there is exactly one allowed action history, namely the stream of running remainders.

So P is realizable by the singleton controller set containing this FSRA (by Corollary 3.14).
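A minimal sketch of this agent (the default k = 3 is an illustrative choice):

```python
def fsra_mod_k(bits, k=3):
    """FSRA for the running remainder: the state is the current remainder r,
    updated by r -> (2*r + b) mod k on each bit b (binary left-shift plus new
    bit, mod k); the emitted action is the new remainder."""
    r, out = 0, []
    for b in bits:
        r = (2 * r + int(b)) % k
        out.append(str(r))
    return "".join(out)
```

For example, `fsra_mod_k("101", 3)` reads the values 1, 2, 5 and emits their remainders 1, 2, 2.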
Example (DFA and FSRA—Absence of a substring)
Let
States: Q = {q₀, q₁, q_dead},

where: q₀ = “no 1 at the end / start state” (last symbol was 0 or empty), q₁ = “last symbol was 1 and no forbidden 11 has occurred so far”, q_dead = dead/trap state (we have seen substring 11).

Start state: q₀.

Accepting states: F = {q₀, q₁} (we accept as long as 11 has not occurred).

Transition function δ:

δ(q₀, 0) = q₀, δ(q₀, 1) = q₁, δ(q₁, 0) = q₀, δ(q₁, 1) = q_dead, and δ(q_dead, b) = q_dead for both bits b.
Setup: Observations O = {0, 1} and actions A = {0, 1}.

So, the agent emits 1 at each step iff the prefix read so far still avoids the substring 11.
Finite Memory: The relation ≡_f has exactly three classes, matching the three DFA states.

Define three candidate classes by partitioning all prefixes as follows: prefixes that avoid 11 and end in 0 or are empty

(include the empty string ε here); prefixes that avoid 11 and end in 1; and prefixes that contain 11 (the dead/trap class).

These three sets are disjoint, and their union is all prefixes.

Note: We are using the same partition as the DFA’s reachable states; the map sending a prefix to the DFA state it reaches witnesses this.

Every prefix belongs to one of the three classes, and the Appendix shows the classes are pairwise distinguishable and internally equivalent, so ≡_f has exactly three classes.
FSRA: Construct the FSRA with the same states, transitions, and start state

as above: q₀ is the start state, δ is the same as the DFA’s, and the output function emits 1 iff the new state after reading the input is accepting: λ(q, b) = 1 if δ(q, b) ∈ F and λ(q, b) = 0 otherwise.
For example, on input 101:

Start in q₀.

Read 1: new state q₁, emit 1 because 1 alone avoids 11. (Output so far: 1.)

Read 0: new state q₀, emit 1 because 10 avoids 11. (Output: 11.)

Read 1: new state q₁, emit 1 because 101 avoids 11. (Output: 111.)
If we had input 11, then at the second step:

Start in q₀; read 1 → q₁, emit 1.

Read 1 → q_dead, emit 0. So the output is 10, indicating the prefix of length 2 is rejected.
Specification: The realized specification is the singleton-valued relation P(w) = {f(w)}.

For each observation prefix, the unique allowed action history records, step by step, whether the prefix seen so far still avoids 11.
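The three-state machine can be sketched directly:

```python
def fsra_no_11(bits):
    """Three-state FSRA from the example: emit '1' at each step iff the prefix
    read so far avoids the substring '11'.
    State 0: last symbol was 0 (or no input yet);
    State 1: last symbol was 1, no '11' yet;
    State 2: trap ('11' has been seen)."""
    q, out = 0, []
    for b in bits:
        if q == 2:
            pass                     # trap state absorbs everything
        elif b == "1":
            q = 2 if q == 1 else 1   # second consecutive 1 kills the run
        else:
            q = 0
        out.append("1" if q != 2 else "0")
    return "".join(out)
```

This reproduces the two worked runs above: input 101 yields 111, and input 11 yields 10.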
Example (DFA and FSRA—Constraint on Count and Ending)
Fix an integer k ≥ 2, a finite alphabet Σ, a counted symbol, and a required final symbol.

(By convention the empty string has zero occurrences and no last symbol.) The DFA state tracks:

the residue r equal to the current count of the counted symbol modulo k, and

the last seen symbol s ∈ Σ ∪ {⊥}, where ⊥ denotes “empty / no last symbol yet” (useful for the initial state).

Define the state set

Q = {0, …, k − 1} × (Σ ∪ {⊥}). So |Q| = k · (|Σ| + 1). Start state

(0, ⊥) (zero occurrences seen, no last symbol). Transition

δ((r, s), x) = ((r + 𝟙[x is the counted symbol]) mod k, x), where 𝟙[·] is 1 if its condition holds and 0 otherwise. Accepting set:

the states whose residue is 0 and whose last-symbol component is the required final symbol.
An example,
Setup: We convert the language into a Boolean behavior, which is a convenient FSRA target. Let Observations:
Define
So
Finite Memory: The classes of ≡_f correspond to the reachable DFA states (residue, last symbol), so there are at most k · (|Σ| + 1) of them and f has finite memory.
FSRA: We now give the Finite-State Reactive Agent.

States: the same as the DFA states; actions: {0, 1} (per-step indicator). Start:

(0, ⊥).

Transition δ

(state update) is the same as the DFA’s. The output function emits the acceptance indicator: λ(q, x) = 𝟙[δ(q, x) is accepting].
Take
Start
Read : new residue , last symbol b. New state . Since but last symbol , ( ). Output so far 0.Read
: stays 1, last → state . Not . Output 0. Outputs 00.Read
: , last → → output 0. Outputs 000.Read
: (r=2), last → → output 0. Outputs 0000.Read
: (r=2), last → . Since output 0. Final outputs: 00000.
Now consider appending four more occurrences of the counted symbol and then a final d:

after appending them,

the residue increases by 4, bringing it to 0 mod k; after the final d,

the residue remains 0, the last symbol is d, and the machine outputs 1.
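A hedged sketch of this construction; the modulus k = 3 and the choices of counted symbol "a" and required ending "d" are illustrative assumptions, not taken from the worked run:

```python
def fsra_count_and_end(obs, k=3, counted="a", ending="d"):
    """Sketch of the count-with-ending example. The state is the pair
    (count of `counted` mod k, last symbol seen); emit '1' at a step iff the
    prefix so far has residue 0 AND ends with `ending`.
    The parameter values are illustrative assumptions."""
    r, last, out = 0, None, []
    for o in obs:
        r = (r + (o == counted)) % k   # update the residue component
        last = o                       # update the last-symbol component
        out.append("1" if (r == 0 and last == ending) else "0")
    return "".join(out)
```

With these defaults, "aaad" ends with residue 0 and last symbol d, so only the final step emits 1.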
Conclusion
Thinking of agents as simple finite-state machines helps clarify what they can and cannot do. It shows which interactive behaviors can be handled using only limited memory, and when more powerful models are truly necessary. This perspective makes the design space clearer: some goals can be achieved with small, structured controllers, while others require fundamentally richer mechanisms.
There are several natural directions to explore. We can study what changes if the agent is allowed to use randomness, make delayed actions, or update its behavior over time through learning. Each of these makes the model more realistic, but also raises new theoretical questions. Important technical problems include: (i) understanding how hard it is to decide whether a specification can be enforced with limited memory, (ii) finding the smallest possible controller that satisfies a given specification, etc.
These directions connect verification, synthesis, and learning, promising a practical theory for building simple, verifiable reactive agents.
Appendix
Deterministic Finite Automata and Transducers
Definition A.1 (Deterministic Finite Automaton—DFA)
A DFA is a tuple M = (Q, Σ, δ, q₀, F), where:

- Q is a finite set of states
- Σ is a finite input alphabet
- δ : Q × Σ → Q is the transition function (assumed total, i.e. it is defined for all pairs (q, σ))
- q₀ ∈ Q is the start state
- F ⊆ Q is the set of accepting states
A DFA defines a language L(M) ⊆ Σ*: the set of input strings whose run ends in an accepting state.
Here is a simple example to understand it. We will construct a DFA that accepts exactly the binary strings containing an even number of 1s.
Formally, define the automaton as follows.

States: Q = {q_even, q_odd},

where the state encodes whether the agent has seen an even or odd number of 1s so far.

Alphabet: Σ = {0, 1}.

Start state: q_even

(before reading anything, the count of 1s is zero, which is even).

Accepting states: F = {q_even}.

Transition function δ:

Reading 0 does nothing to the parity: δ(q_even, 0) = q_even and δ(q_odd, 0) = q_odd.

Reading 1 flips the parity: δ(q_even, 1) = q_odd and δ(q_odd, 1) = q_even.
Given any input string, the DFA ends in q_even iff the string contains an even number of 1s, and accepts exactly in that case.
For example:
Input 1010 → two 1s → ends in q_even

→ accepted. Input 111 → three 1s → ends in q_odd

→ rejected.
You can view this DFA as a minimal agent with memory:
The state is the agent’s internal memory (here just one bit: even vs odd).
The alphabet is the agent’s observation space.
The transition function is the agent’s update rule.
The accepting states encode a goal or success condition. It already captures the core agent loop: observe → update internal state → decide success/failure.
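The parity DFA above can be written out as a short function; the variable names are illustrative:

```python
def dfa_even_ones(w):
    """The parity DFA from the example: accept iff w has an even number of 1s.
    The single-bit state q is the parity of 1s seen so far; q == 0 is accepting."""
    q = 0                  # start state: zero 1s seen (even)
    for c in w:
        if c == "1":
            q = 1 - q      # reading 1 flips the parity
        # reading 0 leaves the state unchanged
    return q == 0
```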
Definition A.2 (Mealy Machine—A Deterministic Finite Transducer)
A Mealy machine is a tuple M = (Q, Σ, Γ, δ, λ, q₀), where:

- Q is a finite set of states
- Σ is a finite input alphabet
- Γ is a finite output alphabet
- δ : Q × Σ → Q is the (total) transition function
- λ : Q × Σ → Γ* is the output function
- q₀ ∈ Q is the start state

On input w = σ₁σ₂⋯σₙ, the machine visits states q₀, q_t = δ(q_{t−1}, σ_t), emits λ(q_{t−1}, σ_t) at step t, and outputs the concatenation of the per-step outputs.

In a Mealy machine, each transition produces an output.

The output generated at each step depends on both the current state and the current input symbol and is produced synchronously with the consumption of that input symbol.

Thus, a Mealy machine defines a function f_M : Σ* → Γ*.

Equivalently, the transition and output functions can be combined into a single function Q × Σ → Q × Γ*.

This formulation emphasizes that output is produced on transitions, which is why Mealy machines are also called deterministic finite-state transducers.
Now, let’s discuss a small example. We construct a Mealy machine that, on each input symbol:
outputs 0 when it reads 0

outputs 11 when it reads 1
Thus, the machine maps any binary input string to a binary output string in which every 1 appears in pairs, so the total number of 1s in the output is always even.
Define the machine with a single state q:

Q = {q} (a single state; the machine is memoryless), Σ = {0, 1} (input alphabet), Γ = {0, 1} (output alphabet), δ(q, σ) = q (stay in q on every input), λ(q, 0) = 0 and λ(q, 1) = 11 (output 0 on input 0, output two 1’s on input 1).

Equivalently you can combine δ and λ into the single map (q, σ) ↦ (q, λ(q, σ)).
Here is an example run:
Input:
.State sequence:
(always )Output (stepwise):
Concatenate to get . Count of 1’s in = = even.
This Mealy machine produces strings in the language of binary strings with an even number of 1’s.

Every string it produces lies in that language, but not every such string is produced: the outputs are exactly the strings obtained by replacing each input 0 with 0 and each input 1 with the block 11.

If your goal is all even-1 strings, you need a different transduction strategy (for example allowing outputs ε or single 1’s at some steps plus internal bookkeeping so the machine can place 1’s arbitrarily while keeping total parity even). That requires either a larger state space or different output rules.
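A sketch of this transducer; since the single state never changes, the machine reduces to a per-symbol substitution:

```python
def mealy_double_ones(w):
    """The single-state Mealy machine from the example: λ outputs '0' on input 0
    and '11' on input 1, so every output string has an even number of 1s."""
    outputs = {"0": "0", "1": "11"}   # output function λ (the state is trivial)
    return "".join(outputs[c] for c in w)
```

For example, input 101 produces 11 · 0 · 11 = 11011, which has four 1’s.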
The Mealy machines defined above allow a single observation to trigger a variable-length burst of output. While this generality is convenient for describing transductions, it is too permissive for modeling agents that must perform a single action at each step in real time.
In an online agent setting, we might want a stricter notion: at each timestep, the agent observes exactly one symbol and immediately emits exactly one action. This synchrony ensures that the agent’s behavior is length-preserving.
To capture this reactive, per-timestep interaction pattern, we now introduce the standard Mealy machine, obtained by restricting the output function to produce a single symbol per input symbol.
Definition A.2′ (Standard Mealy machine) A standard Mealy machine is a tuple M = (Q, Σ, Γ, δ, λ, q₀) exactly as in Definition A.2, except that the output function has type λ : Q × Σ → Γ, i.e. it emits exactly one output symbol per input symbol.
Think of this Mealy machine as a tiny reactive agent:
Observation space Σ

is what the agent sees at each timestep. Action space Γ

is what it can emit; in the burst example above, a single step could produce a length-2 output. The internal state

is the agent’s memory (in that example trivial: a single state, so no memory). The pair (δ, λ)

is the agent’s policy + state-update rule. On every observation the agent immediately emits the output determined by λ.
In this example the agent’s policy enforces a global constraint (even parity of 1’s) by a local rule (“whenever you see a 1, emit 11”). This shows two useful points for thinking about agents:
Global invariants from local rules: A simple per-step policy can guarantee a nontrivial global property of the interaction history (here: even parity).
Expressivity vs constraints: The policy here is conservative: it enforces parity, but at the cost of reduced expressivity (not all acceptable-looking outputs are reachable).
Definition A.3 (Moore Machine)
A Moore machine is a deterministic finite-state transducer given by a tuple M = (Q, Σ, Γ, δ, γ, q₀), where δ : Q × Σ → Q is the transition function and γ assigns an output to each state.

On input w = σ₁σ₂⋯σₙ, the machine visits states q₀, q₁, …, qₙ,

where q_t = δ(q_{t−1}, σ_t), and the output is the concatenation of the outputs γ(q_t) attached to the states entered.
Mealy and Moore machines are equivalent in expressive power: every Mealy machine can be converted to an equivalent Moore machine with at most a linear increase in the number of states, and vice versa.
Recall the Mealy machine from the previous example, which on input 0 outputs 0 and on input 1 outputs 11, thereby ensuring that the total number of 1’s in the output is always even.
We now construct a Moore machine that produces exactly the same input–output behavior.
Formally, define the machine as follows (the state names are for this example).

States: a start state plus one state per input symbol, a 0-state and a 1-state.

Input alphabet: Σ = {0, 1}.

Output alphabet: each state is labeled with ε, 0, or 11.

Initial state: the start state, which emits no output.

Transition function (δ):

from every state, reading 0 moves to the 0-state and reading 1 moves to the 1-state.

Output function (γ):

the start state emits ε, the 0-state emits 0, and the 1-state emits 11.
Thus:

whenever the machine enters the 0-state,

it outputs 0, and whenever it enters the 1-state,

it outputs 11.
For example, on input 101 the machine visits the 1-state, the 0-state, and the 1-state in turn,

and the output is 11 · 0 · 11 = 11011.
As in the Mealy case, each input symbol 1 contributes exactly two 1’s to the output, and each 0 contributes none. Hence every output string has an even number of 1’s.
This Moore machine implements the same transduction as the Mealy machine in the previous example, but the output is associated with states rather than transitions. The extra start state is needed so that the machine emits nothing before reading its first input symbol.
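A sketch of the Moore version; the state names are illustrative:

```python
def moore_double_ones(w):
    """Moore-machine version of the same transduction: the output is attached to
    the state entered, not to the transition. The start state emits nothing,
    the state reached on input 0 emits '0', the state reached on 1 emits '11'."""
    emit = {"start": "", "s0": "0", "s1": "11"}  # output function γ
    delta = {"0": "s0", "1": "s1"}               # next state depends only on input
    q, out = "start", []
    for c in w:
        q = delta[c]
        out.append(emit[q])
    return "".join(out)
```

On input 101 this produces 11011, matching the Mealy machine of the previous example.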
Moore machines label states with outputs rather than labeling transitions. The most general Moore transducer can emit a short burst of output when a state is entered, but that extra generality is unnecessary when modeling agents that act once per timestep. To model a reactive, step-by-step agent it’s clearer to assume each entered state produces exactly one action symbol. This synchronous restriction gives the standard Moore machine defined next.
Definition A.3′ (Standard Moore machine) A standard Moore machine is a tuple M = (Q, Σ, Γ, δ, γ, q₀) as in Definition A.3, except that the output function has type γ : Q → Γ, i.e. each entered state emits exactly one output symbol.
From an agent viewpoint, Moore machines correspond to agents whose actions depend only on their current internal state, not directly on the most recent observation. The observation updates the state via δ, and the action is then read off the new state via γ.
Remaining Proofs
Proof of non-finite memory for the perfect-square count predicate:

For each integer i ≥ 0, let u_i = 1^i, so that after reading u_i the count of 1s is i. Fix i < j and let d = j − i > 0.

WLOG we may pick a perfect square s ≥ i such that s + d is not a perfect square.

(Existence: for fixed nonzero d, every sufficiently large perfect square s has s + d non-square, because the gap between consecutive squares eventually exceeds d.)

Set m = s − i and consider the continuation 1^m. After u_i the count reaches i + m = s,

a perfect square, so the output bit at the final step of that run (and indeed one of the outputs in the run) is 1. After u_j the count reaches j + m = s + d, which by choice of s is not a square, so the corresponding output bit in that position is 0.

Thus u_i and u_j are not ≡_f-equivalent for any i < j, so ≡_f has infinitely many classes and f does not have finite memory.
Proof of finite memory for the no-11 behavior:

If two prefixes lie in the same candidate class, then they are ≡_f-equivalent:

Case 1 (both avoid 11 and end in 0, or are empty): any continuation is processed from DFA state q₀ in both cases, so the future outputs agree.

Case 2 (both avoid 11 and end in 1): any continuation is processed from state q₁ in both cases.

Case 3 (both contain 11): every future output is 0 in both cases.

So, each class is contained in a single ≡_f-class.

Distinguish Case 1 from Case 2: append the single symbol 1. A clean prefix ending in 0 stays clean (the next output is 1), while a clean prefix ending in 1 produces 11 (the next output is 0).

Thus the first two classes are ≡_f-inequivalent.

Distinguish Case 1 from Case 3: append any symbol; the clean prefix outputs 1, the dead prefix outputs 0.

Distinguish Case 2 from Case 3: append 0; the clean prefix outputs 1, the dead prefix outputs 0.

Hence the three candidate classes are pairwise distinguishable, so they correspond to three distinct ≡_f-classes, and ≡_f has exactly three classes.
If you look at a finite-state agent, don’t you also have to look at its interaction with some (diverse) environment? And that’s basically how TMs work? So, what’s up with that: what makes them finite-state in the relevant sense?
Thanks for your question. I may be misunderstanding what you’re pointing at, so apologies in advance if I’m replying slightly off target.
In my setup, interaction with the environment is entirely captured by the observation sequence. The agent does not get access to any external writable memory or oracle beyond the current observation symbol and its own finite internal state.
Crucially, the only memory it has is its finite state set Q. The environment may be arbitrarily complex, but all that complexity reaches the agent only through the finite observation alphabet O, symbol by symbol. There is no tape, no unbounded working memory, and no way to encode arbitrary computation internally.
That’s the key difference from a Turing machine. A TM interacting with an environment can still have access to an unbounded tape, so its computational power comes from its memory model, not from whether it is “interactive.” In contrast, an FSRA has bounded internal memory by construction (finite Q), so it can only implement finite-memory transductions (equivalently, functions with finite Myhill–Nerode index, as in Theorem 3.13).
I didn’t mean to say “internal” memory. E.g. you can start a TM head, which is a finite automaton, on an empty tape. So, how is that different from starting a finite automaton in a world where it can interact with something that might as well be a tape, and getting a TM analogue in the combination of agent + world?
So, does it have only observations from the environment, with no action lever to pull? Or are the actions internal, without being relayed to the environment?
(I did not read your whole post sorry)
Two quick clarifications that I should state here more explicitly:
- An FSRA bounds the agent’s internal memory: it has a finite state set Q and fixed maps δ,λ. The agent emits an action symbol for each observation symbol; actions are part of the model (alphabet A).
- What I did not assume (unless explicitly stated) is any particular model of the environment’s memory or persistence. If the environment can record the agent’s actions and later reveal them back as observations (i.e. the environment is effectively a persistent writable tape), then an FSRA interacting with that environment can indeed implement behavior that a standalone finite automaton cannot; the joint agent+environment system can be (arbitrarily) powerful.
The point of the FSRA model in this post is to isolate finite internal memory as a resource. To decide whether the closed-loop system is finite-state you must also model the environment. Suppose observations do not depend on past agent actions (or the environment does not persist agent writes). Then the agent’s only memory is Q, and Theorem 3.13 applies.