Kolmogorov Theory · Agentic Learning · Compression

The Agentic Compressor

A physicist doesn't learn passively. She plans how to learn—choosing which hypothesis to test, which approximation to relax, which simulation to run. We argue that any genuinely powerful modeling system must do the same: treat model construction itself as an action space.

Authors: Giulio Ruffini with Claude Opus 4.6 · Date: April 2026 · Series: BCOM WP0068 · KT Pillar

Within the Kolmogorov Theory (KT) framework, the algorithmic agent is decomposed into three functional modules: a Modeling Engine (ME) that builds compressive world models, an Objective Function (OF) that assigns value to states, and a Planning Engine (PE) that selects actions by counterfactual simulation. Previous work has formalized what the ME must produce—compressed representations that are both short and informative—but has treated the process of building those models as a black box.

This paper opens that box. The core claim is simple and, we think, important: model construction is itself an action-selection problem.

A modeling system should not merely update itself from data. It should actively choose how to update itself in order to become a better compressor of data. The planning engine acts not only on the world but also—and perhaps primarily—on the modeling engine itself.

The gap

The KT agent definition requires the agent to maintain a model state m that compresses its data stream. But how does that model get built? In the formal definition, the model update function is simply given—part of the static wiring. The choice of which aspect of the model to revise, which hypothesis to test, which internal simulation to run, which approximation to relax—none of this is represented. The update rule is treated as fixed, not as something the agent plans over.

For simple agents—a thermostat, a bacterium—a fixed update rule is fine. But for any agent engaged in genuine model building—a scientist, a mathematician, a child learning physics—the process of updating the model is itself a complex, multi-step, goal-directed activity involving choices at every turn.

The same gap exists in machine learning. Standard deep learning runs: ingest data, compute loss, update weights, repeat. This is powerful, but weakly agentic with respect to model construction. There is no mechanism deciding which hypothesis should be probed, which submodel challenged, which internal simulation is worth running. The system learns, but it does not strategize about how to learn.

The internal action space

We propose that the ME should be equipped with an explicit internal action space: a set of computable operations on the current model state, each of which transforms it into a revised model state without emitting an external action. These are choices among alternatives, each with different consequences for model quality. They are not mere subroutines of a fixed algorithm—they are options that a planning process can evaluate and select among.

Internal Action · What It Does
Attend(S) · Select a subset of data or model components for focused processing.
Simulate(h) · Run a forward simulation of a hypothesis to generate predicted data.
Compare(h₁, h₂) · Evaluate two alternative latent organisations against the current data.
Reparameterize(φ) · Change the coordinate system of the latent space.
Revise(μ) · Update a specific submodel while freezing the rest.
Allocate(R, μ) · Dedicate more computation to a difficult submodel.
Invoke(R) · Call a symbolic or probabilistic subroutine (logic, algebra…).
Unify(μ₁, μ₂) · Merge two submodels into a single, more general model.

The objective for selecting among these actions is compression gain: each internal action is evaluated by how much it reduces the two-part description length (model length plus residual data length given the model). A positive compression gain means the action improved the model. The internal planner must also account for computational cost—simulating a complex hypothesis or running a symbolic derivation is not free—producing an internal analogue of the explore-versus-exploit tradeoff.
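This selection rule can be made concrete in a minimal sketch (all names and numbers here are hypothetical, with a one-parameter Bernoulli model standing in for the ME): a one-step internal planner scores each candidate internal action by its compression gain, measured as the reduction in two-part description length, minus its compute cost.

```python
import math
from dataclasses import dataclass

@dataclass
class ModelState:
    p: float            # predicted probability of observing a 1
    model_bits: float   # L(m): bits needed to describe the model itself

def residual_bits(m, data):
    """L(x|m): ideal code length of the data given the model's predictions."""
    return sum(-math.log2(m.p if x == 1 else 1.0 - m.p) for x in data)

def two_part_length(m, data):
    """Two-part description length: L(m) + L(x|m)."""
    return m.model_bits + residual_bits(m, data)

def select_internal_action(m, data, actions, costs):
    """Greedy one-step internal planner: maximise compression gain minus compute cost."""
    base = two_part_length(m, data)
    best_name, best_score = None, 0.0
    for name, apply_action in actions.items():
        gain = base - two_part_length(apply_action(m, data), data)
        score = gain - costs[name]
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy stream: mostly 1s; the current model is maximally uninformative.
data = [1] * 90 + [0] * 10
m0 = ModelState(p=0.5, model_bits=1.0)

actions = {
    # Revise(mu): refit p to the empirical frequency (a longer, more precise model).
    "Revise": lambda m, d: ModelState(p=sum(d) / len(d), model_bits=m.model_bits + 8.0),
    # Simulate(h): modelled here as a no-op on the model state: pure cost, no gain.
    "Simulate": lambda m, d: m,
}
costs = {"Revise": 5.0, "Simulate": 3.0}

chosen = select_internal_action(m0, data, actions, costs)
```

On this stream, Revise pays its compute cost many times over in saved residual bits, while the no-op Simulate is pure cost, so the planner selects Revise. This is the explore-versus-exploit bookkeeping described above in its simplest form.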

Two planning loops

The full agent now has two planning loops operating simultaneously: one that acts on the world, and one that acts on the model.

External PE · Acts on the world
Selects external actions by simulating outcomes through the world model. "What should I do next?" Maximises the objective function over predicted trajectories.

Internal PEint · Acts on the model
Selects internal model-building operations. "What should I do to my model next?" Maximises compression gain minus computational cost.

These loops interact: a better model (produced by PEint) enables better external planning (by PE), and external actions (chosen by PE) bring in new data that changes the landscape for PEint. The two loops are coupled but operate over different action spaces and different objectives.
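A toy version of this coupling can be sketched as follows. Everything here is illustrative (the two-sensor world, the Bernoulli submodels, the exploration schedule): the internal loop refits the model to the data buffer after every observation, and the external loop then uses the improved model to choose where to look next.

```python
import math
import random

random.seed(0)

# Hypothetical world: sensor 0 is highly regular, sensor 1 is coin-flip noise.
SENSOR_BIAS = [0.9, 0.5]

def world_step(action):
    """External action = which sensor to read; returns (action, observation)."""
    return action, int(random.random() < SENSOR_BIAS[action])

def internal_step(model, buffer):
    """PEint: a Revise action per sensor, refitting each Bernoulli estimate to the data."""
    for s in (0, 1):
        xs = [x for a, x in buffer if a == s]
        if xs:
            # clamp so the code lengths below stay finite
            model[s] = min(0.95, max(0.05, sum(xs) / len(xs)))
    return model

def expected_bits(p):
    """Expected per-sample code length under the model (lower = better compressed)."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def external_step(model):
    """PE: act on the world by picking the sensor whose stream the model compresses best."""
    return min((0, 1), key=lambda s: expected_bits(model[s]))

model, buffer = {0: 0.5, 1: 0.5}, []
for t in range(100):                    # exploration phase: sample both streams
    buffer.append(world_step(t % 2))
    model = internal_step(model, buffer)

chosen_sensor = external_step(model)    # exploitation: PE now uses the improved model
```

After the exploration phase the model compresses sensor 0's regular stream far better than sensor 1's noise, so the external step selects it: a better model, built by PEint, directly changes what PE does.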

Attention: the existence proof

The Transformer attention mechanism is, we argue, already a genuine—if primitive—instance of agentic model construction. During training, the attention pattern determines which parameters receive which gradient signals. If the model attends strongly to token j when processing token i, the parameters governing that coupling receive a strong gradient update, while unattended couplings are left relatively untouched. This is context-dependent, selective model revision: attention simultaneously implements Attend(S) and a form of Revise(μ).
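The routing claim can be checked in a toy calculation. The sketch below (plain Python, with the attention weights held fixed rather than computed from a learned softmax, so it isolates the routing effect only) takes the attended output o = Σ_j a_j · W x_j under a squared-error loss; token j's contribution to the gradient of the value projection W is then a_j times outer(g, x_j), so its magnitude scales linearly with the attention weight.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def frob(M):
    """Frobenius norm of a matrix."""
    return math.sqrt(sum(e * e for row in M for e in row))

def norm(v):
    return math.sqrt(sum(e * e for e in v))

# Toy setup: a 2x2 value projection, three context tokens, one attention pattern.
W = [[0.2, -0.1], [0.4, 0.3]]
xs = [[1.0, 0.5], [-0.3, 0.8], [0.6, -0.9]]
attn = [0.8, 0.15, 0.05]   # how strongly the query token attends to each context token

# Forward pass: o_i = sum_j attn_j * (W x_j)_i
o = [sum(a * v for a, v in zip(attn, col)) for col in zip(*(matvec(W, x) for x in xs))]
target = [1.0, -1.0]
g = [2 * (oi - ti) for oi, ti in zip(o, target)]   # dL/do for squared-error loss

# Token j's contribution to dL/dW is attn[j] * outer(g, x_j): the gradient mass
# reaching the coupling with token j scales linearly with the attention it receives.
contribs = [[[a * e for e in row] for row in outer(g, x)] for a, x in zip(attn, xs)]
norms = [frob(c) for c in contribs]
```

The norm of each contribution is exactly attn[j] · ‖g‖ · ‖x_j‖: halve the attention a token receives and you halve the update its coupling gets, which is the selective revision described above.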

The internal action selection is implicit—there is no separate meta-planner reasoning about which attention pattern would produce the best gradient signal. But the KT framework has always insisted that agency can be implicit in the physics of a system. A bimetallic strip doesn't "decide" to bend, yet it functionally implements a model, an objective, and an action selector. Attention during training similarly implements selective model revision without an explicit meta-level decision.

But attention is only one internal action, operating through only one mechanism (soft routing of gradients). It doesn't provide hypothesis generation, structural reparameterization, symbolic reasoning, compute allocation, or theory unification. The Agentic Compressor generalizes what attention already does: it elevates the full repertoire of model-building operations to first-class status, under a unified compression-gain objective and an internal planner that can navigate the combinatorial space of model-construction strategies.

A spectrum of internal agency

None · Vanilla MLP
Fixed-architecture gradient descent. Uniform parameter updates. The internal action space has exactly one element.

Primitive · Transformers
Attention implicitly routes gradient signal via learned, input-dependent patterns. Selective model revision, but no meta-planner.

Heuristic · AI Feynman / AI Physicist
Handcrafted repertoire of model-building operations (decompose, unify, reparameterize) under heuristic control.

Full · Agentic Compressor
A general internal planner navigating a rich space of model-building operations under a compression-gain objective. Not yet realised.

The physicist as paradigm case

The paradigmatic Agentic Compressor is a working physicist. She doesn't merely absorb batches of data and update parameters. She asks: What regularity is worth testing? What hypothesis would compress the data better? What part of my model is likely wrong? What mental simulation should I run? What approximation should be relaxed?

When she proposes "What if the potential is quadratic?" she is executing Simulate(h) + Compare. When she notes "This should be rotationally invariant," she is executing Reparameterize to exploit group structure. When she discovers that electricity and magnetism are aspects of the same thing, she is executing Unify(μ_E, μ_B). Crucially, the sequence is planned: which operation to perform next depends on the current model state, what has been tried before, and an implicit estimate of expected compression gain.

This connects to Schmidhuber's proposal that curiosity and creativity are driven by compression progress. Our formulation goes further: Schmidhuber identified the reward signal (compression progress) but not the action space (specific model-building operations) or the planning structure (the internal PE that navigates that space strategically). The Agentic Compressor provides the architectural substrate in which compression progress becomes a plannable objective.

LLMs as loaded world models

The paper's original framing described a bare LLM as "at best a weakly agentic system." We have since come to a sharper view: an LLM is best understood as a Turing machine with a loaded world model.

Through training on vast data, the LLM has already constructed a massively compressed model of the world—it is, in KT terms, a pre-built Modeling Engine. The parameters are the world model: a compressor that has internalised the regularities of its training distribution. This is not a metaphor. Any learned predictive distribution can be converted into a near-optimal lossless compressor via arithmetic coding, and empirical work on LLM-based compression (e.g., LLMZip) confirms this operationally.
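The prediction-to-compression conversion is easy to see in miniature. The sketch below (a toy two-symbol alphabet and a bigram model standing in for the LLM; a real arithmetic coder would add only a couple of bits of overhead to these ideal lengths) computes the code length Σ −log₂ p(next symbol | context) that a predictor-driven arithmetic coder achieves.

```python
import math
from collections import defaultdict

ALPHABET = "ab"

def ideal_code_bits(text, predict):
    """Sum of -log2 p(next symbol | context): the ideal code length an
    arithmetic coder driven by `predict` would achieve."""
    return sum(-math.log2(predict(text[:i])[ch]) for i, ch in enumerate(text))

def uniform(context):
    """Baseline with no model: every symbol costs log2(|alphabet|) bits."""
    return {c: 1.0 / len(ALPHABET) for c in ALPHABET}

def bigram_laplace(train):
    """A tiny 'loaded world model': add-one-smoothed bigram counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(train, train[1:]):
        counts[a][b] += 1
    def predict(context):
        prev = context[-1] if context else ""
        total = sum(counts[prev].values()) + len(ALPHABET)
        return {c: (counts[prev][c] + 1) / total for c in ALPHABET}
    return predict

train = "abab" * 50
test_text = "abababab"
bits_uniform = ideal_code_bits(test_text, uniform)           # 8 symbols * 1 bit
bits_model = ideal_code_bits(test_text, bigram_laplace(train))
```

The trained predictor drives the code length of the regular test string far below the model-free baseline; swapping the bigram for an LLM's next-token distribution gives the LLMZip-style construction mentioned above.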

What the LLM lacks, in isolation, is an Objective Function and a Planning Engine. But here is the key insight: you can supply both via the prompt. A system prompt that specifies goals, constraints, evaluation criteria, and planning instructions is, functionally, a harness that activates the OF and PE on top of the already-loaded ME. Tool schemas extend the action space. Orchestration logic (ReAct loops, agentic scaffolding) closes the perception–action loop. The LLM does not need to be "embedded in an external architecture" as if it were a passive component—it already contains the world model, and prompting is sufficient to activate the full ME/OF/PE triad.
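A minimal harness sketch makes this concrete. Everything below is illustrative: fake_llm is a deterministic stub standing in for a real model, and the CALL/FINAL message format is invented, not any vendor's API. The system prompt supplies the objective, the tool extends the action space, and the loop itself closes perception and action.

```python
SYSTEM_PROMPT = (
    "Goal: answer arithmetic questions."          # the objective function, via prompt
    " If you need to compute, emit CALL calc <expr>; otherwise emit FINAL <answer>."
)

def calc(expr):
    """A single 'tool' extending the action space."""
    return str(eval(expr, {"__builtins__": {}}))  # toy only; never eval untrusted input

def fake_llm(transcript):
    """Stub policy: plan one tool call, then finalize from the observation."""
    if "OBSERVATION" not in transcript:
        return "CALL calc 6*7"
    return "FINAL " + transcript.rsplit("OBSERVATION ", 1)[1].split("\n", 1)[0]

def run_harness(question, llm, max_steps=5):
    """The loop is the planning-engine closure: act, observe, repeat."""
    transcript = SYSTEM_PROMPT + "\nQUESTION " + question
    for _ in range(max_steps):
        reply = llm(transcript)
        if reply.startswith("FINAL "):
            return reply[len("FINAL "):]
        _, tool, arg = reply.split(" ", 2)        # a real harness would dispatch on `tool`
        transcript += "\nACTION " + reply + "\nOBSERVATION " + calc(arg)
    return None

answer = run_harness("What is 6*7?", fake_llm)
```

Nothing outside the transcript carries state: the prompt-plus-loop is the entire OF/PE harness, activated on top of whatever world model the (here stubbed) LLM already contains.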

Updated position

The LLM is not a blank slate that becomes an agent only when wrapped in external machinery. It is a Turing machine with a pre-loaded world model. The prompt serves as a harness—supplying the objective function and planning instructions that transform the loaded ME into a complete algorithmic agent. This is a more parsimonious account than treating the LLM as a passive component requiring an external architecture to achieve agency.

This reframing also sharpens the Agentic Compressor proposal. When an LLM engages in chain-of-thought reasoning, it is already performing a primitive form of internal planning over its model: running mental simulations (Simulate), evaluating candidate reasoning paths (Compare), attending selectively to relevant context (Attend). The harness can amplify this by explicitly instructing the system to decompose problems, test hypotheses, or check its own reasoning against constraints—effectively expanding the internal action space via prompting. The LLM-with-harness is therefore not merely an existence proof for the Agentic Compressor concept; it is a partial instantiation, limited mainly by the fact that its internal planning is implicit in autoregressive token prediction rather than governed by a dedicated internal planner with an explicit compression-gain objective.

Implications

For the KT program

The Agentic Compressor refines the KT agent architecture by giving the ME internal structure: its own action space, objective, and planning process. The ME becomes an "agent within the agent"—an internal algorithmic agent whose world is the space of possible models and whose actions are model-building operations. This connects naturally to the Good Algorithmic Regulator theorem, which shows that sustained regulation requires the regulator to contain a model of the world. The Agentic Compressor provides the mechanism by which that model is actively constructed.

For AI

If the gap between current deep learning and human-level model construction lies precisely in the absence of internal planning for model building, then scaling data and parameters alone will not close it. What is needed is an architectural innovation: elevating internal actions to first-class status within the learning loop. Current systems already instantiate fragments of this idea. The missing piece is a general-purpose internal planner that can navigate the full space of model-building operations.

For structured experience

Within KT, structured experience arises at the Comparator—the submodule that evaluates prediction error. The Agentic Compressor enriches this picture: if the ME is itself an agent performing internal planning, then the process of model construction is itself a form of experience. The "aha moment" of a scientist discovering a unifying principle is a large compression gain produced by a Unify action, registered at the Comparator as a dramatic reduction in prediction error. The richer the internal action space and the more strategic the internal planning, the richer the structure of the resulting experience.

The key insight, in one sentence: a modeling system should not merely update itself from data; it should actively choose how to update itself in order to become a better compressor of data. The planning engine is not merely for acting on the world. It is also—and perhaps primarily—for acting on the model.