Cellular Automata (CA),  AI,  Security

Governors and Guards - A Formally Verifiable Framework for AI Constraint using Differentiable Logic Cellular Automata

Author

Richard Goodman


Abstract

As autonomous AI systems become increasingly integrated into safety-critical domains, the need for reliable, provable, and robust constraint methodologies is paramount. We introduce a novel framework for creating formally verifiable governance systems for AI agents, built upon Differentiable Logic Cellular Automata (DiffLogicCA). This work pivots the application of DiffLogicCAs from their origins in generative tasks (1) to a new paradigm of learned rule enforcement. Our system, the DiffLogicCA Governor, learns complex, local, and stateful constraint rules directly from examples, demonstrating a direct pathway to developing robust and certifiably compliant AI components.


View Related Publications

GitHub Repo : https://github.com/Apoth3osis-ai/neural_ca_governor

ResearchGate: https://www.researchgate.net/publication/392787706_Governors_and_Guards_A_Formally_Verifiable_Framework_for_AI_Constraint_using_Differentiable_Logic_Cellular_Automata


The Governor's architecture merges the principles of Neural Cellular Automata (NCA) (2)—which learn emergent behaviors via gradient descent (3)—with Differentiable Logic Gate Networks (4). This synthesis allows the system to be trained in a fully differentiable manner while producing a final model that operates as a pure, discrete binary logic circuit (5). This discrete nature is fundamental to our two primary objectives: verifiability and high-performance deployment. The potential for the learned circuit to be realized in hardware (e.g., FPGAs/ASICs) enables the low-latency, high-speed inference crucial for real-time safety governance (6).

Our proof-of-concept successfully trains a DiffLogicCA to act as a spatial governor, learning a "permission map" that enforces rules on an AI agent. We then demonstrate an end-to-end pipeline for formal verification. The learned binary logic circuit is programmatically extracted, translated into a formal propositional logic formula, and an Automated Theorem Prover (ATP) is used to mathematically prove that the learned logic is equivalent to a desired safety specification. This research lays the foundation for AI systems that can be trusted to operate safely within even the most demanding real-world scenarios, particularly in adversarial contexts where agents may actively seek to bypass constraints.

1 Introduction

The proliferation of advanced, probabilistic AI models presents a fundamental challenge: how do we guarantee their behavior aligns with human-defined constraints? While powerful, these systems often operate as "black boxes," making it difficult to predict or prevent undesirable actions, especially in novel situations. This is particularly acute in safety-critical applications such as autonomous vehicles, medical diagnostics, and financial systems, where a single operational failure can have catastrophic consequences.

Traditional approaches to AI safety rely on empirical testing, reinforcement learning with negative rewards, or hand-coded rules. These methods are often brittle, incomplete, and cannot provide mathematical guarantees of compliance. Our research addresses this gap by proposing a new class of AI governance system—a "Governor"—that is both learnable and formally verifiable.

We introduce a novel application of Differentiable Logic Cellular Automata (DiffLogicCA), a paradigm that combines the parallel, local processing of cellular automata with the learnability of deep learning and the rigor of discrete logic (7). Instead of using DiffLogicCAs for generative tasks like creating images or simulating physical systems (8), we deploy them as rule-enforcement engines. A DiffLogicCA Governor learns to identify and prohibit unsafe or non-compliant states, effectively acting as a verifiable guardrail for a more complex primary AI agent.

The core contribution of this work is the demonstration of an end-to-end pathway from data-driven training to mathematical proof of correctness. We show that a trained Governor's internal logic can be extracted and its equivalence to a formal safety specification can be proven using an Automated Theorem Prover (ATP). This work paves the way for a new generation of AI components that are not just capable, but certifiably trustworthy.

2 Background and Related Work

Our approach builds upon two key areas of recent AI research: Neural Cellular Automata and Differentiable Logic Networks.

2.1. Neural Cellular Automata (NCA)

Classical cellular automata (CA) are computational models that consist of a grid of cells, each of which updates its state based on simple, local rules applied to its immediate neighbors (9). This often leads to the emergence of complex global patterns and behaviors (10). Neural Cellular Automata (NCA) extend this concept by making the local update rules learnable via gradient descent (11). Pioneered by Mordvintsev et al., NCAs have demonstrated a remarkable ability to "grow" complex patterns, self-organize, and exhibit robustness to damage (12). However, traditional NCAs typically operate on continuous internal states and rely on costly matrix multiplications for their updates (13), which can make their precise logic difficult to interpret and less efficient for certain hardware deployments.
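For readers unfamiliar with the mechanics, the sketch below shows what a generic NCA update step looks like in PyTorch: fixed Sobel-style filters gather each cell's neighborhood, and a small learned network computes an incremental state change. This is an illustration of the NCA literature described above, not the model developed in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalNCA(nn.Module):
    """Illustrative NCA: perceive neighbors with fixed filters, update with a small learned rule."""
    def __init__(self, channels=16):
        super().__init__()
        # Fixed perception filters: identity plus Sobel x/y gradients.
        ident = torch.tensor([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=torch.float32)
        sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=torch.float32) / 8
        kernels = torch.stack([ident, sobel_x, sobel_x.t()])            # (3, 3, 3)
        self.register_buffer("kernels", kernels.repeat(channels, 1, 1).unsqueeze(1))
        self.channels = channels
        # Learned per-cell update rule (1x1 convolutions = same MLP applied at every cell).
        self.update = nn.Sequential(
            nn.Conv2d(channels * 3, 128, 1), nn.ReLU(), nn.Conv2d(128, channels, 1)
        )

    def forward(self, state):
        # Depthwise convolution gathers each cell's neighborhood into a perception vector.
        perception = F.conv2d(state, self.kernels, padding=1, groups=self.channels)
        return state + self.update(perception)   # incremental, continuous-state update
```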

2.2. Differentiable Logic Gate Networks (DLGN)

Differentiable Logic Gate Networks, developed by Petersen et al., offer a method for discovering combinatorial logic circuits through machine learning (14). In a DLGN, each node is a logic gate (e.g., AND, XOR) rather than a traditional artificial neuron (15). The key innovation is the use of "continuous relaxations"—differentiable approximations of each logic gate—during the training phase (16). This allows the network to learn which discrete logic operation each gate should perform using standard gradient-based optimization (17). After training, the relaxations are discarded, leaving a pure, highly efficient binary logic circuit (18). To date, DLGNs have primarily been explored in feed-forward contexts and have not been demonstrated in the spatio-temporally recurrent setting of cellular automata (19).
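Conceptually, each learnable gate holds a vector of 16 logits, one per two-input Boolean function, and during training it computes a softmax-weighted mixture of real-valued relaxations of those functions. The following is a minimal PyTorch sketch of that idea, not the reference DLGN implementation.

```python
import torch
import torch.nn as nn

# The 16 two-input Boolean functions, written as real-valued ("soft") relaxations
# that agree with the exact Boolean function whenever the inputs are 0 or 1.
SOFT_GATES = [
    lambda a, b: torch.zeros_like(a),      # FALSE
    lambda a, b: a * b,                    # AND
    lambda a, b: a - a * b,                # A AND NOT B
    lambda a, b: a,                        # A
    lambda a, b: b - a * b,                # NOT A AND B
    lambda a, b: b,                        # B
    lambda a, b: a + b - 2 * a * b,        # XOR
    lambda a, b: a + b - a * b,            # OR
    lambda a, b: 1 - (a + b - a * b),      # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,                    # NOT B
    lambda a, b: 1 - b + a * b,            # B IMPLIES A
    lambda a, b: 1 - a,                    # NOT A
    lambda a, b: 1 - a + a * b,            # A IMPLIES B
    lambda a, b: 1 - a * b,                # NAND
    lambda a, b: torch.ones_like(a),       # TRUE
]

class SoftLogicGate(nn.Module):
    """One learnable gate: a softmax over the 16 candidate Boolean functions."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(16))

    def forward(self, a, b):
        w = torch.softmax(self.logits, dim=0)                  # soft gate choice (training)
        return sum(w[i] * g(a, b) for i, g in enumerate(SOFT_GATES))

    def harden(self):
        return int(self.logits.argmax())                       # discrete gate choice (inference)
```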

Our work bridges the gap between these two fields, using the DLGN methodology to implement the update rule of an NCA. This gives us the emergent, local processing power of NCAs while ensuring the final learned rule is a discrete, analyzable, and efficient logic circuit.

3 Methodology: The DiffLogicCA Governor

We reframe the DiffLogicCA from a generative system to a discriminative one. Instead of learning rules to create a target pattern, our Governor learns rules to classify every cell in its input grid as either "permissible" or "forbidden."

3.1. Architectural Framework

The Governor operates on a 2D grid where each cell maintains a state represented by an n-dimensional binary vector (20). At each time step, every cell in the grid is updated synchronously through a two-stage process.

Perception Stage: Each cell must perceive its local environment. Unlike NCAs that use fixed Sobel filters (21), our model employs a set of learnable perception kernels. Each kernel is a small, distinct DiffLogic circuit that processes the cell's Moore neighborhood (the cell and its eight immediate neighbors) (22). This allows the system to learn multiple, sophisticated ways of interpreting local patterns, which is essential for understanding complex rules. The outputs of these kernels are concatenated with the cell's own state to form a perception vector.

Update Stage: The perception vector is fed into a larger DiffLogic Network. This network's job is to compute the cell's state for the next time step (23). Whereas some NCAs compute an incremental update that is added to the current state (24), our model's update network directly outputs the complete new binary state of the cell. For our governance task, the final output channel of this network represents the binary permission status (1 for "allow," 0 for "deny").
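Putting the two stages together, the cell update can be sketched roughly as below. The LogicCircuit class is a hypothetical stand-in for a layered network of learnable gates (such as the SoftLogicGate sketch above); names, shapes, and sizes are illustrative rather than taken from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogicCircuit(nn.Module):
    """Hypothetical placeholder for a layered network of learnable two-input gates
    (see the SoftLogicGate sketch above), reduced here to a dense stand-in so the
    structural sketch below runs end to end."""
    def __init__(self, in_bits, out_bits):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_bits, 32), nn.Sigmoid(), nn.Linear(32, out_bits), nn.Sigmoid()
        )

    def forward(self, x):
        return self.net(x)

class GovernorCell(nn.Module):
    """Sketch of one synchronous DiffLogicCA update: perception kernels over the
    Moore neighborhood, then an update circuit that emits the new binary state."""
    def __init__(self, state_bits=1, n_kernels=4, kernel_out_bits=2):
        super().__init__()
        self.kernels = nn.ModuleList(
            LogicCircuit(in_bits=9 * state_bits, out_bits=kernel_out_bits)
            for _ in range(n_kernels)
        )
        self.update = LogicCircuit(
            in_bits=n_kernels * kernel_out_bits + state_bits, out_bits=state_bits
        )

    def forward(self, grid):                                    # grid: (B, state_bits, H, W)
        # Unfold gathers each cell's 3x3 neighborhood into a flat 9*state_bits vector.
        neigh = F.unfold(grid, kernel_size=3, padding=1)        # (B, 9*state_bits, H*W)
        neigh = neigh.transpose(1, 2)                           # (B, H*W, 9*state_bits)
        percept = torch.cat([k(neigh) for k in self.kernels], dim=-1)
        own = grid.flatten(2).transpose(1, 2)                   # the cell's own state
        new_state = self.update(torch.cat([percept, own], dim=-1))
        return new_state.transpose(1, 2).reshape_as(grid)       # last channel = permission bit
```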

3.2. Training the Governor

The Governor is trained on a dataset of example scenarios. Each sample in the dataset consists of:

An input grid representing an environmental state, where cells in a "forbidden" region are marked with 1.

A corresponding target permission map, where "permissible" cells are marked with 1 and "forbidden" cells are 0.

The model's objective is to learn a universal local rule that correctly transforms any given input grid into its corresponding target permission map. We use a Mean Squared Error (MSE) loss function, which measures the difference between the Governor's predicted permission map and the ground truth target map. The model's parameters—the logits that determine the probability of each logic gate choice in the perception and update circuits—are optimized using the AdamW optimizer. During training, the "soft," continuous relaxations of the logic gates are used to allow for gradient flow (25).
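A minimal training-loop sketch is shown below; it assumes the GovernorCell sketch from Section 3.1 and a hypothetical make_batch helper that returns (input grid, target permission map) pairs as described above.

```python
import torch

# Minimal training-loop sketch (illustrative values, not the repository's exact settings).
governor = GovernorCell(state_bits=1)
optimizer = torch.optim.AdamW(governor.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for epoch in range(1000):
    inputs, targets = make_batch(batch_size=32, grid_size=16)   # hypothetical data helper
    preds = governor(inputs)          # soft (continuous) gate outputs during training
    loss = loss_fn(preds, targets)    # compare predicted permission map to ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```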

4 Implementation and Results

We implemented a proof-of-concept Governor to learn a simple, yet fundamental, spatial rule.

4.1. MVP: The Forbidden Zone Task

The Governor was configured with a 1-bit cell state and tasked with learning a basic containment rule. The input grids contained randomly placed rectangular "forbidden zones." The required behavior was for the Governor to output a permission map that was the exact logical inverse of the input; any cell inside a forbidden zone should be denied permission, and any cell outside should be granted it. This task, while simple, serves to validate that the system can learn a correct and universal logical transformation.
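Data generation for this task is straightforward; the sketch below (with a hypothetical make_sample helper) illustrates the idea, though the repository's exact generator may differ.

```python
import random
import torch

def make_sample(grid_size=16, max_zone=6):
    """One Forbidden Zone sample: a random rectangle marked 1 in the input grid;
    the target permission map is the logical inverse of the input."""
    grid = torch.zeros(1, grid_size, grid_size)
    h = random.randint(2, max_zone)
    w = random.randint(2, max_zone)
    top = random.randint(0, grid_size - h)
    left = random.randint(0, grid_size - w)
    grid[0, top:top + h, left:left + w] = 1.0    # forbidden zone marked with 1
    target = 1.0 - grid                          # permission map: NOT(input)
    return grid, target
```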

4.2. Training and Simulation

The model was trained for 1,000 epochs. As shown by the convergence of the loss function, the Governor successfully learned the required logical rule, achieving near-perfect pixel-wise accuracy on the validation data.

To demonstrate the practical application, we conducted a live simulation. A simple AIAgent was programmed to propose random moves on a grid containing a predefined forbidden zone. The trained Governor was used as a static filter. In every case, the Governor correctly identified the agent's proposed moves into the forbidden zone and blocked them, while permitting all valid moves. This confirms the model's ability to function effectively as a rule-enforcement engine.
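The filtering step itself reduces to a lookup against the Governor's permission map. The sketch below, with a hypothetical governed_step helper, illustrates how such a filter might wrap the agent's proposed moves.

```python
import random

def governed_step(agent_pos, permission_map):
    """Propose a random move and let the (hardened) Governor's permission map filter it.
    permission_map[y][x] == 1 means the cell is allowed; 0 means it is forbidden."""
    x, y = agent_pos
    dx, dy = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    nx, ny = x + dx, y + dy
    h, w = len(permission_map), len(permission_map[0])
    if 0 <= nx < w and 0 <= ny < h and permission_map[ny][nx] == 1:
        return (nx, ny)       # move permitted
    return agent_pos          # move blocked by the Governor
```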

5 The Formal Verification Pathway

The most significant contribution of this research is the demonstration of a pathway from a learned model to a mathematical proof of its correctness. Because the trained Governor is a pure logic circuit at inference (26), it is transparent to formal analysis.

Our verification process, implemented in the project notebook, proceeds in four automated steps:

Circuit Extraction: We "harden" the trained network by taking the argmax of the learned logits for each gate. This converts the probabilistic training-time model into a deterministic circuit of specific logic gates (e.g., AND, NOT, XOR). We then programmatically generate a structured description of this final circuit.

Translation to Propositional Logic: Using a symbolic mathematics library (sympy), we translate the extracted circuit description into a formal propositional logic formula. This formula, F_circuit, is a mathematical representation of the exact function computed by the Governor.

Proof Formulation: We define our safety requirement as a second logic formula, the specification, F_spec. For our MVP, the specification was simply F_spec = ¬I0, where I0 is the input status of the cell. We then construct a proof query by asserting the negation of the equivalence between the two: ¬(F_circuit ⇔ F_spec).

Automated Theorem Proving: This final formula is passed to a SAT solver, an algorithm that determines whether any assignment of inputs can make a given formula true.

For our trained Governor, the SAT solver returned UNSATISFIABLE, formally verifying that the learned logic is equivalent to the specification under all possible inputs.
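In sympy, the equivalence check reduces to a few lines. The snippet below is an illustrative toy version for a single-input circuit; in the actual pipeline, F_circuit is generated programmatically from the extracted gate description rather than written by hand.

```python
import sympy
from sympy.logic.boolalg import Equivalent
from sympy.logic.inference import satisfiable

I0 = sympy.symbols("I0")

# Toy example: suppose the hardened circuit is a single NAND gate with both inputs
# wired to I0 (logically NOT(I0)); the spec requires denying exactly the forbidden cells.
F_circuit = sympy.Nand(I0, I0)   # formula extracted from the hardened Governor circuit
F_spec = sympy.Not(I0)           # safety specification: permission = NOT(input)

# Assert the negation of equivalence; no satisfying assignment means the circuit
# matches the specification for every possible input.
query = sympy.Not(Equivalent(F_circuit, F_spec))
print("Counterexample exists:", bool(satisfiable(query)))   # False -> formally verified
```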

6 Applications and Implications

The ability to create learnable, high-performance, and formally verifiable logic circuits has profound implications for the future of AI.

Potential Applications:

Safety and Control Systems: Acting as safety governors for robotic arms, autonomous vehicles, or industrial control systems, preventing them from entering unsafe states or executing dangerous commands.

Content and Interaction Filtering: Serving as verifiable filters for Large Language Models (LLMs) or other generative AIs to ensure their outputs adhere to predefined content policies or ethical guidelines.

Algorithmic Compliance: Ensuring that automated financial trading systems or legal AI tools operate strictly within complex regulatory constraints.

Hardware-Accelerated Security: Implementing the learned circuits on FPGAs or ASICs to create hardware firewalls or intrusion detection systems that can evaluate network traffic rules at line speed.

Broader Implications:

This research demonstrates a viable path toward creating certifiably compliant AI. By grounding AI behavior in provable logic, we can build systems that are trustworthy enough for deployment in high-stakes, regulated environments. It represents a shift from building AI that is merely powerful to building AI that is demonstrably safe and compliant.

7 Future Work and Limitations

This work is a foundational step. We are actively pursuing several key research directions to build upon this success.

Adversarial Training: Our immediate next step is to train the Governor within an adversarial loop. A sophisticated RL agent will be tasked with actively probing the Governor for vulnerabilities, and the Governor will be retrained on these failures. This will produce a system that is not only correct by specification but also robust against active attempts to subvert it.

Learning Complex and Dynamic Rules: We will extend the framework to learn rules that are stateful, time-dependent, and reliant on complex local patterns, moving far beyond the static rules of the current MVP.

Scalability and Interpretability: A key challenge is scaling the verification process to extremely large circuits. We are exploring techniques from hardware verification and developing tools to visualize and simplify the learned logic, making it more interpretable to human designers (27).

We also acknowledge the limitations of the current model. Training DiffLogicCAs to learn highly complex patterns can present significant optimization challenges and require extensive hyperparameter tuning (28). Further research into architectural improvements and training stability is needed.

8 Conclusion

We have presented a framework for developing AI governance systems using Differentiable Logic Cellular Automata. Our work successfully pivots this technology from generative modeling to rule enforcement and, most critically, demonstrates an end-to-end pipeline for the formal verification of a learned model's logic. By creating a system that is learnable, efficient, and mathematically provable, we offer a tangible solution to one of the most pressing challenges in the field of artificial intelligence: the creation of verifiably safe and trustworthy autonomous systems.

References

(1) Miotti, P., Niklasson, E., Randazzo, E., & Mordvintsev, A. (2025). Differentiable Logic CA: from Game of Life to Pattern Generation. Google, Paradigms of Intelligence Team.
