Why Rust Solves a Problem We No Longer Have
AI + Formal Proofs Make Safe Syntax Obsolete
The software industry is currently obsessed with “memory safety by syntax”, rewriting vast legacy codebases in Rust to constrain human error. This strategy is an expensive anachronism. With the advent of advanced AI agents (like Claude Opus 4.5), we are entering an era where the primary artifact of engineering is no longer code, but intent. This article argues that the optimal safety stack for the AI era is not “AI writing Rust,” but “AI generating formal specifications” (such as Event-B and the B-Method) from which C code is generated. By shifting trust from a compiler’s borrow checker to mathematical proof, we can achieve defect-free systems at a fraction of the cost, reviving the dream of “correctness by construction” that Jean-Raymond Abrial and Bertrand Meyer pioneered, but which was previously too cognitively expensive for humans to scale.
The Horseless Carriage of the AI Era
When the first automobiles appeared, they looked exactly like horse-drawn carriages, minus the horse. It took years for engineers to realize that without a horse, you didn’t need a high driver’s seat or a whip socket. The form was stuck in the past, even as the propulsion changed.
Today, we are witnessing the exact same phenomenon in software engineering. We have powerful AI agents capable of reasoning, generating, and refactoring logic at superhuman speed. And what are we asking them to do? We are asking them to write Rust.
We are asking silicon intelligence to perform a dance designed specifically to constrain human fallibility. The borrow checker, lifetimes, the agonizing fight with the compiler: these are seatbelts for a biological brain that forgets who owns a pointer. An AI does not forget. For an AI agent, maintaining the state of ten million pointers is no different from maintaining ten.
Optimizing AI to write “safe code” is our horseless carriage: it works, but it misses the point of the technology entirely.
From Assembly to Intent
Before high-level languages, much serious software was written directly in assembler. In that regime, every invariant, every calling convention, every aliasing assumption, every “this must never happen” had to be maintained in the engineer’s head, across thousands (eventually millions) of instructions. It’s not that humans can’t do this. Historically, we proved that we can: early operating systems and foundational platforms were built close to the metal, and the results were often brilliant. The problem is scale. Past a certain project size, assembly doesn’t merely demand discipline; it demands something closer to “superhuman consistency”. The mental stack overflows: local changes stop being local, implicit assumptions metastasize, and the cost of understanding becomes the dominant cost of building.
High-level languages were invented to reduce cognitive load and make programming more “human-compatible”: they compress intent, enable modular reasoning, and make maintenance feasible even when systems outlive their original authors. In other words: high-level languages are not primarily about making the machine more capable. They are about making humans less fragile.
Ada pushed that idea further. It wasn’t just nicer syntax; it was a deliberate engineering response to predictable human failure modes: strong typing, explicit interfaces, and a culture of disciplined development aimed at preventing errors rooted in ambiguity and undocumented intent. Rust stands as the modern heir to this tradition, sharpening the trade-off: it asks the human to work harder to satisfy the compiler, in exchange for removing the unpredictable runtime behavior of manual memory management.
Now comes the uncomfortable question: if high-level languages were, in large part, prosthetics for human cognition, what happens when the “programmer” is not human?
Modern AI agents do not get tired, do not lose the thread after the thirtieth refactor, and do not experience cognitive overload in the way humans do. Asking them to spend their intelligence “writing nicer code” can become a category mistake: it optimizes the artifact for human convenience, even when humans are no longer the primary authors. The higher-value target is the thing humans still uniquely struggle with at scale: making intent explicit, complete, and checkable.
That suggests a different division of labor. Humans should focus on goals, constraints, and non-negotiable properties (“this must never happen”), because that is where domain responsibility lives. Machines should focus on what machines are good at: exhausting the logical consequences, exploring corner cases, and enforcing invariants with mechanical consistency. In that world, the most important output isn’t a “human-friendly program”; it’s a machine-checkable statement of intent: specifications, invariants, and proofs, out of which code can be generated as an implementation detail.
The “Unsafe” Reality Check
Languages like C or C++ allow memory corruption, buffer overflows, and race conditions that have plagued our industry for fifty years. That’s where Rust comes in. But Rust is not a magic wand. As any senior systems engineer knows, the moment you touch the hardware (drivers, kernels, embedded controllers) you hit “unsafe” blocks. You are back to managing invariants manually. The “safety” is a contract between the programmer and the compiler. If the programmer (or the AI) misunderstands the invariant inside an “unsafe” block, the safety guarantee evaporates.
We are trading one set of problems for another: instead of debugging segfaults in C, we are debugging complex lifetime annotations in Rust, hoping that our “unsafe” boundaries hold. It is a bureaucratic solution to a mathematical problem.
The Alternative: Mathematical Proof (the “French School”)
There is another way. It is not new, but it was effectively “lost” because it was too hard for humans.
In the early 2000s, I was completing my PhD at ETH Zürich. While my days were filled with NMR protein structure determination, my intellectual curiosity drew me to the lectures of Bertrand Meyer. Meyer, fresh at ETH, was preaching “Trusted Components” and Design by Contract. Through him, I encountered Jean-Raymond Abrial and the B-Method.
Abrial’s approach, and the “French School” of engineering (Ada, SPARK, B, Event-B), was radically different from the Silicon Valley “move fast and break things” culture. They didn’t care about “developer experience” or “vibe coding.” They cared about truth.
The B-Method doesn’t just “check” code. It proves it. You define the system state and the invariants (e.g. “Train A and Train B cannot be in the same sector”). You define events that modify the state. Then, you generate Proof Obligations (POs). If you can prove the POs, the system cannot violate its safety properties. It is mathematically impossible [1].
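To give a flavor of the notation, here is an illustrative fragment of my own (not taken from any real railway model; SectorCtx is a hypothetical context declaring the set SECTORS):

MACHINE Trains
SEES
    SectorCtx
VARIABLES
    sector_A sector_B
INVARIANTS
    inv1: sector_A ∈ SECTORS
    inv2: sector_B ∈ SECTORS
    inv3: sector_A ≠ sector_B
EVENTS
    move_A
    ANY s WHERE
        grd1: s ∈ SECTORS
        grd2: s ≠ sector_B
    THEN
        act1: sector_A := s
    END
END

The proof obligation generated for move_A is mechanical: show that the action preserves inv3, which follows directly from grd2. Delete the guard, and the obligation becomes unprovable.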
The Paris Metro Case
The most cited industrial deployment of the B-Method is the Paris Métro Line 14 (METEOR), the first fully automatic, driverless metro line in Paris. According to ClearSy (the tool vendor) and RATP (the operator), the safety-critical automatic train control software was specified using the B-Method. Over 110’000 lines of B models generated 86’000 lines of Ada code [2][3][4].
Industry reports state: “No bugs were detected after the proof was completed, neither at the functional validation, at the integration validation, and at the on-site testing, nor since the beginning of the metro line”. The software was almost perfect the first time.
So why didn’t B take over the world? Because writing formal proofs is hard. It requires a level of mathematical rigor that 99% of developers (myself included) find exhausting. It was a correctness engine that required a PhD to operate.
Enter the AI: The Perfect Formal Methods Engineer
This is where the paradigm shift happens. AI agents do not get tired of mathematics. Claude Code (Opus 4.5) doesn’t mind writing 50 pages of predicate logic. It doesn’t get frustrated when the Rodin platform (the standard tool for Event-B) rejects a proof. It just tries again.
This unlocks a new workflow that makes Rust obsolete for high-assurance systems:
Intent: The human architect defines the goal in natural language.
Formalization: The AI translates this into an Event-B specification (Variables, Invariants, Events).
Verification: The AI feeds this into Rodin/ProB. The tool tries to prove the invariants.
Refinement Loop: If a proof fails, the tool gives a counter-example. The AI reads it, corrects the spec (e.g., adds a missing Guard), and retries.
Generation: Once fully proven, the system generates C code (or Ada).
Operationally, this can be an agent loop: the agent runs Rodin, patches the model until the proof obligations go green, and only then generates C, while the human mostly reviews intent and the resulting invariants and refinements rather than debugging code. Because Event-B and the Rodin platform are mature and heavily documented, an agent can realistically automate the loop instead of a human manually driving the IDE.
We don’t read the C code. We treat it like a compiled binary: an implementation detail. We trust it not because it was written in a safe language, but because it was generated from a proven model.
Case Study: The “Traffic Light” Test
Let’s look at a trivial example to see the difference.
The Intent: “A traffic light system for a 4-way intersection. North-South (NS) and East-West (EW) must never be green at the same time.”
The Rust Approach:
You write a TrafficLight struct. You use an enum for the state. You might wrap it in a Mutex. The compiler ensures you never access the memory of a freed light. But does it ensure NS and EW aren’t green simultaneously? No. That’s a logic error, not a memory error. Rust saves you from a segfault, but it happily compiles the car crash.
The AI + Event-B Approach:
I asked a modern LLM (simulated here for brevity) to handle this. It didn’t write code. Instead, it wrote a specification built around invariants:
CONTEXT TrafficCtx
SETS
    COLORS
CONSTANTS
    Red Green Yellow
AXIOMS
    axm1: partition(COLORS, {Red}, {Green}, {Yellow})
END

MACHINE TrafficLights
SEES
    TrafficCtx
VARIABLES
    light_NS light_EW
INVARIANTS
    inv1: light_NS ∈ COLORS
    inv2: light_EW ∈ COLORS
    inv3: light_NS = Green ⇒ light_EW = Red
    inv4: light_EW = Green ⇒ light_NS = Red
EVENTS
    INITIALISATION
    BEGIN
        act1: light_NS := Red
        act2: light_EW := Red
    END
END
If the AI now tries to add an event set_EW_Green without the necessary guard, Rodin will refuse to discharge inv4. The proof engine forces the AI to discover the missing safety logic.
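For illustration, the repaired event might look like this (my own sketch in the notation above; the guard is exactly the fact that the failed proof obligation demands):

set_EW_Green
WHEN
    grd1: light_NS = Red
THEN
    act1: light_EW := Green
END

With grd1 in place, the obligation for inv4 discharges immediately: after the event, light_EW is Green and light_NS is still Red, which is exactly what the invariant requires.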
The resulting C code, generated automatically, cannot crash the intersection. Not because of a borrow checker, but because the “crash” states are unreachable by construction.
Conclusion
The “Rust Evangelism Strike Force” has done a great service by making us care about safety again. But they are solving the problem of 2015, not 2026.
We are moving from an era of Syntactic Safety (Rust) to Semantic Safety (AI + Formal Methods).
In this new world, “Unsafe C” is perfectly fine, as long as it is the output of a proven formal model.
So, to the senior engineers and CTOs reading this: Stop hiring people to rewrite your legacy C in Rust. Instead, hire architects who can articulate what your system should do, and equip them with AI agents that can prove it.
The horse is dead. Get out of the carriage.