Three ways formally verified code can go wrong in practice

rook_line_sinkr · 2025-10-12T21:21:34 1760304094

I got told to use these words back in uni

verification - Are we building the software right?

validation - Are we building the right software?

makes many a thing easier to talk about at work

moandcompany · 2025-10-13T00:40:25 1760316025

Yep... Along with this

Verification aligns with a specification. E.g. verify if what was implemented matches what was specified.

Validation aligns with a requirement. E.g. verify if what was implemented solves the actual problem.

dapperdrake · 2025-10-13T01:37:14 1760319434

There is an important and distinct pair of definitions used by a possibly smaller but significant number of people:

Verification: formal theoretical proof

Validation: Empirical test based approach

01HNNWZ0MV43FF · 2025-10-13T02:01:01 1760320861

If it's hard to remember which is which, maybe they should be different words.

Like "Customer validation" and "Verification against the spec"

Like "sympathy" and "empathy". I finally heard a mnemonic but if I need a mnemonic maybe it's a sign that the words are just too hard to understand

skybrian · 2025-10-12T21:11:36 1760303496

Many interesting statements aren't a property of the code alone. They're a property of the code when it's run in a particular environment. If you want the proof to be portable then it should only make assumptions that are true in any environment.

dapperdrake · 2025-10-13T01:38:12 1760319492

No assumption holds for all environments.

Posh example: Axiom of choice.

ip26 · 2025-10-12T20:54:44 1760302484

Is asserting the assumptions during code execution not standard practice for formally verified code?

kragen · 2025-10-13T00:23:38 1760315018

By "asserting X" do you mean "checking whether X is true and crashing the program if not", like the assert macro in C or the assert statement in Python? No, that is almost never done, for three reasons:

• Crashing the program is often what you formally verified the program to prevent in the first place! A crashing program is what destroyed Ariane 5 on its maiden flight, for example. Crashing the program is often the worst possible outcome rather than an acceptable one.

• Many of the assumptions are not things that a program can check are true. Examples from the post include "nothing is concurrently modifying [variables]", "the compiler worked correctly, the hardware isn't faulty, and the OS doesn't mess with things," and, "unsafe [Rust] code does not have [a memory bug] either." None of these assumptions could be reliably verified by any conceivable test a program could make.

• Even when the program could check an assumption, it often isn't computationally feasible; for example, binary search of an array is only valid if the array is sorted, but checking that every time the binary search routine is invoked would take it from logarithmic time to linear time, typically an orders-of-magnitude slowdown that would defeat the purpose of using a binary search instead of a simpler sequential search. (I think Hillel tried to use this example in the article but accidentally wrote "binary sort" instead, which isn't a thing.)

When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too. In those cases, it's common to use either runtime checks or property-based testing instead of formal verification, which is harder.

ip26 · 2025-10-13T02:21:15 1760322075

This becomes an interesting conversation then. First of all, it could mean "checking whether X is true and logging an error" instead of exiting the program.

- But if you aren't comfortable crashing the program if the assumptions are violated, then what is your formal verification worth? Not much, because the formal verification only holds if the assumptions hold, and you are indicating you don't believe they will hold.

- True, some are infeasible to check. In that case, you could then check them weakly or indirectly. For example, check if the first two indices of the input array are not sorted. You could also check them infrequently. Better to partially check your assumptions than not check at all.

kragen · 2025-10-13T02:41:45 1760323305

You didn't answer my question!

ip26 · 2025-10-13T02:51:42 1760323902

I mean to ask both: "checking whether X is true and crashing the program if not", like the assert macro, OR assert as in a weaker check that does not crash the program (such as generate a log event).

When crashing the program is acceptable and correctness preconditions can be efficiently checked, postconditions usually can be too.

What's interesting to me is the combination of two claims: formal verification is used when crashes are not acceptable, and crashing when formal assumptions are violated is therefore not acceptable. This makes sense on the surface - but the program is only proven crashproof when the formal assumptions hold. That is all formal verification proves.

cowsandmilk · 2025-10-12T22:07:33 1760306853

That’s impractical. Take binary search and the assumption the list is sorted. Verifying the list is sorted would negate the point of binary search as you would be inspecting every item in the list.

voxl · 2025-10-12T22:23:10 1760307790

ASSERTING the list is sorted as an assumption is significantly different form VERIFYING that the list is sorted before executing the search. Moreover, type systems can track that a list was previously sorted and maintained it's sorted status making the assumption reasonable to state.

jojomodding · 2025-10-12T23:46:15 1760312775

What do you mean when you say "assert" and "verify"? In my head, given the context of this thread and the comment you're replying to, they can both only mean "add an `if not sorted then abort()`."

But you make some sort of distinction here.

bluGill · 2025-10-12T23:51:25 1760313085

Verify means you check. Assert means you say it is, but might or might not check.

cyphar · 2025-10-13T00:30:25 1760315425

I know that Solaris (or at least, ZFS) has VERIFY and ASSERT macros where the ASSERT macros are compiled out in production builds. Is that the kind of thing you're referring to?

You can aslo mark certain codepaths as unreachable to hint to the compiler that it can make certain optimisations (e.g., "this argument is never negative"), but if you aren't validating that the assumption is correct I wouldn't call that an assertion -- though a plain reading of your comment would imply you would still call this an "assertion"? AFAIK, no language calls this construct "assert".

This is probably one of those "depends on where you first learned it" bits of nomenclature, but to me the distinction here is between debug assertions (compiled out in production code) and assertions (always run).

nothrabannosir · 2025-10-13T00:15:15 1760314515

This thread started with:

> Is asserting the assumptions during code execution not standard practice for formally verified code?

Are you using the same definition of "assert" as that post does?

kg · 2025-10-13T00:45:55 1760316355

> Moreover, type systems can track that a list was previously sorted and maintained it's sorted status making the assumption reasonable to state.

This is true, but if you care about correct execution, you would need to re-verify that the list is sorted - bitflips in your DRAM or a buggy piece of code trampling random memory could have de-sorted the list. Then your formally verified application misbehaves even though nothing is wrong with it.

It's also possible to end up with a "sorted" list that isn't actually sorted if your comparison function is buggy, though hopefully you formally verified the comparison function and it's correct.

AnimalMuppet · 2025-10-12T22:10:02 1760307002

Only if you verify it for every search. If you haven't touched the list since the last search, the verification is still good. For some (not all) situations, you can verify the list at the start of the program, and never have to verify it again.

ngruhn · 2025-10-12T21:37:25 1760305045

How would that look like if you accidentally assumed you have arbitrary large integers but in practice you have 64 bits?

appellations · 2025-10-12T22:24:52 1760307892

    Add(x,y):
       Assert( x >= 0 && y>= 0 )
        z = x + y
        Assert( z >= x && z >= y )
        return z

There’s definitely smarter ways to do this, but in practice there is always some way to encode the properties you care about in ways that your assertions will be violated. If you can’t observe a violation, it’s not a violation https://en.wikipedia.org/wiki/Identity_of_indiscernibles

bluGill · 2025-10-12T23:52:56 1760313176

In some languages overflow is asserted as a can't happen and so the optimizer will remove your checks

appellations · 2025-10-13T01:29:21 1760318961

Best I can tell is that overflow is undefined behavior for signed ints in C/C++ so -O3 with gcc might remove a check that could only be true if UB occurred.

The compound predicate in my example above coupled with the fact that the compiler doesn’t reason about the precondition in the prior assert (y is non-negative) means this specific example wouldn’t be optimized away, but bluGill does have a point.

An example of an assert that might be optimized away:

    int addFive(int x) {
        int y = x + 5;
        assert(y >= x);
        return y;
    }

appellations · 2025-10-13T01:07:40 1760317660

Care to share a language where the compiler infers the semantic meaning of asserts and optimizes them away? I’ve never heard of this optimization.

jonathanstrange · 2025-10-12T20:51:02 1760302262

No hardware failure is considered? No cosmic rays flipping bits? No soft or hard real-time guarantees are discussed? What about indeterminate operations that can fail such as requesting memory from some operating system dynamically?

I'm asking because I thought high integrity systems are generally evaluated and certified as a combination of hardware and software. Considering software alone seems pretty useless.

ip26 · 2025-10-13T02:45:41 1760323541

Nowhere does the article claim that:

   "formal verification of the code" -> "high integrity system"

Formal verification is simply a method of ensuring your code behaves how you intend.

Now, if you want to formally verify your program can tolerate any number of bits flip on any variables at any moment(s) in time, it will happily test this for you. Unfortunately, assuming presently known software methods, this is an unmeetable specification :)

codebje · 2025-10-12T21:56:34 1760306194

Specifications that are formally verified can definitely cover real-time guarantees, behaviour under error returns from operations like allocation, and similar things. Hardware failures can be accounted for in hardware verification, which is much like software verification: specification + hardware design = verified design; if the spec covers it, the verification guarantees it.

Considering software alone isn't pretty useless, nor is having the guarantee that "inc x = x - 1" will always go from an Int to an Int, even if it's not "fully right" at least trying to increment a string or a complex number will be rejected at compile time. Giving up on any improvements in the correctness of code because it doesn't get you all the way to 100% correct is, IMO, defeatist.

(Giving up on it because it has diminishing returns and isn't worth the effort is reasonable, of course!)

charcircuit · 2025-10-12T23:15:04 1760310904

Hardware verification doesn't prevent hardware failures. There is a reason RAM comes with ECC. It's not because RAM designers are too lazy to do formal verification. Even with ECC RAM, bit flips can still happen if multiple bits flip at the same time.

There are also things like CPUs taking the wrong branch that occasionally happen. You can't assume that the hardware will work perfectly in the real world and have to design for failure.

codebje · 2025-10-13T02:45:15 1760323515

Well of course hardware fails, and of course verification doesn't make things work perfectly. Verification says the given design meets the specification, assumptions and all. When the assumptions don't hold, the design shouldn't be expected to work correctly, either. When the assumptions do hold, formal verification says the design will work correctly (plus or minus errors in tools and materials).

We know dynamic RAM is susceptible to bit-flip errors. We can quantify the likelihood of it pretty well under various conditions. We can design a specification to detect and correct single bit errors. We can design hardware to meet that specification. We can formally verify it. That's how we get ECC RAM.

CPUs are almost never formally verified, at least not in full. Reliability engineering around systems too complex to verify, too expensive to engineer to never fail, or that might operate outside of the safe assumptions of their verified specifications, usually means something like redundancy and majority-rules designs. That doesn't mean verification plays no part. How do you know your majority-rules design works in the face of hardware errors? Specify it, verify it.

jojomodding · 2025-10-12T23:52:02 1760313122

Designing around hardware failure in software seems cumbersome to insane. If the CPU can randomly execute arbitrary code because it jumps to wherever, no guarantees apply.

What you actually do here is consider the probability of a cosmic ray flip, and then accept a certain failure probability. For things like train signals, it's one failure in a billion hours.

kragen · 2025-10-13T02:44:23 1760323463

> Designing around hardware failure in software seems cumbersome to insane.

Yet for some reason you chose to post this comment over TCP/IP! And I'm guessing you loaded the browser you typed it in from an SSD that uses ECC. And probably earlier today you retrieved some data from GFS, for example by making a Google search. All three of those are instances of software designed around hardware failure.

spartanatreyu · 2025-10-13T00:41:12 1760316072

An approach that has been taken for hardware in space is to have 3 identical systems running at the same time.

Execution continues while all systems are in agreement.

If a cosmic ray causes a bit-flip in one of the systems, the system not in agreement with the other two takes on the state of the other two and continues.

If there is no agreement between all 3 systems, or the execution ends up in an invalid state, all systems restart.

pixl97 · 2025-10-13T01:31:32 1760319092

>Designing around hardware failure in software seems cumbersome to insane

I mean there are places to do it. For example ZFS and filesystem checksums. If you've ever been bit by a hard drive that says everything is fine but returns garbage you'll appreciate it.

charcircuit · 2025-10-12T23:58:48 1760313528

Yet, big sites like Google or TikTok constantly deal with hardware failures everyday while keeping their services and apps running.

themafia · 2025-10-13T00:22:26 1760314946

Even then you have other physical issues to consider. This is one of the things I love about the Space Shuttle. It had 5 computers for redundancy during launch and return. You obviously don't want to put them all in the same place so you spread them out among the avionics bays. You also obviously don't want them all installed in the same orientation so you install them differently with respect to the vehicles orientation. You also have a huge data network that requires redundancy and you take all the same steps with the multiplexers as well.

cpgxiii · 2025-10-13T01:50:43 1760320243

The best example on the Shuttle were the engine control computers. Each engine had two controllers, primary and backup, each with its own set of sensors in the engine itself and each consisting of a lock-step pair of processors. For each engine, the primary controller would use processors built by one supplier, while the backup would use processors of the same architecture but produced by an entirely different supplier (Motorola and TRW).

Today, even fairly standard automotive ECUs use dual-processor lock-step systems; a lot the the Cortex-R microcontrollers on the market are designed around enabling dual-core lock-step use, with error/difference checking on all of the busses and memory.

skybrian · 2025-10-13T00:17:43 1760314663

For portable libraries and apps, there's only so much you can do. However, there are some interesting properties you can prove assuming the environment behaves according to a spec.

kragen · 2025-10-13T00:25:23 1760315123

The article does consider hardware failure, yes.

westurner · 2025-10-12T22:56:36 1760309796

Side channels? Is best out of 2 sufficient or is best out of 3 necessary?

From https://news.ycombinator.com/context?id=39938759 re: s2n-tls:

> [ FizzBee, Nagini, Deal-solver, Dafny; icontract, pycontracts, Hoare logic, DbC Design-by-Contract, invariants, parallelism and concurrency and locks, io latency, pass by reference in distributed systems, "FaCT: A DSL for Timing-Sensitive Computation" and side channels [in hw and software] https://news.ycombinator.com/item?id=38527663 ]

There are so many things to consider;

/? awesome-safety https://westurner.github.io/hnlog/#search:awesome-safety :

awesome-safety-critical: https://awesome-safety-critical.readthedocs.io/en/latest/

Hazard (logic) https://en.wikipedia.org/wiki/Hazard_(logic)

Hazard (computer architecture); out-of-order execution and delays: https://en.wikipedia.org/wiki/Hazard_(computer_architecture)

Soft error: https://en.wikipedia.org/wiki/Soft_error

SEU: Single-Event Upset: https://en.wikipedia.org/wiki/Single-event_upset

And then cosmic ray and particle physics

stephenlf · 2025-10-12T23:49:29 1760312969

This is incredible. This post led me to your GitHub, which is full of similarly incredible content. “Awesome Cold Showers”? Beautiful.