It is said that C's type system is unsound, which means that it has "false negatives", it tells the programmer everything is fine, but then the code fails at runtime. for example, "the program can reach a state where an expression evaluates to a value that doesn't match the expression's static type"
I learned it from the lecture notes of the course CSE341 in university of Washington which are available online. It says:
But we can give a more precise description of correctness by defining the terms soundness and completeness. For both, the definition is with respect to some thing X we wish to prevent. For example, X could be, "a program looks up a variable that is not in the environment."
A type system is sound if it never accepts a program that, when run with some input, does X.
A type system is complete if it never rejects a program that, no matter what input it is run with, will not do X.
So, basically a language is unsound if a program type-checks and claims there is no unwanted behavior, yet that unwanted behavior occurs at runtime.
For instance, in C we can write:
#include <stdio.h>
int main() {
// in grams
float medicineAmounts[] = {12, 8, 5.5};
float totalAmount = 0;
for(int i = 0; i < 5; i++) { // mistake: should be i < 3
totalAmount += medicineAmounts[i];
printf("%f\n", medicineAmounts[i]);
}
printf("total: %f grams", totalAmount);
// outputs: total 607785.50 grams :D
return 0;
}
My question is: while the language is clearly unsafe, is it really unsound? We expect the expression medicineAmounts[i] to always evaluate to a value of type float (which it does—the garbage values are still bit patterns interpreted as floats). But does the C standard actually promise that out-of-bounds access will work correctly? Since the C standard says it's an "undefined behavior", and undefined means the implementation can do anything, why call it unsound when C doesn't consider it a problem? In other words: can a language be called "unsound" if it never promised the safety guarantees it's accused of violating?
Edit this Wikipedia article is very nice and full good insights and links to good resources
what precisely it means for a program to be "well typed" or to "go wrong" are properties of its static and dynamic semantics, which are specific to each programming language. Consequently, a precise, formal definition of type soundness depends upon the style of formal semantics used to specify a language
In 1994, Andrew Wright and Matthias Felleisen formulated what has become the standard definition and proof technique for type safety in languages defined by operational semantics,[4] which is closest to the notion of type safety as understood by most programmers [...]
Andrew Wright and Matthias Felleisen defined one proprty that seems to answer my question:
Progress: A well-typed program never gets "stuck": every expression is either already a value or can be reduced towards a value in some well-defined way. In other words, the program never gets into an undefined state where no further transitions are possible.
By that definition, I think, it makes the C example program I wrote above unsound.
Finally, perhaps it's worth noting, Matthias Felleisen once said in a presentation on Youtube, that many academics think of soundness as a yes/no matter, and he, as an academic and a programmer, believes it's not practical criteria. he said, we should think of it as a spectrum, we went from really evil (C) to pretty ok (e.g. Java)
-
1Your assumption "which it does - ...." is wrong. OOB access is UB, and UB means "you can't assume anything", as you rightly say. OOB access out of an allocated page willl not be interpreted as a float.tofro– tofro08/19/2025 07:26:15Commented Aug 19 at 7:26
-
3.... and I never heard anyone say "C is unsound" - please give a reference. C is GIGO (and very educational ...) - do you mean the same thing?tofro– tofro08/19/2025 07:30:09Commented Aug 19 at 7:30
-
1"We expect the expression medicineAmounts[i] to always evaluate to a value of type float (which it does—the garbage values are still bit patterns interpreted as floats)." No, the program has undefined behaviour. The rest of the rules of C don't apply and the implementation can do whatever it wants. E.g. a compiler can assume this is unreachable code and omit it.Caleth– Caleth08/19/2025 08:18:10Commented Aug 19 at 8:18
-
2The funniest thing about "type soundness" seems to be that nobody can agree on a definition of it that is sound! I don't think it's so much about "violating guarantees" (which as you observe, may or may not have been given), but about whether each value always has a definite type (as defined by that language) at a particular time, and this definite type can always be inferred by the compiler from the source code. (1/2)Steve– Steve08/19/2025 10:56:47Commented Aug 19 at 10:56
-
2@Ghassen: I took the freedom to improved the citing. I think without this cite, the question was quite unclearDoc Brown– Doc Brown08/19/2025 15:19:33Commented Aug 19 at 15:19
6 Answers 6
Your quote contains a very important point.
the definition is with respect to some thing X we wish to prevent.
If a language doesn't promise anything, then it is vacuously sound, because it prevents everything that it promises to prevent. In the case of C, nothing is promised about arbitrary programs, the only promises relate to a subset, "well-formed" programs.
We expect the expression
medicineAmounts[i]
to always evaluate to a value of type float.
That isn't something you should expect from C. Because it accesses beyond the array, the program is ill-formed, and C promises nothing (about the whole program).
To contrast, Java does make some promises. E.g. promises that a variable of type T
is either null, or holds a reference to an object of type T
. NullPointerExceptions are an expected part of Java, it doesn't promise that a variable must hold a value.
However, we can subvert the promise about the type of a value. Thus we say that Java is unsound, because it makes promises that it doesn't keep.
U Transmute<T, U> (T value) {
U[] us = new U[1];
Object[] objs = us;
objs[0] = value;
return us[0];
}
-
1"In the case of C, nothing is promised about arbitrary programs, the only promises relate to a subset, "well-formed" programs." In other words, if you can't dive off the 10M platform safely, keep your tush in the kiddie pool.Andrew Henle– Andrew Henle08/20/2025 15:10:39Commented Aug 20 at 15:10
-
2This is the answer I was looking for, even though the conclusion almost seems crazy (C is sound and Java is unsound) When I first read the definition it surprised me a bit (and something felt off) because it made if feel as if C is sound, because it pretty much promises nothing, and Java is unsound because it promises you some things it doesn't deliver in some rare cases. so the the concept of soundness doesn't seem that useful to me right now in fact this is thing I always hear about soundness: "it's useless" ... but never have I ever seen someone explain why. thank you for your insightGhassen– Ghassen08/22/2025 18:59:35Commented Aug 22 at 18:59
-
@Ghassen you can also say that for any property, C is unsound w.r.t. that property.Caleth– Caleth08/25/2025 12:59:53Commented Aug 25 at 12:59
The definition of "unsound" as you write it is subject to caution:
- languages with no strong type systems (e.g. lisp) could never be unsound despite unwanted behavior at runtime; "unwanted" is indeed much too vague.
- rare languages claim at compile time that there is not unwanted situation possible. Language specifications of C or C++ are plenty of cases where unauthorised use of types would be illegal or undefined according to the language specifications.
- many languages with dynamic typing could also be considered unsound as they may result in unwanted behavior at runtime (e.g. depending on runtime order of initialisation) despite clean compilation.
- the absence of claim that there could be an error is not to be confused with the claim of absence of errors.
Back to your specific example: C doesn't allow out-of-bound access. It's UB. It could work with a little luck (because it's unsafe) but in general bad things might happen.
The issue is that, except in very trivial cases, it is quite difficult to predict if an access may go out of bounds. Nobody would accept that compilers take an exponential time longer, just to compute every possible execution path and value combinations.
Fortunately, there are static analysis tools or options that can spot the most obvious out of bound errors and raise the error. So it's not the language that is unsound, but the power of the language tools that are not advanced enough.
Edit: Additional thoughts
Soundness is a concept that was apparently introduced by Robin Milner in 1977 when he was working on ML a functional language with a sophisticated type inference system (the ancestor of Caml). But this is a fascinating language that despite its qualities never made it into the mainstream.
More recently it regained some popularity, possibly boosted by Rust, a new language that made it into the mainstream with strong typing. Unfortunately, the existence of unsafe
constructs and reliance on library that introduce a dose of unsoundness, suggest that it is very difficult to have mainstream soundness. So the quality sound/unsound can only relate to a part of the language an its implementation.
In 2015, a soundness manifesto was even published. Reading it in diagonal I understand that the authors see soundness more as a quality of a program being checked and a static analysis feature rather than a language feature, which seems to confirm my initial position. (If I was too fast in the reading, I'd appreciate some clarifying comments.)
It is not possible, even in theory, for a compiler to reject all programs that will fail at runtime, while also allowing all valid programs. From your own link:
A type system is sound if it never accepts a program that, when run with some input, does X.
A type system is complete if it never rejects a program that, no matter what input it is run with, will not do X.
...
it is impossible to implement a static checker that given any program in your language (a) always terminates, (b) is sound, and (c) is complete.
It also contain this claim:
In most modern languages, type systems are sound (they prevent what they claim to) but not complete (they reject programs they need not reject).
This might be true in the narrow context of the type system. I.e. the type system might ensure that "you can only access members that are declared for a specific type".
But the type system cannot be used to detect all kinds of problems. Trying to design a type system to statically detect out of bounds access would be difficult at best, and chances are it would rejects so many valid programs to be unusable in practice.
So in a more general sense, almost all languages/compilers will accept programs that will fail at runtime, and only reject programs that that are guaranteed to be faulty. I.e. the inverse of the quote above. Many languages also contains escape hatches that lets you circumvent the type system. See dynamic
and unsafe
in c# as a few examples.
You can also turn this argument around, i.e. when designing a language you should only specify compile time errors for things you can feasibly check at compile time, since no one will use your language if you cant write a compiler for it, or if the compiler takes ages to run. C/C++ have many things it specifies as "undefined behavior", where there is neither compile time nor runtime checks. This makes it easier to write the compiler, and avoids potentially expensive runtime checks. But increases the difficulty when writing programs. Other languages make other tradeoffs.
Also keep in mind that many "failures" are on the human side of the software development process. So the language and compiler can only do so much to prevent bugs.
-
> "almost all languages/compilers will accept programs that will fail at runtime, and only reject programs that that are guaranteed to be faulty" I think you're touching on idea I'm failing to understand.. I'm thinking, we wouldn't call Java unsound (should we?) just because its type checker allows us to call methods on null, (which raises NullPointerException)... C confused me, because people say it's unsound because it allows certain programs to fail at runtime... yes it doesn't deal with them and strange things can happen, but it seems the same to me, language doesn't claim to prevent it.Ghassen– Ghassen08/19/2025 15:34:37Commented Aug 19 at 15:34
-
Now I'm also thinking how it's connected to "undefined behavior"... so if C's standard defined the failure and asked the implementers deal with it (e.g. raise exceptions, like in Java), would that make it sound? but does that follow the definition of soundness above?Ghassen– Ghassen08/19/2025 15:37:26Commented Aug 19 at 15:37
-
It also depends on exactly how you define C's type system. It isn't formally defined, so it isn't fully clear what its guarantees are in the edge cases. For example, the C standard isn't quite clear on whether the following is undefined behavior:
struct S { int a; int b }; int f(S x) { return *(int*)(((uint_ptr_t)&(x.a)) + sizeof(int)); }
;user1937198– user193719808/19/2025 16:05:50Commented Aug 19 at 16:05 -
@Ghassen we call Java unsound because, for instance, you can transmute any type to any other type, even unrelated ones, via arrays.Caleth– Caleth08/20/2025 06:32:11Commented Aug 20 at 6:32
-
1@Ghassen I'm trying to avoid "soundness" and "complete", since I see little utility for these terms in every day programming. They might be more useful if you study language theory or something.JonasH– JonasH08/20/2025 07:46:29Commented Aug 20 at 7:46
You are correct. C does not claim out-of-bounds errors cannot happen, thus it does not prove C is unsound if they happen. It just means you have to be careful when writing C. (But of course it means the program is unsound if it exhibit unexpected out-of-bounds errors. But typically we would call a program buggy rather than unsound.)
Soundness is always relative to some claim by the language. Some languages with more advanced type systems might claim to statically prevent out-of-bounds errors - if such an error happened anyway, it would mean the claim was unsound.
C is sometimes said to be unsound because it is easy to cast values to a wrong type, thus undermining the guarantees of they type system.
Be careful with the term "safety" since this have different meaning is different contexts. But typically the safety of a language refer to how much havoc a mistake can cause. C is unsafe since a mistake (like an out-of-bounds write) may cause memory corruption, while in Java it will just cause an exception. But this is not a question of soundness since C does not claim to be memory safe.
There is no such thing as "safety" in software.
Because there is no such thing as "safety" in the natural world. Software exists in an inherently unsafe natural world.
Re
A language is unsound if a program type-checks and claims there is no unwanted behavior, yet that unwanted behavior occurs at runtime.
That's just a claim and a label.
There is no bug free software. Not in the typed programming world, not in the untyped programming world. Else, there would be no bugs, viruses, malware, nor new features that "promise" this or that for new ideas or to patch old ideas.
You can consider a programming language anything you want. People do it all of the time. Usually people call languages "unsafe" when they prefer to use another language. Or, they just deal with the language being labelled "unsafe" and "unsound", per whomever, even themselves, and carry on, in spite of such labels they or somebody else slaps on a programming language.
Just because somebody comes up with a term for this or that does not compel anybody else to recognize or adopt such a term in their policies, practice, personal or professional activities.
-
The question is not about if there is bugs in the code. Soundness refers to the soundness of the type system of the language.JacquesB– JacquesB08/27/2025 15:37:24Commented Aug 27 at 15:37
What really counts is that your software processes its input and produces the result that it should produce according to the input.
Now let’s say you have an array a with ten entries and your code tries to read or write a[12]. This cannot work. In C or C++ you get undefined behaviour, which may be a crash or worse. In a safer language, it may throw an exception, or crash the program, but it will not read a[12]. Now you might have a language where the compiler forces you to check successfully before allowing you to read a [12]. Still you can’t read a[12] so you likely don’t get the behaviour you want.
Of course reducing the number of possible errors, possibly finding them at compile time, is a good thing. But still if you cannot handle an error then it isnt handled.
Explore related questions
See similar questions with these tags.