I was reading the excellent book by Axel Rauschmayer, Tackling TypeScript. In this section of Chapter 7, the author makes the interesting claim:
In many programming languages, null is part of all object types. For example, whenever the type of a variable is String in Java, we can set it to null and Java won’t complain.
I am a little confused by the claim of "many" here. Exactly how many languages do this? C++ doesn't, as pointer types are explicit (a `string` can never be null, but a `string*` can have the value `nullptr`). Neither does C. Neither do Rust, Swift, TypeScript, etc. Certainly dynamically typed languages do not, as they have a Null or NoneType or similar. C# does because of its Java ancestry, and Go has `nil` for its reference types.
The way I understand the programming language landscape, it was Java that introduced this version of the billion dollar mistake, though they may have stolen it from an early Algol. And I don’t believe it is common at all.
Am I correct that this claim is misleading at best and that it is only Java (and C# and Go?) among modern static programming languages that conflate nulls with object types? If the claim is correct, then there is much about language type systems I don't understand, and was hoping the community here could come up with a list of popular languages for which this situation holds.
(I'm asking on software engineering stack exchange rather than the computer science stack exchange since the problem of having null values in static object types has, we all know, many ramifications for engineering correct and secure code. Regarding what I have tried so far, I've looked at all the languages mentioned above, as well as Julia, the ML family of languages, and more.)
CLARIFICATION
In response to some comments, I think I need to be clearer about the question. The question is independent of value or reference semantics (copy vs. sharing), independent of primitive or non-primitive types, and independent of mutability vs. immutability.
The language Java automatically, implicitly, adds an extra member to many types. For example, when giving a variable a type constraint such as:
String supervisor;
we can assign the value `null` to the variable `supervisor`. Now I think it is reasonable to take the point of view that `null` is not a string: it does not have a length, it cannot be reversed, etc. In fact the very use of `null` is to indicate there is no supervisor. So including `null` in the type is clunky at best and confusing at worst (and yes, I know about `NaN`). I totally get that it's here because it is "easy," and this is exactly what Tony Hoare admitted in his famous Billion Dollar Mistake mea culpa.
Now languages with optionals built in from the ground up (Swift, Rust, Haskell) and those with explicit pointers (C, C++) would never, ever allow anything other than an actual string to be assigned to a variable constrained to type string. Yet Java does. I am asking whether anyone knows of any other languages that do. I do not suspect there are many, that's all. :)
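To make the Java behaviour in question concrete, here is a minimal sketch. The class and method names are made up for illustration. The parameter is declared as `String`, the compiler accepts `null` for it, and the error only appears at run time when the value is actually used.

```java
class NullInType {
    // The parameter type says "String", yet the compiler also accepts null.
    static int nameLength(String supervisor) {
        return supervisor.length();
    }

    // Reports whether passing null to nameLength fails at run time.
    static boolean failsOnNull() {
        String supervisor = null; // compiles without complaint
        try {
            nameLength(supervisor);
            return false;
        } catch (NullPointerException e) {
            return true; // the problem only surfaces when the value is used
        }
    }

    public static void main(String[] args) {
        System.out.println(nameLength("Ada")); // prints 3
        System.out.println(failsOnNull());     // prints true
    }
}
```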
3 Answers
Let me first say a few words about how I understand the question: dynamic typing doesn't count, even though any dynamically typed variable, regardless of the type it held before, can be set to null or None. The variables themselves don't have a type assigned to them; assigning null to them simply changes the type of their content.
That means your question is about languages which allow static typing, and the inclusion of `null` in those static types, which means that variables which are declared as some object type will be allowed to hold null by default (without having to add some special modifier to the type name, like "*", "&" or "?").
The only language family apart from Java and C# I know of where the "standard static object types" automatically include an equivalent of null is the Visual Basic family, which includes (classic) Visual Basic, VBA, and VB.Net. The equivalent of null is `Nothing` in this case.
In VBA or VB6, for example, this is correct code:
Dim x As New Collection
Set x = Nothing
It does not work for strings, however, since `String` is not an object type in VB6/VBA (it is in VB.Net).
There may be more such languages, but I am currently not aware of any of them. There are surely people here in the community who have a far greater knowledge of different languages than I do; maybe one of them has another idea.
BTW, I agree that the availability of `null` is not a question of "reference types" vs. "value types". For example, in modern C# (version >= 10), you can have all 4 combinations: reference types with and without null, and value types with and without null.
- Scala allows "null references by default". I think people would probably like it if it didn't, but you sort of have to if you want efficient interop with Java libraries. – Philip Kendall, Commented Jun 10, 2023 at 8:52
- You mentioned it in a comment to the OP's question but not in this answer: C and C++ pointer types are all nullable so also belong to the list of languages that have the billion-dollar mistake. (And maybe Algol W as the historical language that introduced the mistake.) – Géry Ogam, Commented Apr 4, 2024 at 16:25
The author refers to the types of object pointers or references. In C++, any pointer can be NULL, e.g. `Foo*` is either a pointer to a `Foo` object or `NULL`. Similarly in Java, any object reference can be `null`.
This is a giant hole in the type system. You would expect a static type system to ensure that a variable declared with the type `Foo*` would be guaranteed to hold a pointer to an object of type `Foo`. But actually, it might also be a special value that is not a pointer at all, and which will crash your program if you dereference it.
The phrasing in the quote might be ambiguous when coming from a C++ mindset where there is a distinction between an object type and a pointer type. But the quote uses Java terminology where there is no distinction since object types are always reference types. The issue is exactly the same with object pointers in C++ though.
Of course, sometimes you do need an "optional" pointer, for example in a linked list. The problem is not the existence of nulls per se; the problem is that it is invisible to the type system whether a pointer may be null or not.
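As a rough illustration of making "may be absent" visible in the type, here is a Java sketch using `java.util.Optional`. The class name, the method names and the hard-coded data are hypothetical. Note that Java itself does not enforce this: a variable of type `Optional<String>` can still be assigned `null`, so this is a convention rather than a guarantee.

```java
import java.util.Optional;

class SupervisorLookup {
    // Hypothetical data: only "Alan" has a supervisor.
    // The return type states openly that there may be no result.
    static Optional<String> supervisorOf(String employee) {
        if (employee.equals("Alan")) {
            return Optional.of("Grace");
        }
        return Optional.empty();
    }

    // The caller has to deal with the empty case to get at a String.
    static String describe(String employee) {
        return supervisorOf(employee)
                .map(s -> employee + " reports to " + s)
                .orElse(employee + " has no supervisor");
    }

    public static void main(String[] args) {
        System.out.println(describe("Alan"));  // prints "Alan reports to Grace"
        System.out.println(describe("Grace")); // prints "Grace has no supervisor"
    }
}
```

With a signature like `String supervisorOf(String employee)` instead, nothing would tell the caller that the result might be `null`.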
The issue affects a whole bunch of Algol-inspired languages. According to the Stack Overflow survey, the most used statically typed languages are TypeScript, Java, C#, C++, C, and Go, and all of them except TypeScript have this issue. So it is fair to say this problem is quite prevalent.
As you note, dynamically typed languages do not really have the problem, since they don't provide any static guarantees in the first place. And statically typed functional languages of the ML-lineage tend to have Option-types which are explicit in the type system.
Languages designed within the last 20 years tend to include null-safety, for example Rust, Kotlin, Swift, and newer versions of C# (which have it as an opt-in feature). I believe Go still has the issue, but while Go is a relatively new language, it is deliberately conservative in its design, for better or worse.
In C++, there are variables of class type, used as local, static or global variables or as parameters, and there are dynamically allocated values of class type, referenced by a pointer which can be null, or by a reference which by language rules must never be null.
In Swift, you have reference types, that is dynamically allocated objects, referenced by a reference which must never be null, and value types which are local, static or global variables or parameters, or fields of another reference or value type. None of them can ever be null.
And then you have enum types, which have a tag like C++ enums, and a different associated value for each different value of the tag. One such enum has a tag "none" with no associated value, and a tag "some" with some value. And there’s a lot of syntactic sugar that lets you use this enum as an "optional".
Whatever type T you have, you can create an enum value of type "optional T". If you have a variable of type "optional T", you can assign nil, compare with nil, or compare with another value of type T or optional T, with well-defined rules for how nil and non-nil compare.
The word "nil" on its own has type "nil literal" and can be used in many places automatically converted to the appropriate thing.
What you can’t do: Assign nil to a variable of type String, only to one of type optional String. String itself can never be nil, optional String can.
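For readers more familiar with Java, the shape of such an enum can be approximated by hand. This is only a sketch under assumptions: `Maybe`, `None` and `Some` are made-up names, it is not how Swift implements its optional, and unlike Swift, Java would still let you assign `null` to a `Maybe<String>` variable.

```java
// A hand-rolled analogue of an optional: exactly two cases, "none" and "some".
abstract class Maybe<T> {
    private Maybe() {} // private: only the nested classes below can extend Maybe

    // The "none" tag carries no associated value.
    static final class None<T> extends Maybe<T> {}

    // The "some" tag carries the wrapped value.
    static final class Some<T> extends Maybe<T> {
        final T value;
        Some(T value) { this.value = value; }
    }

    static <T> Maybe<T> none() { return new None<>(); }
    static <T> Maybe<T> some(T value) { return new Some<>(value); }

    boolean isNone() { return this instanceof None; }

    // Unwraps the value or falls back, so both cases are always handled.
    T orElse(T fallback) {
        if (this instanceof Some) {
            return ((Some<T>) this).value;
        }
        return fallback;
    }

    public static void main(String[] args) {
        Maybe<String> supervisor = Maybe.none();
        System.out.println(supervisor.orElse("(nobody)")); // prints "(nobody)"
        supervisor = Maybe.some("Grace");
        System.out.println(supervisor.orElse("(nobody)")); // prints "Grace"
    }
}
```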
The compiler uses an optimisation: for types where all bits zero is not a legal value, all bits zero is used to store nil for the optional type. So an optional String takes no more space than a String (a String can never be all bits zero), but an optional Int64 needs 65 bits.
- [...] `string`, say, means it can only hold values of the `string` type. It is impossible to assign `nullptr` to such a variable. For example `string s = nullptr;` is a type error in C++ (you have to say `string* p = nullptr;`), while `String s = null;` is perfectly fine in Java. C++ does it right by making `string` and `string*` completely different types. Java is weird IMHO by making `String` refer to the union of strings and the null value. Hope that clears it up!
- [...] `nil` which seems to work roughly the same way as `null` in Java. Can you explain what difference you see?
- [...] `String x` in C++ is a `shared_ptr<string> x` (much more similar than `string *x` or `string &x`, not a `string x`). And `shared_ptr` variables can be null. C# also distinguishes between reference types and value types. Reference types could be null in the past, but starting with C# 10, even reference type variables `T x` cannot be null by default unless one explicitly allows it by declaring them as `T? x` (this is a feature which can be switched off for backwards compatibility).