How programming languages are capable of handling generics ?
Here is some context on why I got the question.
Constructs like Lists, Dictionaries and Sets in Python can handle any type of data without even specifying the type of data being added. Whereas in Java Collections require you to specify the type at the time of creating those collections objects.
Both Python and Java are originally implemented in C\C++. So the capability of Java and Python to handle Generics should had obtained from C\C++. Even Javascript engines are built on C++.
To answer my own questions I started thinking why we need to specify the type when we are creating an object, my understanding for this is because to inform the compiler the amount of memory to be allocated in order for the object to live in Memory (RAM). Thus if we are creating an ArrayList in Java of say a MyRandomClass, it should likely allocate an array of Memory Blocks(MB) where each MB is of the size of MyRandomClass
However in Python a list can hold any type of Class without even specifying the type at the time of creation.
What happens under the hood ?
2 Answers 2
You mention Python, which is a dynamically typed language. The answer is that it represents the type explicitly as part of the object. You can imagine the following picture: an element of a list or a dictionary is a structure with the following fields
type: <string or type ID>
data: <pointer to memory>
This is how types are represented: as an explicit byte in memory which says what type something is.
In fact, if you're curious, in Python you can see this directly. Fire up a python shell and do x = 3
, then type(x)
. Python knows that x
has type int
. How does it know? Well check out
dir(x)
to see that there are a bunch of attributes that x has:
'__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes'
and among these, there's one called __class__
. If you do
x.__class__
then Python prints out, <class 'int'>
.
This is, to a rough approximation, how basically all dynamically typed languages do types. In statically typed languages like C++ or Rust, things are different. Instead of representing the type as part of the object, the compiler figures out the type at compile-time, and uses it to construct machine code for your program, and makes sure that all the types are as expected in the machine code, a process called type-checking. Once the compiler is certain all the types are correct, it then throws them away so that the machine code and explicit memory don't have any types anywhere.
As to generics, the process is similar: in dynamically typed languages the type is part of the object, in statically typed languages like C++ or Rust, the compiler has to figure out all the generic types that you use at compile-time, generate code for each one, and then get rid of the types in the machine code.
As an example: Apple's Swift compiler will compile the required code for every type needed, then compare the various versions, and if two are identical then one of them is picked.
For example, lots of the code for an array of unsigned 32 bit integers and for an array of signed 32 bit integers may very well be identical, while the code for an array of 64 bit integers, signed or unsigned, would be different. The code for an array of any reference type might be different.
It's up to the language designer and to some extent the designer of the standard library to make sure not too much unnecessary code is being produced.
[3, "hello", [4, 5]]
. $\endgroup$