What's the rationale behind the ordering of Scala's value/variable declaration when including a type identifier? [duplicate]

Question 1

I'm trying to wrap my head around Scala, and one thing that keeps throwing me is the ordering of a variable/value declaration when specifying the type.

val a = 0

makes perfect sense. This looks pretty much like any other language.

val a: Int = 0

parses really weird in my head; it just seems nonsensical. Why is the type immediately on the left of the assignment operator? When I cut this in my head, I see "... Int = 0", which obviously doesn't make any sense.

Is there a logical reason behind this that I can refer to? Obviously, as I look at more Scala code, I will adjust to it, but I'm also curious why Martin Odersky would choose to arrange it this way. It can't be just to stand out from other languages, where (as far as I know) the type identifier, if there is one, precedes the declaration.

Question 2

I don't know much Scala, but val a: int = 0 is valid Standard ML, and OCaml and F# use the same name-then-type order (let a: int = 0). So rather than standing out, it fits right in with other functional languages which have probably influenced Scala (e.g. pattern matching).

Question 3

Indeed, and in almost every academic paper on types, the type annotation is after the symbol/identifier. Since Scala is largely academic in origin that likely contributes.

Question 4

This is the syntax used by a huge number of programming languages, e.g. the entire Pascal family, both those designed by Niklaus Wirth himself (Pascal, Modula, Modula-2, Oberon, Oberon-2) and others (Modula-3, Turbo Pascal, Active Oberon, Delphi, Object Pascal). It is also the syntax used by almost all functional languages (ML, SML, Caml, OCaml, F#, Haskell, Miranda, Frege). Many imperative languages outside the Pascal family put the type after the name as well, e.g. Go, Visual Basic, VB.NET, TypeScript. Plus, it's also the notation used in math. Note that Odersky studied under Prof. Wirth.

Question 5

The only other functional language I know is Haskell, and despite what Jörg says, it doesn't really use this arrangement. Haskell's explicit signatures resemble what Sebastian describes as ambiguous (let a = 0 :: Int), only with a double colon instead, and the annotation refers to the whole expression.

Question 6

@Carcigenicate Isn't it convention in Haskell to write the type annotation in the line preceding the declaration?

Question 7

stand out from other languages

No. As Jörg already commented, this form is actually used in many languages. It is probably the most common form of variable declaration by number of languages that use it. Pascal and related languages used it back in the day, and many of the newer ones use it now, such as TypeScript, Go, Rust, and Scala.

type identifier, if there is one, precedes the declaration

The

type identifier [ = value ]

form of declaration in C was in some respects a big mistake. Its serious problem is that it makes the grammar of the language context-sensitive. Type and object identifiers look the same syntactically, so this form of declaration cannot be recognized without knowing that the first identifier names a type. The compiler therefore can't build the syntax tree without consulting the table of already defined types. This causes problems for templates, because the interpretation may depend on a template parameter, so the compiler can't yet know whether it is looking at a type.

In C++ this means you have to use the typename keyword in the ambiguous cases. Java and C# dodge this by not having typedefs, so you can't have related types, but that seriously limits the usefulness of their templates (generics). And it still complicates the compiler anyway.

On the other hand with declarations in the form

keyword identifier [ : type ] [ = value ]

the identifier after : (and after some keywords like new) always means a type, and an identifier in any other place never does. The grammar is context-free and everything is much simpler.
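To make that concrete, here is a small Scala sketch of where types show up. The names (TypePositions, Celsius, toFahrenheit) are made up for illustration and are not from the answer; the point is only that every type sits after a colon or after new.

```scala
object TypePositions {
  // A hypothetical class, introduced only so there is a user-defined type to name.
  final class Celsius(val degrees: Double)

  // Value declaration: the type follows the colon, and again follows `new`.
  val freezing: Celsius = new Celsius(0.0)

  // Parameter and result types use the same `name: Type` shape.
  def toFahrenheit(c: Celsius): Double = c.degrees * 9.0 / 5.0 + 32.0

  // A function-typed value follows the same pattern.
  val convert: Celsius => Double = toFahrenheit
}
```

A reader (or a parser) never has to know beforehand that Celsius is a type; its position already says so.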

It is also more regular when the type is optional: you just omit it. In the C form, you have to replace it with a special keyword (auto in C++, var in C# and Java).
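A rough illustration of the "just omit it" point in Scala. The object and member names here are hypothetical, chosen only for the example.

```scala
object OptionalTypes {
  // Type omitted: the compiler infers Int from the literal.
  val inferred = 0

  // Same declaration with the optional annotation written out.
  val annotated: Int = 0

  // The annotation can also pick a different type than inference would.
  val widened: Long = 0

  // The same applies to result types of methods.
  def twice(n: Int) = n * 2
  def twiceExplicit(n: Int): Int = n * 2
}
```

Nothing else in the declaration changes shape when the annotation is dropped; the keyword, the name and the initializer stay where they were.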

Question 8

This is a bit of a tangent, but another way in which C's declaration syntax is an unfortunate historical mistake is that C objects are not variables in the mathematical sense. A variable stands for an unknown value and once bound doesn't change. A C object is a reference to a block of memory that just happens to be implicitly dereferenced for you. Both the terminology and the use of the = symbol are highly confusing to beginners because they break the mental model built up in math classes.

Question 9

@Doval: Well, in procedural programming "variable" always means a box that can contain a value, and it did so long before C. Only functional and logic programming commonly come with variables in the mathematical sense. The terminology mismatch is a bit unfortunate, but it's the first thing you have to understand when learning programming, regardless of language. And the use of := does not really make much difference. <- looks like a better symbol, but the only language I know that uses it is R (which might have got it from S+, but I don't know that).
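For what it's worth, Scala spells out both meanings with separate keywords, which may help show the distinction being discussed. This is only a sketch, and the names are invented for the example.

```scala
object Bindings {
  // `val` is closer to the mathematical sense: bound once, never reassigned.
  // Writing `fixed = 2` afterwards would be rejected by the compiler.
  val fixed = 1

  // `var` is the "box" sense: a cell whose content can be replaced.
  var box = 1

  // Assignment reads the old content and stores a new one in the same cell.
  box = box + 1
}
```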

Question 10

@JanHudec: Smalltalk used ← initially (and ↑ for return). However, these characters only existed in the character sets and on the keyboards of Xerox's own workstations; they didn't exist anywhere else. When transferring Smalltalk source code to an ASCII-based system, those codepoints are interpreted as _ (and I forgot the other one). I believe Squeak still accepts _ for assignment, but the spec was changed to use := for assignment and ^ for return.

Question 11

You're thinking about it the wrong way. The type isn't immediately to the left of the assignment, it's immediately to the right of the declarator. This syntax has the advantage of being unambiguous, whereas for example val a = 0 : Int is ambiguous: does the type specifier refer to the literal, the declaration, or the entire statement? And if the initializer is more complicated than just a literal, it gets really confusing.

Question 12

Note that val a: Long = 0: Short is legal. It is a type annotation for a and a type ascription for 0. It doesn't make much sense here, but it is legal.
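A small sketch of the difference between annotating the declared name and ascribing a type to an expression. The object name and the ratio example are my own additions for illustration; only the Long/Short line comes from the note above.

```scala
object AnnotationVsAscription {
  // Annotation: the type belongs to the declared name `a`.
  val a: Long = 0

  // Ascription: the type belongs to the expression; the type of `b` is then inferred.
  val b = 0: Long

  // Both at once, as in the note above: a Short expression stored in a Long value.
  val c: Long = 0: Short

  // An ascription can target one sub-expression of a larger initializer.
  val ratio = (7: Double) / 2
}
```

In each case the colon attaches to whatever is immediately on its left, which is why the annotation on the declarator stays unambiguous even when the initializer gets more involved.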

Jan Hudec · Accepted Answer · 2014-12-04 06:25:46Z

