Note that `CalcParser::parse` takes care of the AST traversal and correctly maps it to `Vec<Node>` for easier access in later stages of compilation.

## Interpreter
The CPU is the *ultimate interpreter*. That is, it executes opcodes as it goes. To do that, after we have changed the representation (aka *lowered* the representation) of our source code `&str` to the AST `Node`, a basic interpreter looks at each node of the AST (via any [tree traversal method](https://en.wikipedia.org/wiki/Tree_traversal)) and simply **evaluates** it *recursively*.
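A minimal sketch of such a recursive evaluator (the `Node` and `Operator` types here are illustrative stand-ins for the book's actual AST types):

```rust
// A minimal AST for Calc-style expressions: integers plus unary/binary +/-.
// This `Node` is an illustrative stand-in, not the book's exact type.
#[derive(Debug)]
enum Node {
    Int(i32),
    UnaryExpr { op: Operator, child: Box<Node> },
    BinaryExpr { op: Operator, lhs: Box<Node>, rhs: Box<Node> },
}

#[derive(Debug)]
enum Operator {
    Plus,
    Minus,
}

// The interpreter: walk the tree and evaluate each node recursively.
fn eval(node: &Node) -> i32 {
    match node {
        Node::Int(n) => *n,
        Node::UnaryExpr { op, child } => {
            let v = eval(child);
            match op {
                Operator::Plus => v,
                Operator::Minus => -v,
            }
        }
        Node::BinaryExpr { op, lhs, rhs } => {
            let (l, r) = (eval(lhs), eval(rhs));
            match op {
                Operator::Plus => l + r,
                Operator::Minus => l - r,
            }
        }
    }
}

fn main() {
    // The AST for `1 + 2`
    let ast = Node::BinaryExpr {
        op: Operator::Plus,
        lhs: Box::new(Node::Int(1)),
        rhs: Box::new(Node::Int(2)),
    };
    println!("{}", eval(&ast)); // prints 3
}
```

The recursion mirrors the shape of the tree: leaves (`Int`) evaluate to themselves, and interior nodes evaluate their children first.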
We want to define an add function like

```
add(x: i32, y: i32) -> i32 { x + y }
```

but using the **LLVM language** and JIT it. For that, we need to define *every* bit of what makes up a function through LLVM basic constructs such as context, module, function signature setup, argument types, basic blocks, etc.
Here is how to *stitch* our add function in LLVM
To start, we have `1 + 1;` in [examples/simple.calc](https://github.com/ehsanmok/create-your-own-lang-with-rust/blob/master/calculator/examples/simple.calc), which you can compile with
```text
cargo build --bin main // create a simple executable for Calc
```
To get the most out of this chapter, it is recommended to at least try the first exercise below:
1. Add support for multiplication and division to the calculator, and allow computations on floating point numbers `f32`. Can you include standard operator precedence?
2. JIT with [cranelift-simplejit](https://docs.rs/cranelift-simplejit/0.64.0/cranelift_simplejit/)
3. JIT with [gcc-jit](http://swgillespie.me/gccjit.rs/gccjit/)
Every language needs a (formal) grammar to describe its syntax and semantics. Once a program adheres to the rules of the grammar in *Source Code* (for example, as an input string or file), it is *tokenized*, and then the *lexer* adds some metadata to each token, for example, where each token starts and finishes in the original source code. Lastly, the lexed output is parsed (reshaped or restructured) into our [Abstract Syntax Tree (AST)](./ast.md) for later stages of compilation.
## Grammar
While there are a variety of ways to define a grammar, in this book we will use the [Parsing Expression Grammar (PEG)](https://en.wikipedia.org/wiki/Parsing_expression_grammar).

Here is how the grammar of our simple calculator language `Calc` (supporting addition and subtraction) looks in PEG
This grammar basically defines the syntax and semantics where
* unary or binary expressions are made of `Term` and `Operator` (`"+"` and `"-"`)
* the only *atom* is integer `Int`
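To make the two rules above concrete, here is a minimal PEG sketch in pest notation; the rule names are illustrative and may differ from the book's actual grammar file:

```
Program    =  { SOI ~ Expr ~ EOI }
Expr       =  { UnaryExpr | BinaryExpr }
UnaryExpr  =  { Operator ~ Term }
BinaryExpr =  { Term ~ (Operator ~ Term)+ }
Term       =  { Int }
Operator   =  { "+" | "-" }
Int        = @{ ASCII_DIGIT+ }
```

`SOI`/`EOI` anchor the whole input, and the `@` marks `Int` as atomic so no whitespace is allowed between its digits.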
Given our grammar, we will use [pest](https://pest.rs/), which is a powerful *parser generator* for PEG grammars. (For more details on pest, check out the [pest book](https://pest.rs/book/).)
`pest` *derives* the parser `CalcParser::parse` from our grammar
## Read-Eval-Print Loop (REPL)
The REPL (as its name implies) loops through every line of the input and compiles it. We use the [rustyline](https://github.com/kkawakam/rustyline) crate to create our REPL. For each line of input, we can optionally choose to interpret it or JIT compile it.
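The read-eval-print cycle itself can be sketched with just the standard library (rustyline adds line editing, history, and a nicer prompt on top of this); the tiny `eval_line` here only handles `a + b` and `a - b` and is purely illustrative, standing in for the real lex/parse/evaluate pipeline:

```rust
use std::io::{self, BufRead, Write};

// Illustrative one-line evaluator: handles "<int> + <int>" and "<int> - <int>".
// A real REPL would lex/parse the line into an AST and interpret or JIT it.
fn eval_line(line: &str) -> Option<i64> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    match tokens.as_slice() {
        [a, "+", b] => Some(a.parse::<i64>().ok()? + b.parse::<i64>().ok()?),
        [a, "-", b] => Some(a.parse::<i64>().ok()? - b.parse::<i64>().ok()?),
        _ => None,
    }
}

fn main() {
    println!("Calculator prompt. Expressions are line evaluated.");
    let stdin = io::stdin();
    loop {
        // Print the prompt (the P and L of REPL).
        print!(">>> ");
        io::stdout().flush().unwrap();
        // Read a line; EOF breaks out of the loop.
        let mut line = String::new();
        if stdin.lock().read_line(&mut line).unwrap() == 0 {
            break;
        }
        // Eval and print the result, then loop.
        match eval_line(line.trim()) {
            Some(v) => println!("{}", v),
            None => println!("parse error"),
        }
    }
}
```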
We can use either the interpreter, the JIT compiler, or the VM interpreter in our [calculator](../../../calculator) by passing them as feature flags. Go ahead and run them one by one
Now, we can run the REPL and choose different compilation paths
```
cargo run --bin repl --features jit
cargo run --bin repl --features interpreter
cargo run --bin repl --features vm
```
In any of them, you should see a prompt like
```text
Calculator prompt. Expressions are line evaluated.
>>>
```
waiting for your inputs. Here are some sample outputs of the different compilation paths in debug mode:
* with `--features jit`
```text
Calculator prompt. Expressions are line evaluated.
...
entry:
...
CTRL-C
```
* with `--features vm`
```text
Calculator prompt. Expressions are line evaluated.
...
```
Here is a bird's-eye view of a computer program execution
All three of these components are intertwined, and learning their connections is crucial to understanding what makes *Computing* possible. Informally, a *language* is structured text with syntax and semantics. *Source Code* written in a programming language needs a translator/compiler of *some sort* to translate it to *another* language/format, and then an executor of *some sort* to execute/run the translated commands, with the goal of matching the syntax (and semantics) to *some form* of output.
## Elements of Computing
### Instructions and the Machine Language
If you want to create a "computer" from scratch, you need to start by defining an *abstract model* for your computer. This abstract model is also referred to as an **Instruction Set Architecture (ISA)** (instruction set, or simply *instructions*). A CPU is an *implementation* of such an ISA. A standard ISA defines basic elements such as *data types*, *register* values, various hardware supports, I/O, etc., and together they make up the *lowest-level language* of computing: the **Machine Language Instructions**.
Instructions are comprised of *instruction codes* (aka *operation codes*, in short **opcodes**) which are directly executed by the CPU. An opcode can have operand(s) or no operand. For example, in an 8-bit machine where instructions are 8 bits, a *load* opcode might be defined by the 4 bits **0011** followed by a second 4 bits as the operand, **0101**, making up the instruction **00110101** in the Machine Language, while the opcode for *incrementing the previously loaded value by 1* could be defined by **1000** with no operand.
Since *opcodes are like the atoms of computing*, they are presented in an opcode table. An example is the [Intel x86 opcode table](http://sparksandflames.com/files/x86InstructionChart.html).
### Assembly Language
Since it's hard to remember opcodes by their bit patterns, we can assign *abstract* symbols to opcodes, matching their operations by name. This is how an Assembly language is created from a Machine Language. In the previous Machine Language example, **00110101** (load the binary **0101**), we can define the symbol **LOAD**, referring to **0011**, as a higher-level abstraction so that **00110101** can be written as **LOAD 0101**.
The utility program that translates Assembly language to Machine Language is called an **Assembler**.
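As a toy illustration of this, here is an assembler for the hypothetical 8-bit machine from the example above (its two-opcode "ISA" is invented for this section, not a real one):

```rust
// Toy assembler for the hypothetical 8-bit machine above:
// LOAD has opcode 0011 and takes a 4-bit binary operand; INC is 1000 with no operand.
fn assemble(line: &str) -> Option<u8> {
    let mut parts = line.split_whitespace();
    match parts.next()? {
        // "LOAD 0101" -> 0011_0101: opcode in the high nibble, operand in the low nibble.
        "LOAD" => {
            let operand = u8::from_str_radix(parts.next()?, 2).ok()?;
            Some(0b0011_0000 | (operand & 0x0F))
        }
        // "INC" -> 1000_0000: opcode only, no operand.
        "INC" => Some(0b1000_0000),
        _ => None,
    }
}

fn main() {
    // "LOAD 0101" assembles to the machine instruction 00110101.
    println!("{:08b}", assemble("LOAD 0101").unwrap()); // prints 00110101
}
```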
### Compiler

A compiler is any program that translates (maps, encodes) a language A to a language B. Each compiler has two major components
* **Frontend:** deals with mapping the source code string to a structured format called the **Abstract Syntax Tree (AST)**
* **Backend (code generator):** translates the AST into [Bytecode](./crash_course.md#bytecode) / [IR](./crash_course.md#intermediate-representation-ir) or Assembly
Most often, when we talk about a compiler, we mean an **Ahead-Of-Time (AOT)** compiler, where the translation (to Assembly, [Bytecode](./crash_course.md#bytecode) or some [IR](./crash_course.md#intermediate-representation-ir)) happens *before* execution. Another form of translation is **Just-In-Time (JIT)** compilation, where translation happens right at the time of execution.
To distinguish between a program that translates, for example, Python to Assembly vs. Python to Java, the former is called a compiler and the latter a **transpiler**.
#### *Relativity of low-level, high-level*
Assembly is a *high-level* language compared to the Machine Language, but is considered *low-level* when viewed from C/C++/Rust. High-level and low-level are relative terms conveying the amount of *abstraction* involved.
### Virtual Machine (VM)
[Instructions](./crash_course.md#instructions-and-the-machine-language) are hardware- and vendor-specific. That is, an Intel CPU's instructions are different from an AMD CPU's. A **VM** abstracts away details of the underlying hardware or operating system so that programs translated/compiled into the VM language become platform agnostic. A famous example is the **Java Virtual Machine (JVM)**, which translates/compiles Java programs to the JVM language, aka Java **Bytecode**. Therefore, if you have a valid Java Bytecode and a *Java Runtime Environment (JRE)* on your system, you can execute the Bytecode regardless of what platform it was compiled on.
#### Bytecode
Another technique to translate source code to Machine Code is emulating the Instruction Set with a new (human-friendly) encoding (perhaps easier than Assembly). Bytecode is such an *intermediate language/representation*: lower-level than the actual programming language it was translated from, and higher-level than Assembly language.
#### Stack Machine
A Stack Machine is a simple model of a computing machine with two main components
* a memory (stack) array keeping the Bytecode instructions, which supports `push`ing and `pop`ing instructions
* an instruction pointer (IP) and stack pointer (SP) tracking which instruction was executed and which is next
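A toy stack machine along these lines can be sketched as follows (the three-opcode bytecode set is invented for illustration):

```rust
// A toy stack machine: a bytecode array, an instruction pointer (ip),
// and an operand stack. The opcode set here is invented for illustration.
#[derive(Clone, Copy)]
enum Op {
    Push(i64), // push a constant onto the stack
    Add,       // pop two values, push their sum
    Sub,       // pop two values, push their difference
}

fn run(bytecode: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new(); // the stack pointer is stack.len()
    let mut ip = 0; // instruction pointer: index of the next instruction
    while ip < bytecode.len() {
        match bytecode[ip] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a + b);
            }
            Op::Sub => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a - b);
            }
        }
        ip += 1;
    }
    // The final result is whatever is left on top of the stack.
    stack.pop()
}

fn main() {
    // `1 + 2` compiles to: Push(1), Push(2), Add
    let program = [Op::Push(1), Op::Push(2), Op::Add];
    println!("{:?}", run(&program)); // prints Some(3)
}
```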
### Intermediate Representation (IR)
Any representation between source code and (usually) Assembly language is considered an intermediate representation. Mainstream languages usually have more than one such representation, and going from one IR to another IR is called *lowering*.
### Code Generation
Code generation for a compiler is when the compiler *converts an IR to some Machine Code*. But it has a wider meaning too: for example, when using Rust declarative macros via `macro_rules!` to automate some repetitive implementations, you're essentially generating code (as well as expanding the syntax).
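A small example of that second sense of code generation with `macro_rules!` (the macro and function names are made up for illustration):

```rust
// A declarative macro that generates one unary-operation function per rule,
// instead of writing each `fn` by hand. Names here are illustrative only.
macro_rules! unary_fns {
    ($($name:ident => $op:expr),* $(,)?) => {
        $(
            fn $name(x: i64) -> i64 {
                // Each expansion of this arm generates one function body.
                ($op)(x)
            }
        )*
    };
}

// One macro invocation expands into three function definitions.
unary_fns! {
    neg => |x: i64| -x,
    double => |x: i64| 2 * x,
    inc => |x: i64| x + 1,
}

fn main() {
    println!("{} {} {}", neg(3), double(3), inc(3)); // prints -3 6 4
}
```

The macro never runs at runtime; it expands at compile time into ordinary `fn` items, which is exactly the "generating code" sense above.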
## Conclusion
In conclusion, we want to settle one of the most frequently asked questions
## <span style="color:blue">Is Python (or a language X) Compiled or Interpreted?</span>
This is in fact the <span style="color:red">WRONG</span> question to ask!
Being AOT compiled, JIT compiled, or interpreted is **implementation-dependent**. For example, the standard Python *implementation* is [**CPython**](https://www.python.org/), which compiles Python source code (in the CPython VM) to CPython Bytecode (the contents of `.pyc` files) and **interprets** the Bytecode. However, another implementation of Python is [**PyPy**](https://www.pypy.org/), which (more or less) compiles Python source code (in the PyPy VM) to PyPy Bytecode and **JIT** compiles the PyPy Bytecode to Machine Code (and is usually faster than the CPython interpreter).