
Commit 3b5b3d3

Author: Ehsan M. Kermani (committed)
Commit message: Content editorial
1 parent 7568e1c, commit 3b5b3d3

10 files changed: +72 additions, -38 deletions

‎book/src/01_calculator/ast.md

Lines changed: 16 additions & 4 deletions

@@ -64,7 +64,7 @@ Now, we can use the `pest` generated `CalcParser::parse` to map the Rules of our
 
 {{#include ../../../calculator/src/parser.rs:parse_source}}
 ```
-Checkout [calculator/src/parser.rs](../../../calculator/src/parser.rs).
+Checkout [calculator/src/parser.rs](https://github.com/ehsanmok/create-your-own-lang-with-rust/blob/master/calculator/src/parser.rs).
 
 
 Note that `CalcParser::parse` takes care of the AST traversal and correctly maps it to `Vec<Node>` for easier access
@@ -73,21 +73,33 @@ in later stages of compilation.
 
 ## Interpreter
 
-CPU is the *ultimate interpreter*. That is, it executes opcodes as it goes. To do that, after we have changed the representation (aka *lowered* the representation) of our source code `&str` to AST `Node`, a basic interpreter looks and each node of the AST (via any [tree traversal methods](https://en.wikipedia.org/wiki/Tree_traversal)) and simply **evaluates** it *recursively*
+CPU is the *ultimate interpreter*. That is, it executes opcodes as it goes. To do that, after we have changed the representation (aka *lowered* the representation) of our source code `&str` to AST `Node`, a basic interpreter looks at each node of the AST (via any [tree traversal method](https://en.wikipedia.org/wiki/Tree_traversal)) and simply **evaluates** it *recursively*
 
 ```rust,ignore
 {{#include ../../../calculator/src/compiler/interpreter.rs:interpreter_eval}}
 ```
 
-To sum up, we define a `Compile` trait
+To sum up, we define a `Compile` trait that we will use throughout this chapter
 
 ```rust,ignore
 {{#include ../../../calculator/src/lib.rs:compile_trait}}
 ```
 
-and implement our interpreter
+and we can now implement our interpreter
 
 ```rust,ignore
 {{#include ../../../calculator/src/compiler/interpreter.rs:interpreter}}
 ```
 <span class="filename">Filename: calculator/src/compiler/interpreter.rs</span>
+
+and test
+
+```rust,ignore
+assert_eq!(Interpreter::from_source("1 + 2").unwrap(), 3);
+```
+
+Run such tests locally with
+
+```text
+cargo test interpreter --tests
+```
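
Editorial aside: the recursive evaluation this section describes can be sketched in plain Rust. The `Node` and `Operator` shapes below are simplified assumptions for illustration, not the book's exact definitions from `calculator/src/ast.rs`.

```rust
// Simplified sketch of recursive AST evaluation; the `Node` and
// `Operator` types here are illustrative assumptions, not the book's code.
#[derive(Debug)]
enum Operator {
    Plus,
    Minus,
}

#[derive(Debug)]
enum Node {
    Int(i32),
    UnaryExpr { op: Operator, child: Box<Node> },
    BinaryExpr { op: Operator, lhs: Box<Node>, rhs: Box<Node> },
}

// Evaluate a node by recursively evaluating its children first.
fn eval(node: &Node) -> i32 {
    match node {
        Node::Int(n) => *n,
        Node::UnaryExpr { op, child } => match op {
            Operator::Plus => eval(child),
            Operator::Minus => -eval(child),
        },
        Node::BinaryExpr { op, lhs, rhs } => match op {
            Operator::Plus => eval(lhs) + eval(rhs),
            Operator::Minus => eval(lhs) - eval(rhs),
        },
    }
}

fn main() {
    // AST for `1 + 2`
    let ast = Node::BinaryExpr {
        op: Operator::Plus,
        lhs: Box::new(Node::Int(1)),
        rhs: Box::new(Node::Int(2)),
    };
    println!("{}", eval(&ast)); // prints 3
}
```

This mirrors the `Interpreter::from_source("1 + 2")` test above: parse to an AST, then fold it bottom-up into a value.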

‎book/src/01_calculator/ast_traversal.md

Lines changed: 6 additions & 0 deletions

@@ -32,3 +32,9 @@ Finally, we can test it
 ```rust,ignore
 assert_eq!(Jit::from_source("1 + 2").unwrap(), 3)
 ```
+
+Run such tests locally with
+
+```text
+cargo test jit --tests
+```

‎book/src/01_calculator/basic_llvm.md

Lines changed: 2 additions & 2 deletions

@@ -20,7 +20,7 @@ We want to define an add function like
 add(x: i32, x: i32) -> i32 { x + y }
 ```
 
-but using the **LLVM language** and JIT it. Since LLVM is also a VM, it has its own Bytecodes and IR. The point is we need to define *every* bit of what makes up a function through LLVM basic constructs such as context, module, function signature setups, argument types, basic block, etc.
+but using the **LLVM language** and JIT it. For that, we need to define *every* bit of what makes up a function through LLVM basic constructs such as context, module, function signature setups, argument types, basic block, etc.
 
 Here is how to *stitch* our add function in LLVM
 
@@ -37,7 +37,7 @@ Here is how to *stitch* our add function in LLVM
 {{#include ../../../calculator/examples/llvm/src/main.rs:second}}
 ```
 
-3. We create the arguments `x` and `y` and add them to the `builder` and make up the return instruction
+3. We create the arguments `x` and `y` and add them to the `builder` to make up the return instruction
 
 ```rust,ignore
 {{#include ../../../calculator/examples/llvm/src/main.rs:third}}
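
Editorial aside: conceptually, the add function being stitched together corresponds to LLVM IR roughly like the following (an illustrative sketch; the IR actually emitted by the book's example may differ in names and attributes):

```text
define i32 @add(i32 %x, i32 %y) {
entry:
  %sum = add i32 %x, %y
  ret i32 %sum
}
```

Every piece of this, the function signature, the `entry` basic block, the `add` and `ret` instructions, is what the context/module/builder calls construct one step at a time.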

‎book/src/01_calculator/calc_intro.md

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@ If you haven't cloned the [GitHub](https://github.com/ehsanmok/create-your-own-l
 To start, we have `1 + 1;` in [examples/simple.calc](https://github.com/ehsanmok/create-your-own-lang-with-rust/blob/master/calculator/examples/simple.calc) where you can compile with
 
 ```text
-cargo build --bin main // create the CLI executable for Calc
+cargo build --bin main // create a simple executable for Calc
 ../target/debug/main examples/simple.calc
 ```

‎book/src/01_calculator/exercise.md

Lines changed: 2 additions & 0 deletions

@@ -1,5 +1,7 @@
 ## Exercise
 
+To get the most out of this chapter, it is recommended to at least try the first exercise below.
+
 1. Add support for multiplication and division to the calculator and allow computations on floating numbers `f32`. Can you include standard operator precedence?
 2. JIT with [cranelift-simplejit](https://docs.rs/cranelift-simplejit/0.64.0/cranelift_simplejit/)
 3. JIT with [gcc-jit](http://swgillespie.me/gccjit.rs/gccjit/)

‎book/src/01_calculator/grammar_lexer_parser.md

Lines changed: 3 additions & 3 deletions

@@ -7,13 +7,13 @@ Here is a high-level view of a compiler *frontend* pipeline
 <a href><img alt="grammar, lexer, parser" src="../img/grammar_lexer_parser.svg"> </a>
 </p>
 
-Every language needs a (formal) grammar to describe its syntax and semantics. Once a program adheres to the rules of the grammar in *Source Code* (for example as input string or file format), it is *tokenized* and then *lexer* adds some metadata to each token, for example where each token starts and finishes in the original source code. Lastly, parsing (reshaping or restructuring) of the lexed outputs into our [Abstract Syntax Tree (AST)](./ast.md) occurs for later stages of compilation (compiler backend).
+Every language needs a (formal) grammar to describe its syntax and semantics. Once a program adheres to the rules of the grammar in *Source Code* (for example as input string or file format), it is *tokenized* and then a *lexer* adds some metadata to each token, for example where each token starts and finishes in the original source code. Lastly, the lexed output is parsed (reshaped or restructured) into an [Abstract Syntax Tree](./ast.md).
 
 ## Grammar
 
 While there are varieties of ways to define the grammar, in this book we will use the [Parsing Expression Grammar (PEG)](https://en.wikipedia.org/wiki/Parsing_expression_grammar).
 
-Here is how our simple calculator language `Calc` (supporting addition and subtraction) looks like in PEG
+Here is what the grammar of our simple calculator language `Calc` (supporting addition and subtraction) looks like in PEG
 
 ```text
 {{ #include ../../../calculator/src/grammar.pest }}
@@ -27,7 +27,7 @@ This grammar basically defines the syntax and semantics where
 * unary or binary expressions are made of `Term` and `Operator` (`"+"` and `"-"`)
 * the only *atom* is integer `Int`
 
-Given a PEG grammar, luckily we can use [pest](https://pest.rs/) which is a powerful *parser generator* for the PEG grammars. (For more details on pest, checkout the [pest book](https://pest.rs/book/))
+Given our grammar, we will use [pest](https://pest.rs/) which is a powerful *parser generator* of PEG grammars. (For more details on pest, checkout the [pest book](https://pest.rs/book/))
 
 `pest` *derives* the parser `CalcParser::parse` from our grammar
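
Editorial aside: for readers unfamiliar with pest's rule syntax, rules in this style look roughly like the following. This is a hypothetical sketch, not the book's actual grammar, which lives in `calculator/src/grammar.pest`:

```text
// hypothetical pest-style rules for a Calc-like language
Int      = @{ ASCII_DIGIT+ }
Operator = { "+" | "-" }
Expr     = { Operator? ~ Int ~ (Operator ~ Int)* }
```

Here `@{ ... }` marks an atomic rule, `~` sequences sub-rules, and `?`/`*` are the usual optional/repetition operators.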

‎book/src/01_calculator/repl.md

Lines changed: 11 additions & 5 deletions

@@ -1,13 +1,17 @@
 ## Read-Eval-Print Loop (REPL)
 
-REPL as its name implies, loops through every line of input and compile it. We use [rustyline crate](https://github.com/kkawakam/rustyline) to create our REPL. We can optionally choose to interpret or JIT each line of input as follow
+REPL (as its name implies) loops through every line of the input and compiles it. We use the [rustyline](https://github.com/kkawakam/rustyline) crate to create our REPL. For each line of input, we can optionally choose to
+
+* directly interpret the AST
+* JIT the AST
+* compile to our bytecode VM and interpret it
 
 ```rust,no_run,noplaypen
 {{#include ../../../calculator/src/bin/repl.rs:repl}}
 ```
 <span class="filename">Filename: calculator/src/bin/repl.rs</span>
 
-We can either use interpreter, JIT compiler or VM interpreter in our [calculator](../../../calculator) with passing them as flags. Go ahead and run them one by one
+Now, we can run the REPL and choose different compilation paths
 
 ```
 cargo run --bin repl --features jit
@@ -17,14 +21,16 @@ cargo run --bin repl --features interpreter
 cargo run --bin repl --features vm
 ```
 
-In either of them, you should see the prompt like
+In any of them, you should see a prompt like
 
 ```text
 Calculator prompt. Expressions are line evaluated.
 >>>
 ```
 
-waiting for your inputs. Test it our with `1 + 2` examples and `CTRL-C` with break out of the REPL. You can see the different paths of compilation in debug mode. For example with `--features jit`, you will see
+waiting for your inputs. Here are some sample outputs of different compilation paths in debug mode.
+
+* with `--features jit`
 
 ```text
 Calculator prompt. Expressions are line evaluated.
@@ -50,7 +56,7 @@ entry:
 CTRL-C
 ```
 
-or with `--features vm`
+* with `--features vm`
 
 ```text
 Calculator prompt. Expressions are line evaluated.

‎book/src/01_calculator/vm.md

Lines changed: 6 additions & 1 deletion

@@ -101,5 +101,10 @@ let mut vm = VM::new(byte_code);
 vm.run();
 println!("{}", vm.pop_last());
 ```
+Run tests locally for our VM with
 
-Checkout the [next section](./repl.md) on how to create a REPL for our `Calc` and see some samples of computations.
+```text
+cargo test vm --tests
+```
+
+Checkout the [next section](./repl.md) on how to create a REPL for our `Calc` to compare different compilation paths.
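
Editorial aside: the `VM::new` / `run` / `pop_last` shape in the diff above can be sketched as a self-contained stack machine. The `OpCode` variants and `VM` layout below are illustrative assumptions, not the book's actual `calculator` types.

```rust
// Illustrative sketch of a stack-based bytecode VM; the opcode set and
// struct layout are assumptions for demonstration, not the book's code.
enum OpCode {
    Push(i32), // push a constant onto the stack
    Add,       // pop two values, push their sum
    Sub,       // pop two values, push their difference
}

struct VM {
    code: Vec<OpCode>,
    stack: Vec<i32>,
}

impl VM {
    fn new(code: Vec<OpCode>) -> Self {
        VM { code, stack: Vec::new() }
    }

    fn run(&mut self) {
        // The instruction pointer is the loop position; the stack pointer
        // is implicit in the Vec's length.
        for op in &self.code {
            match op {
                OpCode::Push(n) => self.stack.push(*n),
                OpCode::Add => {
                    let (r, l) = (self.stack.pop().unwrap(), self.stack.pop().unwrap());
                    self.stack.push(l + r);
                }
                OpCode::Sub => {
                    let (r, l) = (self.stack.pop().unwrap(), self.stack.pop().unwrap());
                    self.stack.push(l - r);
                }
            }
        }
    }

    fn pop_last(&mut self) -> i32 {
        self.stack.pop().unwrap()
    }
}

fn main() {
    // Bytecode for `1 + 2`
    let mut vm = VM::new(vec![OpCode::Push(1), OpCode::Push(2), OpCode::Add]);
    vm.run();
    println!("{}", vm.pop_last()); // prints 3
}
```

Note the pop order: the right operand comes off the stack first, which matters for `Sub`.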

‎book/src/crash_course.md

Lines changed: 18 additions & 18 deletions

@@ -6,21 +6,21 @@ Here is a bird's-eye view of a computer program execution
 </p>
 
 
-All these three components are intertwined together and learning their connections is crucial in understanding what makes the *Computing* possible. Informally, a *language* is a structured text with syntax and semantics. A *Source Code* written in a programming language needs a translator / compiler of *some sort*, to translate it to *another* language / format. Then an executor of *some sort*, to execute/run the translated commands with the goal of matching the syntax (and semantics) to *some form* of output.
+All these three components are intertwined together and learning their connections is crucial in understanding what makes *Computing* possible. Informally, a *language* is a structured text with syntax and semantics. A *Source Code* written in a programming language needs a translator/compiler of *some sort* to translate it to *another* language/format, then an executor of *some sort* to execute/run the translated commands with the goal of matching the syntax (and semantics) to *some form* of output.
 
 ## Elements of Computing
 
 ### Instructions and the Machine Language
 
 If you want to create a "computer" from scratch, you need to start by defining an *abstract model* for your computer. This abstract model is also referred to as **Instruction Set Architecture (ISA)** (instruction set or simply *instructions*). A CPU is an *implementation* of such ISA. A standard ISA defines its basic elements such as *data types*, *register* values, various hardware supports, I/O etc. and they all make up the *lowest-level language* of computing which is the **Machine Language Instructions.**
 
-Instructions are comprised of *instruction code* (aka *operation code*, in short **opcode** or p-code) which are directly executed by CPU. An opcode can either have operand(s) or no operand. For example, in a 8-bits machine where instructions are 8-bits an opcode *load* is defined by the 4-bits **0011** following by the second 4-bits as operand with **0101** make up an instruction **00110101** in the Machine Language while the opcode for *incrementing by 1* of the previously loaded value could be **1000** with no operand.
+Instructions are comprised of an *instruction code* (aka *operation code*, in short **opcode** or p-code) which is directly executed by the CPU. An opcode can either have operand(s) or no operand. For example, in an 8-bit machine where instructions are 8 bits, a *load* opcode might be defined by the 4 bits **0011** followed by a second 4 bits **0101** as operand, making up the instruction **00110101** in the Machine Language, while the opcode for *incrementing by 1* the previously loaded value could be defined by **1000** with no operand.
 
 Since *opcodes are like atoms of computing*, they are presented in an opcode table. An example of that is [Intel x86 opcode table](http://sparksandflames.com/files/x86InstructionChart.html).
 
 ### Assembly Language
 
-Assembly language is a symbolic version (mnemonics) of the machine language where opcodes consist of symbolic names. From our previous Machine Language example above, **00110101** meaning load the binary **0101**, then in an Assembly language, we can define the symbol **LOAD** referring to 0011 as a higher level abstraction so that 00110101 can be written as **LOAD 0101**.
+Since it's hard to remember opcodes by their bit patterns, we can assign *abstract* symbols to opcodes matching their operations by name. This way, we can create an Assembly language from the Machine Language. In the previous Machine Language example, **00110101** (meaning load the binary **0101**), we can define the symbol **LOAD** referring to **0011** as a higher level abstraction so that **00110101** can be written as **LOAD 0101**.
 
 The utility program that translates the Assembly language to Machine Language is called **Assembler**.
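
Editorial aside: the hypothetical 8-bit LOAD/INC machine in the hunk above lends itself to a tiny fetch-decode-execute sketch in Rust. Only the bit patterns **0011** (load) and **1000** (increment) come from the text; the accumulator register and everything else are illustrative assumptions.

```rust
// Sketch of the hypothetical 8-bit machine described above: the high
// nibble of each instruction is the opcode, the low nibble the operand.
const LOAD: u8 = 0b0011; // load the operand into the accumulator
const INC: u8 = 0b1000;  // increment the accumulator; operand ignored

fn execute(program: &[u8]) -> u8 {
    let mut acc: u8 = 0; // accumulator register (an assumption)
    for &instruction in program {
        let opcode = instruction >> 4;    // high 4 bits
        let operand = instruction & 0x0F; // low 4 bits
        match opcode {
            LOAD => acc = operand,
            INC => acc += 1,
            _ => panic!("unknown opcode {:04b}", opcode),
        }
    }
    acc
}

fn main() {
    // 0b00110101 = LOAD 0101 (load 5), 0b10000000 = INC
    println!("{}", execute(&[0b0011_0101, 0b1000_0000])); // prints 6
}
```

An assembler, in this picture, is just the textual mapping `LOAD 0101` to `00110101` done in reverse.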

@@ -31,49 +31,49 @@ The utility program that translates the Assembly language to Machine Language is
 <a href><img alt="compiler" src="./img/compiler.svg"> </a>
 </p>
 
-Compiler is any program that translates (maps, encodes) a language A to language B. Each compiler has two major component
+A compiler is any program that translates (maps, encodes) a language A to language B. Each compiler has two major components
 
-* **Frontend:** deals with lexer, parser and a structured tree format called **Abstract Syntax Tree (AST)**
+* **Frontend:** deals with mapping the source code string to a structured format called **Abstract Syntax Tree (AST)**
 * **Backend (code generator):** translates the AST into the [Bytecode](./crash_course.md#bytecode) / [IR](./crash_course.md#intermediate-representation-ir) or Assembly
 
-Most often, when we talk about compiler backend, we mean **Ahead-Of-Time (AOT)** compiler where the translation (to Assembly, [Bytecode](./crash_course.md#bytecode) or some [IR](./crash_course.md#intermediate-representation-ir)) happens *before* execution. Another form of translation is **Just-In-Time (JIT)** compiler where translation happens right at the time of the execution.
+Most often, when we talk about a compiler, we mean an **Ahead-Of-Time (AOT)** compiler where the translation happens *before* execution. Another form of translation is **Just-In-Time (JIT)** compilation where translation happens right at the time of execution.
 
-To distinguish between a program that translates Python to Assembly vs. Python to Java, the former is called compiler and the latter **transpiler**.
+From the diagram above, to distinguish between a program that translates, for example, Python to Assembly vs. Python to Java, the former is called a compiler and the latter a **transpiler**.
 
-#### *Relativity of Terms and Definitions*
+#### *Relativity of low-level, high-level*
 
-There is a relativity notion in most of terms involved here. Assembly is a *high-level* language comparing to the Machine Language but is considered *low-level* when viewing it from C/C++/Rust. High-level and low-level are relative terms conveying the amount of *abstractions* involved.
+Assembly is a *high-level* language compared to the Machine Language but is considered *low-level* when viewed from C/C++/Rust. High-level and low-level are relative terms conveying the amount of *abstraction* involved.
 
 
 ### Virtual Machine (VM)
 
-Instruction Set Architecture is hardware and vendor specific. That is, an Intel CPU instructions are different from AMD CPU ones. A **(process) VM** abstracts away details of the underlying hardware or operating system so that programs translated/compiled into the VM language to become platform agnostic. A famous example is the **Java Virtual Machine (JVM)**
-which translates/compiles Java programs into JVM language aka Java **Bytecode**. Therefore, if you have a valid Java Bytecode and *Java Runtime Environment (JRE)* in your system, you can execute the Bytecode, regardless on what platform it was compiled.
+[Instructions](./crash_course.md#instructions-and-the-machine-language) are hardware and vendor specific. That is, Intel CPU instructions are different from AMD CPU ones. A **VM** abstracts away details of the underlying hardware or operating system so that programs translated/compiled into the VM language become platform agnostic. A famous example is the **Java Virtual Machine (JVM)**
+which translates/compiles Java programs to the JVM language aka Java **Bytecode**. Therefore, if you have valid Java Bytecode and a *Java Runtime Environment (JRE)* on your system, you can execute the Bytecode, regardless of what platform it was compiled on.
 
 #### Bytecode
 
-Another technique to translate a Source Code to Machine Code, is emulating the Instruction Set with a new (human friendly) encoding (perhaps easier than assembly). Bytecode is such as (human-readable) *intermediate language / representation* which is lower-level than the actual program language that has been translated from and higher-level that Assembly language.
+Another technique to translate source code to Machine Code is emulating the Instruction Set with a new (human friendly) encoding (perhaps easier than assembly). Bytecode is such an *intermediate language/representation* which is lower-level than the actual programming language it has been translated from and higher-level than Assembly language.
 
 #### Stack Machine
 
 Stack Machine is a simple model for a computing machine with two main components
-* a memory (stack) array keeping the Bytecode instructions that we can `push` and `pop` instructions
-* an instruction pointer (IP) and stack pointer (SP) guiding which instruction was executed and which instruction is next.
+* a memory (stack) array keeping the Bytecode instructions that supports `push`ing and `pop`ping instructions
+* an instruction pointer (IP) and stack pointer (SP) guiding which instruction was executed and what is next.
 
 ### Intermediate Representation (IR)
 
-Any representation that's between Source Code and (usually) Assembly language is considered and intermediate representation. Mainstream languages usually have more one one such representation and go from one IR to another IR is called lowering.
+Any representation that's between source code and (usually) Assembly language is considered an intermediate representation. Mainstream languages usually have more than one such representation and going from one IR to another IR is called *lowering*.
 
 ### Code Generation
 
-Code generation for a compiler is when the compiler *converts an IR to some Machine Code*. But it has a wider semantic too for example when using Rust declarative macros via `macro_rules!` to automate some repetitive implementations, you're essentially generating codes as well as expanding the syntax.
+Code generation for a compiler is when the compiler *converts an IR to some Machine Code*. But it has a wider meaning too: for example, when using Rust declarative macros via `macro_rules!` to automate some repetitive implementations, you're essentially generating code (as well as expanding the syntax).
 
 ## Conclusion
 
-In conclusion, we want to settle one of the most frequently asked questions:
+In conclusion, we want to settle one of the most frequently asked questions
 
 ## <span style="color:blue">Is Python (or a language X) Compiled or Interpreted?</span>
 
 This is in fact the <span style="color:red">WRONG</span> question to ask!
 
-Being AOT compiled, JIT compiled or interpreted is **implementation-dependent**. For example, the standard Python is [**CPython**](https://www.python.org/) which compiles a Python source code (in CPython VM) to CPython Bytecode (content of `.pyc`) and **interprets** the Bytecode. However, another implementation of Python is [**PyPy**](https://www.pypy.org/) which (more or less) compiles a Python source code (in PyPy VM) to PyPy Bytecode and **JIT** compiles the PyPy Bytecode to the Machine Code (is usually faster than CPython interpreter).
+Being AOT compiled, JIT compiled or interpreted is **implementation-dependent**. For example, the standard Python *implementation* is [**CPython**](https://www.python.org/) which compiles Python source code (in the CPython VM) to CPython Bytecode (contents of `.pyc`) and **interprets** the Bytecode. However, another implementation of Python is [**PyPy**](https://www.pypy.org/) which (more or less) compiles Python source code (in the PyPy VM) to PyPy Bytecode and **JIT** compiles the PyPy Bytecode to Machine Code (and is usually faster than the CPython interpreter).
