A complete multi-stage compiler built from scratch in Python. It takes source code through lexical analysis, syntactic parsing (LL(1)), semantic analysis, and intermediate representation (IR) generation, then executes the result on a custom stack machine.
📚 Educational project demonstrating the full compilation pipeline — from raw text to executable instructions.
     Source Code
          │
          ▼
┌─────────────────────┐
│ 1 Lexer (Scanner)   │ Tokenizes source → [Token, Token, ...]
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│ 2 Parser (LL(1))    │ Builds AST from token stream
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│ 3 Semantic          │ Type checking, scope resolution,
│   Analyzer          │ symbol table validation
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│ 4 IR Generator      │ Intermediate Representation (3-address code)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│ 5 Stack Machine     │ Executes IR on custom virtual machine
└─────────┬───────────┘
          │
          ▼
        Output
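The data flow in the diagram can be illustrated with a deliberately tiny sketch that handles only integer arithmetic (`+ - * /` and parentheses). It is not the project's code: the names `lex`, `Parser`, `gen_ir`, `run`, and `compile_and_run`, the tuple-based AST, and the stack-style instructions are all assumptions for illustration. The semantic stage is omitted because this toy language has a single type, and the toy emits stack instructions directly rather than three-address code.

```python
import operator
import re

# Stage 1 (sketch): integers, the four arithmetic operators, and parentheses.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([-+*/()]))")


def lex(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}: {src[pos:]!r}")
        number, op = m.groups()
        tokens.append(("NUM", int(number)) if number else ("OP", op))
        pos = m.end()
    tokens.append(("EOF", None))
    return tokens


# Stage 2 (sketch): LL(1) recursive descent; one token of lookahead picks each rule.
#   expr := term (('+' | '-') term)*
#   term := factor (('*' | '/') factor)*
#   factor := NUM | '(' expr ')'
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse(self):
        node = self.expr()
        if self.peek()[0] != "EOF":
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return node

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            node = (self.advance()[1], node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            node = (self.advance()[1], node, self.factor())
        return node

    def factor(self):
        kind, val = self.advance()
        if kind == "NUM":
            return ("num", val)
        if val == "(":
            node = self.expr()
            if self.advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {val!r}")


# Stages 4-5 (sketch): post-order code generation and a stack-based interpreter.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
BINOPS = {"ADD": operator.add, "SUB": operator.sub,
          "MUL": operator.mul, "DIV": operator.floordiv}


def gen_ir(node):
    if node[0] == "num":
        return [("PUSH", node[1])]
    op, left, right = node
    return gen_ir(left) + gen_ir(right) + [(OPCODES[op],)]


def run(code):
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(BINOPS[instr[0]](left, right))
    return stack.pop()


def compile_and_run(src):
    return run(gen_ir(Parser(lex(src)).parse()))
```

For example, `compile_and_run("2 * (3 + 4) - 5")` lexes, parses, emits `PUSH 2, PUSH 3, PUSH 4, ADD, MUL, PUSH 5, SUB`, and evaluates to `9`. Each stage only consumes the previous stage's output, which is what lets the real project expose every stage as its own script.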
# Run a .gox source file
python main.py examples/mandelplot.gox

# Or use individual stages
python Lexer.py source.gox            # Stage 1: Tokenization
python Parser.py source.gox           # Stage 2: AST Generation
python SemanticAnalyzer.py ...        # Stage 3: Semantic Check
python IRCodeGenerator.py ...         # Stage 4: IR Generation
python StackMachine.py ...            # Stage 5: Execution
// Variable declarations
var x = 10;
var name = "hello";
// Control flow
if (x >= 5) {
print(x);
} else {
print("too small");
}
// Loops
while (x > 0) {
x = x - 1;
}
// Functions
function fibonacci(n) {
if (n <= 1) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}
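A stack machine has no direct notion of `while`, so loops are usually lowered to labels and conditional jumps. The sketch below hand-writes such a lowering for `while (x > 0) { x = x - 1; }` and runs it on a minimal interpreter. It assumes a stack-oriented instruction format; the instruction names (`LOAD`, `STORE`, `GT`, `JUMP_IF_FALSE`, ...) and the label-resolution step are hypothetical, and since the project describes its IR as three-address code, its real encoding may differ.

```python
import operator

# Hypothetical lowering of:  while (x > 0) { x = x - 1; }
LOOP_CODE = [
    ("LABEL", "loop_start"),
    ("LOAD", "x"),                    # push the value of x
    ("PUSH", 0),
    ("GT",),                          # push (x > 0)
    ("JUMP_IF_FALSE", "loop_end"),    # leave the loop once the test fails
    ("LOAD", "x"),
    ("PUSH", 1),
    ("SUB",),
    ("STORE", "x"),                   # x = x - 1
    ("JUMP", "loop_start"),
    ("LABEL", "loop_end"),
]

BINOPS = {"ADD": operator.add, "SUB": operator.sub, "GT": operator.gt}


def run(code, variables):
    # Resolve each label name to its position once, before execution starts.
    labels = {instr[1]: i for i, instr in enumerate(code) if instr[0] == "LABEL"}
    stack, pc = [], 0
    while pc < len(code):
        op, *args = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(args[0])
        elif op == "LOAD":
            stack.append(variables[args[0]])
        elif op == "STORE":
            variables[args[0]] = stack.pop()
        elif op in BINOPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINOPS[op](left, right))
        elif op == "JUMP":
            pc = labels[args[0]]
        elif op == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = labels[args[0]]
        elif op != "LABEL":  # LABEL is a no-op at run time
            raise RuntimeError(f"unknown instruction {op!r}")
    return variables
```

Symbolic labels keep code generation simple: the generator can emit the forward jump to `loop_end` before it knows where the loop body ends, and the interpreter resolves the name once before execution.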
| Stage | Implementation | Status |
|---|---|---|
| Lexer | Regex-based tokenizer, keyword recognition, error handling | ✅ Complete |
| Parser | LL(1) recursive descent, AST node construction | ✅ Complete |
| Semantic Analyzer | Symbol table, type checking, scope management | ✅ Complete |
| IR Generator | Three-address code, intermediate representation | ✅ Complete |
| Stack Machine | Custom VM, instruction execution, memory management | ✅ Complete |
| Optimizer | Constant folding, dead code elimination | ✅ Complete |
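The Optimizer row mentions constant folding. As a minimal sketch of that technique (not the project's implementation), the function below walks a tuple-based AST, assumed here to use `("num", value)` leaves, hypothetical `("var", name)` leaves, and `(op, left, right)` binary nodes, and replaces any operator whose operands are both constants with the computed result.

```python
import operator

# Operators this sketch evaluates at compile time. Division is left out on
# purpose so that division by zero is still reported when the program runs.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Return a new AST with constant sub-expressions pre-computed."""
    if node[0] in ("num", "var"):
        return node  # leaves cannot be simplified further
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if op in FOLDABLE and left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))
    return (op, left, right)
```

Folding bottom-up means `10 - (1 + 2)` collapses to a single constant `7`, while `x + 2 * 3` becomes `x + 6`: the variable blocks folding of the outer `+` but not of the inner product.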
| Category | Examples |
|---|---|
| Keywords | var, if, else, while, function, return, print |
| Operators | +, -, *, /, <=, >=, ==, !=, = |
| Literals | Integers, floats, strings, booleans |
| Comments | Single-line (//) and multi-line (/* */) |
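The token categories above map naturally onto a regex-based scanner. The sketch below builds one master pattern from named groups with Python's `re` module; the project's actual patterns may differ. It skips whitespace and both comment styles, reclassifies identifiers that are keywords, and tries two-character operators before one-character ones. Three details are assumptions not stated in the table: booleans are spelled `true`/`false`, `<` and `>` are operators (`>` does appear in the loop example), and `( ) { } ; ,` are returned as punctuation.

```python
import re

KEYWORDS = {"var", "if", "else", "while", "function", "return", "print"}

# Order matters: earlier alternatives win, so comments are tried before "/",
# floats before ints, and two-character operators before one-character ones.
TOKEN_SPEC = [
    ("SKIP", r"\s+|//[^\n]*|/\*.*?\*/"),
    ("FLOAT", r"\d+\.\d+"),
    ("INT", r"\d+"),
    ("STRING", r'"[^"\n]*"'),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"<=|>=|==|!=|[-+*/<>=]"),
    ("PUNCT", r"[(){};,]"),
    ("MISMATCH", r"."),
]
# DOTALL lets the /* ... */ pattern span several lines.
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.DOTALL
)


def tokenize(source):
    tokens = []
    for m in MASTER_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        if kind == "NAME":
            if text in KEYWORDS:
                kind = "KEYWORD"
            elif text in ("true", "false"):
                kind = "BOOL"
        tokens.append((kind, text))
    return tokens
```

For instance, `tokenize('var x = 10; // comment')` yields `KEYWORD var`, `NAME x`, `OP =`, `INT 10`, and `PUNCT ;`, with the comment discarded. The catch-all `MISMATCH` group guarantees every character is matched by something, so unknown characters become a reported error instead of being silently skipped.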
python-compiler/
├── Lexer.py # Stage 1: Tokenization
├── Parser.py # Stage 2: AST Construction
├── SemanticAnalyzer.py # Stage 3: Type & Scope Checking
├── IRCodeGenerator.py # Stage 4: Intermediate Representation
├── StackMachine.py # Stage 5: Virtual Machine Execution
├── Nodes_AST.py # AST node definitions
├── SymbolTable.py # Symbol table implementation
├── Error.py # Error handling framework
├── Types.py # Type system definitions
├── main.py # Entry point
├── examples/
│ └── mandelplot.gox # Example source: Mandelbrot plotter
└── Documentation/ # Full documentation site
├── index.html
├── Markdown/ # Detailed docs for each stage
└── src/ # Documented source code
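`SymbolTable.py` is listed without further detail. For a language with nested scopes (the stage table mentions scope management), a common design is a chain of tables in which each scope points to its enclosing one: lookups walk outward, while declarations only check the current scope. The class below is a hypothetical sketch of that design; the names `SymbolTable`, `define`, `lookup`, and `child` are assumptions, not the project's API.

```python
class SymbolTable:
    """One lexical scope; `parent` links to the enclosing scope (None for globals)."""

    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def define(self, name, info):
        # Redeclaring in the same scope is an error; shadowing an outer scope is allowed.
        if name in self.symbols:
            raise NameError(f"'{name}' is already declared in this scope")
        self.symbols[name] = info

    def lookup(self, name):
        # Walk outward from the innermost scope until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"'{name}' is not declared")

    def child(self):
        """Create a nested scope, e.g. for a function body or block."""
        return SymbolTable(parent=self)
```

Usage: after `globals_ = SymbolTable()` and `globals_.define("x", "int")`, a nested scope from `globals_.child()` can declare its own `x` that shadows the outer one, while a second `define("x", ...)` on `globals_` raises `NameError`.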
A full documentation site is available in the Documentation/ directory:
- Lexer: Token specification, regex patterns, error handling
- Parser: Grammar rules, AST node types, LL(1) parsing
- Semantic Analyzer: Symbol table, type checking, scope rules
- IR Generator: Three-address code, instruction set
- Stack Machine: VM architecture, instruction execution
Open Documentation/index.html in your browser to explore.
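Type checking in a semantic analyzer is typically a bottom-up pass that computes each expression's type from its operands and rejects invalid combinations. The sketch below assumes the four literal types from the feature table (`int`, `float`, `string`, `bool`) and a hypothetical rule set: arithmetic on two numbers (widening `int` to `float` when mixed), `+` also concatenating two strings, and comparisons yielding `bool`. The AST shape (`("lit", value)`, `("var", name)`, `(op, left, right)`) and the plain-dict environment are assumptions too; the project's real rules are in its Semantic Analyzer documentation.

```python
NUMERIC = {"int", "float"}
ARITHMETIC = {"+", "-", "*", "/"}
COMPARISON = {"<", ">", "<=", ">=", "==", "!="}
LITERAL_TYPES = {int: "int", float: "float", str: "string", bool: "bool"}


def check_binary(op, left, right):
    """Return the result type of `left op right`, or raise TypeError (hypothetical rules)."""
    if op in ARITHMETIC:
        if left in NUMERIC and right in NUMERIC:
            # Widen to float if either operand is a float.
            return "float" if "float" in (left, right) else "int"
        if op == "+" and left == right == "string":
            return "string"
    elif op in COMPARISON:
        if op in ("==", "!=") and left == right:
            return "bool"
        if left in NUMERIC and right in NUMERIC:
            return "bool"
    raise TypeError(f"unsupported operand types for {op}: {left} and {right}")


def type_of(node, env):
    """Compute an expression's type bottom-up; `env` maps variable names to types."""
    kind = node[0]
    if kind == "lit":
        return LITERAL_TYPES[type(node[1])]
    if kind == "var":
        return env[node[1]]  # a real checker would consult the symbol table here
    op, left, right = node
    return check_binary(op, type_of(left, env), type_of(right, env))
```

With these rules, the condition `x >= 5` from the language example checks as `bool` when `x` is an `int`, while `"a" - 1` is rejected before any code is generated.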
The compiler evolved through 4 major iterations:
| Version | Improvements |
|---|---|
| v1 | Basic lexer + parser, simple expression evaluation |
| v2 | Full AST, error handling, symbol table |
| v3 | Semantic analysis, scope management, type system |
| v4 (Final) | IR generation, stack machine, optimizer, doc site |
Released under the MIT License. See LICENSE for details.
Academic project — Compilers course, UTP Colombia