I've been learning ANTLR by writing my own (extremely simple!) programming language.
It made me curious about how the lexer/parser/AST is implemented for Java. Obviously there is a grammar for .java files, and this gets parsed down to bytecode, which is then validated to check that all is syntactically and semantically correct, and then finally a .class file is written out. But I wondered what happens when you run java -cp '.' Foo.
I assume Foo.class has to be read in, but does this (the bytecode in the .class file) then have a grammar of its own? Or is the AST that was formed when the .java file was compiled just written to disk in binary form, and then slurped into memory when the java.exe program runs?
I was thinking that it probably does have an AST, as this is probably the essence of what a JIT compiler really is.
- Generally speaking, the purpose of compiling to bytecode is that a bytecode format of simple opcode-argument tuples can be easily executed by computers without all that expensive lexing/parsing/AST nonsense that our human-readable languages require. But I'm not familiar enough with the JVM to give concrete examples, so leaving this as a comment. – Ixrec, Jun 6, 2015 at 12:37
- Yes. You are looking for chapter 4 of the Java Virtual Machine Specification: The class File Format. – user40980, Jun 6, 2015 at 12:37
1 Answer
Technically speaking, there is a parser, but practically speaking, it is so trivial that you typically wouldn't call it a "parser". After all, the JVM bytecode format is specifically designed to be fast and easy for a machine to read, not to be readable by humans.
It's basically some fixed-size fields of fixed types and some variable-length arrays of fixed types whose length is provided in a fixed-size field of a fixed type at a fixed location.
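To make that concrete, here is a minimal sketch of reading the first few fields of a .class file, following the layout given in chapter 4 of the JVM specification: a 4-byte magic number, two 2-byte version fields, and then a 2-byte count that says how long the constant pool table is. The class name ClassFileHeader and its field names are made up for illustration, and the sketch stops before the constant pool itself. It is not how any real JVM is implemented, only an indication of how little "parsing" is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the JDK.
public class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    ClassFileHeader(int minorVersion, int majorVersion, int constantPoolCount) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads the fixed-size fields at the start of a class file.
    static ClassFileHeader parse(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            // u4 magic: every class file starts with 0xCAFEBABE.
            int magic = in.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IOException("Not a class file: bad magic number");
            }
            // u2 minor_version, u2 major_version (big-endian, unsigned).
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            // u2 constant_pool_count: the fixed-size length field for the
            // variable-length constant pool table that follows it.
            int cpCount = in.readUnsignedShort();
            return new ClassFileHeader(minor, major, cpCount);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileHeader <path-to-.class-file>");
            return;
        }
        ClassFileHeader h = parse(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("version " + h.majorVersion + "." + h.minorVersion
                + ", constant_pool_count " + h.constantPoolCount);
    }
}
```

Running it against a compiled class, for example java ClassFileHeader Foo.class, would print the class file version and the constant pool count. There is no tokenizing and no grammar here: each field sits at a known offset or is preceded by its length, which is why reading a class file looks more like deserialization than like parsing source code.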