For a C project, I'm upgrading my build process from
- MinGW-GCC / make and Android Studio under Windows (2 separate processes)
to
- Clang / CMake under Debian 8, using wclang and wine to compile and run the Windows build
It seems to me that we build for one platform at a time. I suppose a large part of the compiling and linking process is platform-specific; that cannot be avoided given the enormous differences between platforms, and I would expect it to account for the majority of the compiler's work.
But I wonder: does all cross-compilation work this way? Say we're compiling for 3 platforms: A, B, and C.
Let's say 25% of the work involved in compiling for A can be reused in compiling for B & C (things like building the AST). Surely we would want to do that, thus reducing overall build time?
Are there such tools (particularly in relation to Clang, GCC)? And is there a name for such a "sharing" cross-compile, that I should know about? Thanks.
1 Answer
In a different language, yes, you could perhaps reuse an AST. However, the compilation model of C makes this impossible. First, a preprocessor phase runs over the code. Every compiler has its own #defines so that we can safely select compiler-specific options, and various defines let us check which system we are compiling for. In the code base I am working on, we use conditional compilation to select better system calls on operating systems that support them. Only after all these defines are resolved do we get a C source that is parsed and converted to an AST. Since this AST already contains platform-specific assumptions, you can't use it as the basis for cross-platform compilation.
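As a rough sketch (the copy_file helper and the choice of sendfile are my own illustration, not the answer's code base), a snippet like this is resolved by the preprocessor before any AST exists, so the tree the compiler builds is already tied to one platform:

```c
/* Hypothetical helper: copy a file using whatever each platform offers.
 * The preprocessor keeps exactly one branch, so the AST built afterwards
 * only contains code for the target currently being compiled. */
#if defined(_WIN32)
  #include <windows.h>
  static int copy_file(const char *src, const char *dst) {
      return CopyFileA(src, dst, FALSE) ? 0 : -1;            /* Win32 API call */
  }
#elif defined(__linux__)
  #include <fcntl.h>
  #include <sys/sendfile.h>
  #include <sys/stat.h>
  #include <unistd.h>
  static int copy_file(const char *src, const char *dst) {
      int in = open(src, O_RDONLY);
      if (in < 0) return -1;
      struct stat st;
      if (fstat(in, &st) < 0) { close(in); return -1; }
      int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, st.st_mode);
      if (out < 0) { close(in); return -1; }
      off_t off = 0;
      ssize_t n = sendfile(out, in, &off, (size_t)st.st_size); /* Linux-specific syscall */
      close(in);
      close(out);
      return n == (ssize_t)st.st_size ? 0 : -1;
  }
#else
  #error "no copy_file implementation for this platform"
#endif
```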
A related problem is that different systems and compilers may use different implementations of the C standard library. There is considerable freedom in how some parts can be implemented, e.g. as functions or as macros. The point is that you cannot just use the headers of the wrong libc and expect everything to work.
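A tiny example (a sketch; which facilities are macros differs between libc implementations): the C standard explicitly allows getc to be a macro, so the preprocessed output depends on whose <stdio.h> you pulled in:

```c
#include <stdio.h>

/* getc may be a plain function in one libc and a macro peeking into FILE
 * internals in another; the C standard allows both. After preprocessing,
 * these two functions can therefore contain different code on different
 * systems even though the source is identical. */
int next_byte(FILE *f) {
    return getc(f);     /* may expand to a libc-specific macro */
}

int next_byte_fn(FILE *f) {
    return (getc)(f);   /* the parentheses suppress macro expansion and
                           force a call to the real function */
}
```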
A note on LLVM IR: while the IR is cross-platform in the sense that it can be compiled to machine code on any supported architecture, it already contains platform-specific assumptions such as integer widths, data layout, and calling conventions. The IR is only useful when debugging, or when working on the compiler toolchain, since it serves as a data exchange format between various compiler phases (in particular, between optimization passes).
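You can see this with a one-file experiment (a sketch; it assumes clang is on your PATH, and the file name and MinGW target triple are arbitrary examples):

```c
/* sizes.c — a minimal sketch. Compile with, for example:
 *   clang -S -emit-llvm sizes.c -o sizes.ll
 *   clang -S -emit-llvm --target=i686-w64-mingw32 sizes.c -o sizes-win32.ll
 * The two .ll files differ: the target triple and datalayout lines, and the
 * constant that sizeof(long) folds to (8 on x86-64 Linux, 4 on 32-bit
 * Windows), are already baked into the IR. */
unsigned long long_width(void) {
    return sizeof(long);   /* folded to a target-specific constant in the IR */
}
```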
While this is a theoretical exercise, I also have to point out that you are expecting too much benefit from avoiding parsing. Modern compilers spend fairly little time parsing, and a lot of time optimizing. In the C compilation model, I/O time may also be relevant since you're often reading the same header files for each object file you're generating. Many compilers offer a pre-compiled header mechanism where you can create an AST for the first header included by a source file. This can be helpful if you include the same set of headers in every file, but it does not help with cross-compilation.
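As a sketch of the precompiled-header idea (the name pch.h and the umbrella layout are my assumptions; the exact commands depend on your toolchain):

```c
/* pch.h — a hypothetical umbrella header meant for precompilation.
 * With clang you would typically build it once and reuse it, e.g.:
 *   clang -x c-header pch.h -o pch.h.pch
 *   clang -include pch.h main.c      (clang picks up pch.h.pch if present)
 * GCC similarly uses a pch.h.gch file sitting next to pch.h.
 * This avoids re-parsing these headers for every translation unit, but the
 * PCH itself is built per target, so it does not help with cross-compilation. */
#ifndef PCH_H
#define PCH_H

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#endif /* PCH_H */
```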
A more viable approach to cutting down compilation times is to slim down your headers:

- Include as few other headers as possible.
- Prefer forward declarations over including another header (see the sketch after this list).
- Divide your code into clear layers: each layer may only include headers from the lower layers. This avoids accidentally pulling in nearly all other headers.
- Prefer small headers that only declare a couple of related symbols over large headers where most declarations go unused by most including files.

Benchmark whether any of these suggestions even provide a measurable difference rather than blindly copying what some n00b wrote on the internet.
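For instance (widget.h, engine, and widget_attach are made-up names for illustration), a forward declaration is enough whenever a header only deals in pointers to a type:

```c
/* widget.h — only *refers* to struct engine, so a forward declaration is
 * enough; we avoid dragging engine.h (and everything it includes) into
 * every file that includes widget.h. */
#ifndef WIDGET_H
#define WIDGET_H

struct engine;                 /* forward declaration instead of #include "engine.h" */

struct widget {
    struct engine *owner;      /* pointers to incomplete types are allowed */
    int id;
};

void widget_attach(struct widget *w, struct engine *e);

#endif /* WIDGET_H */
```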
Comment from the asker: platform differences such as #DEFINEs, constants, and type sizes would cause complications; Clang's -emit-llvm flag compiles to IR (Intermediate Representation). Cheers for the line of thinking.