
In this and this Stack Overflow question, the answers state that C as a compiler backend is a bad idea.

But why?

C has many compilers that can heavily optimize it. Every platform has a compiler that supports it, and it can be compiled to every architecture in existence. In addition, languages like Nim and V support generating C code.

So, I don't understand why C would be a bad idea at all. In my view, it seems a rather good choice.

asked Feb 5, 2020 at 22:21
  • Good compilers rely on being fed reasonably canonical code, and the C code generated by a transpiler is unlikely to look like that (illustrated below). Hence the objections about performance (probably). Commented Feb 5, 2020 at 23:04
  • In this space, you're going to find two kinds of people: the ones that have an opinion because they think they know, and the ones that actually know because they've tried it (I'm in the former category). According to the post you linked, the ones who object to C as a backend have tried and failed, and ultimately settled on emitting processor instructions or bytecode. The LLVM project apparently went through this life cycle, as they used to have a C backend that isn't supported or maintained anymore. Commented Feb 5, 2020 at 23:05
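To make the first comment's point concrete: both functions below sum an array. The first is the kind of canonical, hand-written C that optimizers see most often; the second is a hedged illustration (my sketch, not the output of any real transpiler) of what a naive generator tends to emit, with every intermediate value in its own temporary and control flow flattened into gotos. That style may defeat some pattern-based optimizations and is certainly harder to inspect.

    #include <stdio.h>

    /* Idiomatic, hand-written C. */
    static int sum_idiomatic(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* The same function as a naive transpiler might emit it: explicit
       temporaries and gotos instead of a structured loop. */
    static int sum_generated(const int *a, int n) {
        int t0 = 0;              /* accumulator   */
        int t1 = 0;              /* loop counter  */
    L0: if (!(t1 < n))
            goto L1;
        t0 = t0 + a[t1];
        t1 = t1 + 1;
        goto L0;
    L1: return t0;
    }

    int main(void) {
        int a[] = {1, 2, 3, 4};
        printf("%d %d\n", sum_idiomatic(a, 4), sum_generated(a, 4));  /* 10 10 */
        return 0;
    }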

3 Answers


Define "good" and "bad" backend

By what criteria do you evaluate whether it is a good or a bad solution? Without knowing them, we are dealing in subjective beliefs rather than objective advice:

  • Your arguments make the C compiler an attractive alternative for developing a portable, working solution very fast. Example: Bjarne Stroustrup invented C++ and started with Cfront, a front end that generated C and proved to be a good solution for around 10 years.
  • Going via C slows down the process: you take large code as input, write even larger code as output, then let the C compiler process this larger source again, and so on. To continue with the C++ example: once the ultra-fast Zortech C++ compiler appeared, the slow Cfront quickly stopped looking so good.
  • Finally, it also depends on language proximity. Some constructs are not easily expressed in C and need a lot of boilerplate code (sketched right after this list). It's like natural-language translation: translating Japanese directly into German may well be more accurate than translating Japanese to English and then English to German, because every intermediate translation risks losing some precision.
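To illustrate the boilerplate point in the last bullet: a closure that is a one-liner in many source languages has no direct C equivalent and is typically lowered to an explicit environment struct, a named function, and manual allocation. A minimal sketch (the source-language one-liner in the comment, and all the C names, are hypothetical):

    /* Hypothetical source: make_adder(n) = (x) => x + n  -- one line there,
       roughly twenty lines of C here. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int n;                              /* captured variable */
    } adder_env;

    static int adder_call(const adder_env *env, int x) {
        return x + env->n;
    }

    static adder_env *make_adder(int n) {
        adder_env *env = malloc(sizeof *env);
        env->n = n;
        return env;
    }

    int main(void) {
        adder_env *add5 = make_adder(5);
        printf("%d\n", adder_call(add5, 3));   /* prints 8 */
        free(add5);
        return 0;
    }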

So, no universal good or bad:

  • If you just need to write some report generator or a mathematical simulation engine, the drawback of the intermediate C step is negligible in comparison with the benefits.
  • But if you're into more serious language design, you'd better go for a more robust solution. Fortunately, you no longer have to hand-craft assembler generation: between Java bytecode, Python bytecode, the CIL and other available virtual machines, you can choose how best to reuse proven, performant compilers and JIT compilers.
Doc Brown
answered Feb 6, 2020 at 0:14

Every platform has a compiler that supports it, and it can be compiled to every architecture in existence.

But they don't all behave the same. As a compiler writer, I don't want to have to depend on (or make) a configure script to figure out if my ints are actually 4 bytes or not. I don't want to chase down weird bugs on some esoteric platform because I depended on their C compiler's implementation of some undefined behavior. And I really don't want to inspect some dump of some C code that's been optimized by God-knows-what.
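As a concrete, hedged example of both complaints: the width of C's int is implementation-defined and signed overflow is undefined behavior, so a generator whose source language promises 32-bit wrapping integers cannot simply emit int arithmetic. A common workaround, assuming a C11 compiler (the names src_int and src_add are made up for this sketch), is to pin the width and route additions through unsigned arithmetic:

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* The source language's "int", pinned to exactly 32 bits instead of
       whatever width the platform's int happens to be. */
    typedef int32_t src_int;

    /* Some generators instead just verify their assumption at build time: */
    static_assert(sizeof(int) == 4, "generated code assumes 32-bit int");

    /* Wrapping addition: done in unsigned arithmetic because signed overflow
       is undefined behavior in C, while the source language defines it.
       (Even the final cast back to int32_t is implementation-defined before
       C23 -- exactly the kind of fine print a compiler writer has to track.) */
    static src_int src_add(src_int a, src_int b) {
        return (src_int)((uint32_t)a + (uint32_t)b);
    }

    int main(void) {
        /* Without the detour through uint32_t, INT32_MAX + 1 would be UB;
           on typical two's-complement targets this prints -2147483648. */
        printf("%d\n", src_add(INT32_MAX, 1));
        return 0;
    }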

LLVM and other common backend targets are a little harder to work with when writing code, but they're tons easier to work with once the code is written because they're very unambiguous about how they behave and (generally) have dedicated tools for debugging the kind of catastrophes that compiler devs manage to create.

answered Feb 5, 2020 at 23:21
  • "But they all don't behave the same"... If the language's parser checks for syntax errors, parse errors, and properly warns about potential undefined behavior before code generation, then inspecting the output C wouldn't be necessary... Commented Feb 6, 2020 at 1:07
  • @KiedLlaentenn - I was perhaps unclear then? The same exact C code does not behave the same on every architecture in existence. Reasonable C code, like adding ints together. Commented Feb 6, 2020 at 2:01
  • Ah, I misunderstood. Commented Feb 6, 2020 at 2:18

The reason C is not a good backend for a compiler is simple:

Any translation is imperfect.

Thus, unless your source language maps perfectly onto the target language, i.e. is a close analogue of C, the translation adds spurious constraints, is convoluted, or is outright wrong, if not all three.

Target languages for compiler-writers are generally designed to allow fairly clean and efficient mapping of the source language(s), or are so low-level they are very easily translated to the target machine.

C has its own semantics and was created for programmers to write in directly, not for anything else, and it shows in that niche.
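For instance (my example, not the answer's): a source-language try/raise construct has no direct C counterpart, so a generator typically lowers it to setjmp/longjmp and immediately inherits constraints the source language never had, such as locals modified between setjmp and longjmp needing to be volatile:

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf handler;          /* the innermost active "catch" */

    static void risky(int fail) {
        if (fail)
            longjmp(handler, 1);     /* lowered "raise" */
        puts("risky: ok");
    }

    int main(void) {
        /* Must be volatile: it is modified after setjmp and read after the
           longjmp, otherwise its value is indeterminate. */
        volatile int attempts = 0;
        if (setjmp(handler) == 0) {  /* lowered "try" */
            attempts = attempts + 1;
            risky(1);
        } else {                     /* lowered "catch" */
            printf("caught after %d attempt(s)\n", (int)attempts);
        }
        return 0;
    }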

answered Feb 5, 2020 at 23:03
  • I envision a rudimentary (if naïve) transpiler as one that simply maps source instructions to methods in a C library (see the sketch after these comments). There's no reason why this couldn't work. Whether it works well or not is another story. Commented Feb 5, 2020 at 23:08
  • @RobertHarvey: I think you're describing a bytecode VM/interpreter. Commented Feb 7, 2020 at 3:49
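A hedged sketch of that scheme (all the names here, rt_push, rt_add, rt_print, are made up): each source "instruction" becomes a call into a small runtime library, so the transpiler itself is little more than a pretty-printer; and, as the reply notes, the result is essentially an unrolled bytecode interpreter.

    #include <stdio.h>

    /* A tiny hypothetical runtime library operating on an operand stack. */
    #define STACK_MAX 64
    static long stack[STACK_MAX];
    static int  sp;

    static void rt_push(long v) { stack[sp++] = v; }
    static void rt_add(void)    { long b = stack[--sp]; stack[sp - 1] += b; }
    static void rt_print(void)  { printf("%ld\n", stack[--sp]); }

    /* What the transpiler might emit for a hypothetical source program
       `print(2 + 3)`: one C call per source instruction. */
    int main(void) {
        rt_push(2);
        rt_push(3);
        rt_add();
        rt_print();      /* prints 5 */
        return 0;
    }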
