Why Is SQLite Coded In C

We are nearing the day someone quips that C is an improvement on most of its successors (smirk). So reading this page from the SQLite website is instructive, as is reading the page on the tooling and coding practices that make this approach work.

I think none of this is news, and these approaches have been on the books for quite a while. But still, as I said: an improvement on most of its successors. Hat tip: HN discussion.

By Ehud Lamm at 2018年03月16日 04:47 | General | other blogs | 46560 reads


Low dependency

Just reminiscing. There was this guy on my college dorm floor who was really into airplanes. Maybe a few years before this, the USAF had (iirc) first publicly admitted to experimenting with fighter jets with forward-swept wings, vehicles so unstable that flying them was beyond human reflexes and a computer control system was needed for them to fly at all; very high-tech stuff, very impressive. He had a poster on his dorm room wall of one of those forward-swept-wing fighters, just the sort of thing he would have a poster of. One day in the dorm common room we were watching some miscellaneous fluff on TV (I think it was Airwolf), and the episode that was on involved a crop-dusting triplane. And this guy with a poster of a super-high-tech fighter on his wall remarked reverentially, "You can fly one of those at thirty miles an hour without stalling." [The speed he named may actually have been lower than that; what I've remembered vividly all these years is the reverential tone in which it was said.]

By John Shutt at Fri, 2018年03月16日 11:56 | login or register to post comments

Yeah, that was kind of my

Yeah, that was kind of my point...

By Ehud Lamm at Fri, 2018年03月16日 13:23 | login or register to post comments

I think the standards

I think the standards process for C has a lot to do with it. It's not perfect, and there are ambiguities around some (important) edge cases, but it's a sterling effort by many fine minds.

It'll be a long time before a port of SQLite to another language could reach the same level of reliability and predictability.

By davidb at Fri, 2025年09月05日 08:54 | login or register to post comments

C code is always fixable. Other languages not so much.

Duplicate post edited for brevity. Please ignore.

By Ray Dillinger at Mon, 2025年09月08日 16:56 | login or register to post comments

C code is always fixable. Other languages not so much.

The thing I note about C code is that if something goes wrong (and it does, a lot) there is always a way to debug the code. You can track down the bug, and whatever it happens to be this time, you can always fix it. The basic libraries leave a lot of things undone, but you can choose to do them any way you need them done for a particular use. And the language definition and the library definitions are remarkably stable.

Over time, the bugs in C code can get fixed, and when the effort is put in to do it right, and check the state of the relevant ambiguities and differences and definitions and conditions and handle each case, the bugs STAY fixed.

Legacy code that's been ported from an ancient VAX through seven generations of other systems, with different operating systems from different vendors, different word lengths and endianness and I/O primitives, different distinctions between character and block devices, and compilers from five different sources, may be a massive hairball. But all that hair works out and checks all those corner cases and differences, and good fixes continue to handle those corner cases and differences through all future versions of C. Which means legacy code eventually becomes as close to bulletproof as anything has ever been.

With most languages this isn't the case. Small bugs that affect rare corner cases get baked into the extended runtime, or the memory management functions, or the interrupt handling, or the I/O libraries, or whatever else. Design decisions incompatible with at least a few of the things you want to do are made fundamental and mandatory, and no other way to do those things is available. Language definitions change between versions 4 and 5, invalidating all code written in versions 1 and 2, and there's no way to check which version of the interpreter is in use. Furthermore, there's often no way to interact with the underlying system in a way that's design-compatible with the earlier versions. Libraries with OO implementations get "ripple effect" bugs when the OO systems they're implemented in get reimplemented upstream. And the compiler or interpreter tends to be available from exactly one source. If that's the case, and that source decides not to continue, the language dies.

So either the languages themselves are too limited (usually for the sake of "safety" or "simplicity"), or they contain lexical and semantic ambiguities you can't check for and bugs due to confusion or ambiguity about implied and undeclared object types with different overloaded operations or operands (usually for the sake of "ease of use" or "brevity").

It's all well and good to have some definition of "type safety," but if you achieve overloaded functions that are truly invisible in application, you invite confusion on the programmer's part while proving that there is never confusion on the compiler's part. It doesn't matter if the types are "safe" if the syntax leads someone to believe the semantics that apply to them are those of a different type, especially if scope or overloading rules change, no matter how subtly, between versions. Either way, the code you write in those languages is unstable and will eventually break, with no reliable way to diagnose the bug, check for the conditions relevant to it, and fix it permanently.

By Ray Dillinger at Mon, 2025年09月08日 17:01 | login or register to post comments

Consequence of long tenure

I think this virtue is not a consequence of particularly good design, but of a long period of heavy use. C is polished, but from a long time in the tumbler. All the fragile corners have been knocked off.

C's current stability might not have been a foregone conclusion, but decades as the de facto foundation of systems programming is the obvious culprit for its long-term backwards compatibility.

By Bruce J. Bell at Sat, 2025年09月27日 11:22 | login or register to post comments

Efficiency and testability

After coding the EROS kernel in C++, we switched to C for its successor. There were three reasons for the switch:

  1. The EROS kernel threw *no* exceptions, but we paid a 30% performance penalty due to reduced iCache utilization because of exception handler code that could never be reached.
  2. Static model checking on languages that encourage opaquely indirected procedure calls (as in virtual functions) is much harder - the required conservative assumptions rapidly compromise useful results.
  3. The reduction in "helpful abstraction" made it much easier to understand what the code was actually going to do on the machine. In a security kernel, that's critical.

I've said we didn't use exceptions, but we also didn't use inheritance. Given this, essentially none of the C++ features had any value for us.

The main thing we gave up was C++'s stronger sequence point specification; C's weaker one wasn't a big issue for this particular code.

It remains an open question whether Rust would be better or safer for this code base.

By shap at Tue, 2025年09月30日 04:04 | login or register to post comments

Right. That's an example.

Opaquely indirected function calls seem worse and worse the more I deal with code that uses them. ESPECIALLY overloaded operators that invite the programmer to think s/he knows what the code means when it means something completely different.

[Redacted: nightmare story about overloaded "+" intended for array mathematics being called in place of overloaded "+" intended for string concatenation, uncaught because both were stored as byte vectors of compatible underlying type as seen by the third-party GPU library that redefined "+" when nobody was looking, resulting in strings of binary garbage emerging from unchanged code that was working a few days prior. It would have been much worse if it had been the other way, because a byte vector interpreted as the (numeric) result of array mathematics isn't as obviously garbage.

Operator overloading, at least in forms seen so far, is BAD. ]

That's an example of exactly the kind of thing I was talking about. Features baked into the extended runtime (like opaquely indirected procedure calls) make the semantics much harder to check and verify, and impose a penalty in stack memory management overhead, not to mention the compile time and optimization opportunities lost where things can't be proven.

This holds even if your code doesn't use them. I know people claim that C++ imposes no overhead greater than C code, but the C++ runtime environment has to support things the C runtime environment doesn't.

Even if the code you write is just C and runs at the same speed, that runtime environment you're linking it to is bigger, heavier, and a bit slower. Even if the code you write is just C, the C++ compiler can't optimize it as tightly because the C++ compiler has to assume the worst about what, in a C++ environment, could *POSSIBLY* be in some other file of code and change the meaning of the code it's looking at.

By Ray Dillinger at Wed, 2025年10月01日 14:31 | login or register to post comments

Don't throw away a useful tool

I remember reading the code for an early GUI toolkit for X (Motif?). All in C, but explicitly building and using C++ style virtual tables.

It took me a while to figure out what was going on, because I hadn't learned C++ yet.

I guess my point is, there is a time and a place for systematic use of indirect branching. GUI toolkits are a particular field where object orientation just makes sense, even if you have to roll it yourself.

By Bruce J. Bell at Wed, 2025年10月01日 19:58 | login or register to post comments

Yeah, been there...

Good point; I have rolled OO in C myself, both directly (in code that the program I was working on would call) and indirectly (in implementing compilers and interpreters for languages with OO features).

My general opinion of OO is that it's frequently a useful technique, no matter whether you're programming in an "OO language" or not. If you as an individual programmer want to use an OO style, you can do it in C, or Pascal, or Scheme with only marginally greater difficulty than you can do OO in languages made for it.

The main benefit of OO languages is not for individual programmers, but for organizations and large teams. They serve as a way to keep programmers working on different functionality in the same project out of each others' hair to the greatest degree possible. If you can point twenty different engineers at twenty different problems and they can go off and work on them effectively with only a few meetings, a dozen or so emails a day, and occasional phone calls, your code organization is a success.

OO features being standardized in OO languages is a good way to facilitate that kind of organizational success. A standard set of OO features provides a reasonably general way to make the interaction of different parts of the code work according to an easily understood, knowable paradigm. And if that's shared knowledge, the engineers no longer have to discuss it all that much. Also, objects having properties like "private", which the compiler can enforce, prevents a lot of wasted bandwidth over out-of-context calls unexpectedly corrupting data.

OTOH, if fifty engineers are each rolling their own OO in languages that don't have standard OO features, you get chaos where they have to communicate with each other much much more instead of much much less.

By Ray Dillinger at Tue, 2025年10月07日 04:24 | login or register to post comments

...people claim about C++


    ...people claim about C++ that it imposes no overhead greater than C code.

People claim lots of things they have never measured. Even things that nobody actually involved in the creation of C++ ever claimed. At most, they claimed comparable performance for code that did comparable things.

The thing that broke that claim was exception handling, which has non-local overheads. At best, the claim for exception handling was that you wouldn't execute exception handling code for exceptions unless exceptions were thrown. Unfortunately you still pay in i-cache residency unless code placement in the binary is very clever, and that's actually quite hard to do.

Exceptions also make quite a mess in the type system.

The late arrival of templating in C++ and Go was unfortunate. The absence meant that neither language got the option to do return-or-error cleanly. The pressure to keep C++ compatible with C was probably excessive, and it left Bjarne so buried that there wasn't time for incorporating templates or algebraic types early. Unfortunately, they are hard to introduce after the fact.

By shap at Tue, 2025年10月28日 20:55 | login or register to post comments
