Rumor is that the next version of C will disallow sign magnitude and ones' complement signed integer encoding. True or not, it seems efficient to not have to code and test for those rare encodings.
[Edit 2025] C23 now only allows 2's complement encoding for signed integers.
Yet if code might not handle cases such as non-2's complement, it is prudent to detect them and fail such compilations today.
Rather than just look for that one kind of dinosaur1, below is C code that looks for various unicorns2 and dinosaurs. Certainly some tests are more useful than others.
Review goal:
- Please report any dinosaur1 and unicorns2 compilers found by this code.
- Review how well this code would successfully flag true passé compilers and not report new innovative ones (e.g. 128-bit `intmax_t`).
- Suggest any additional or refined tests.
Pre-C11 compilers that lack `static_assert` may well need a better `#define static_assert ...` than this code. Better alternatives are appreciated, but not a main goal of this post.
Note: I am not trying to rate strict adherence to IEEE_754 and the like.
Future readers concerning spelling and grammar in this post: Although they should get corrected in an answer, edits to the question's code are not site appropriate.
/*
* unicorn.h
* Various tests to detect old and strange compilers.
*
* Created on: Mar 8, 2019
* Author: chux
*/
#ifndef UNICORN_H_
#define UNICORN_H_
#include <assert.h>
#ifndef static_assert
#define static_assert( e, m ) typedef char _brevit_static_assert[!!(e)]
#endif
#include <float.h>
#include <limits.h>
#include <stdint.h>
/*
* Insure 2's complement
* Could also check various int_leastN_t, int_fastN_t
*/
static_assert(SCHAR_MIN < -SCHAR_MAX && SHRT_MIN < -SHRT_MAX &&
INT_MIN < -INT_MAX && LONG_MIN < -LONG_MAX &&
LLONG_MIN < -LLONG_MAX && INTMAX_MIN < -INTMAX_MAX &&
INTPTR_MIN < -INTPTR_MAX && PTRDIFF_MIN < -PTRDIFF_MAX
, "Dinosuar: Non-2's complement.");
/*
* Insure the range of unsigned is 2x that of positive signed
* Only ever seen one once with the widest unsigned and signed type with same max
*/
static_assert(SCHAR_MAX == UCHAR_MAX/2 && SHRT_MAX == USHRT_MAX/2 &&
INT_MAX == UINT_MAX/2 && LONG_MAX == ULONG_MAX/2 &&
LLONG_MAX == ULLONG_MAX/2 && INTMAX_MAX == UINTMAX_MAX/2,
"Dinosuar: narrowed unsigned.");
/*
* Insure char is sub-range of int
* When char values exceed int, makes for tough code using fgetc()
*/
static_assert(CHAR_MAX <= INT_MAX, "Dinosuar: wide char");
/*
* Insure char is a power-2-octet
* I suspect many folks would prefer just CHAR_BIT == 8
*/
static_assert((CHAR_BIT & (CHAR_BIT - 1)) == 0, "Dinosaur: Uncommon byte width.");
/*
* Only binary FP
*/
static_assert(FLT_RADIX == 2, "Dinosuar: Non binary FP");
/*
* Some light checking for pass-able FP types
* Certainly this is not a full IEEE check
* Tolerate float as double
*/
static_assert(sizeof(float)*CHAR_BIT == 32 || sizeof(float)*CHAR_BIT == 64,
"Dinosuar: Unusual float");
static_assert(sizeof(double)*CHAR_BIT == 64, "Dinosuar: Unusual double");
/*
* Heavier IEEE checking
*/
static_assert(DBL_MAX_10_EXP == 308 && DBL_MAX_EXP == 1024 &&
DBL_MIN_10_EXP == -307 && DBL_MIN_EXP == -1021 &&
DBL_DIG == 15 && DBL_DECIMAL_DIG == 17 && DBL_MANT_DIG == 53,
"Dinosuar: Unusual double");
/*
* Insure uxxx_t range <= int
* Strange when unsigned helper types promote to int
*/
static_assert(INT_MAX < UINTPTR_MAX, "Unicorn: narrow uintptr_t");
static_assert(INT_MAX < SIZE_MAX, "Unicorn: narrow size_tt");
/*
* Insure xxx_t range >= int
* Also expect signed helper types at least int range
*/
static_assert(INT_MAX <= PTRDIFF_MAX, "Unicorn: narrow ptrdiff_t");
static_assert(INT_MAX <= INTPTR_MAX, "Unicorn: narrow intptr_");
/*
* Insure all integers are within `float` finite range
*/
// Works OK when uintmax_t lacks padding
static_assert(FLT_RADIX == 2 && sizeof(uintmax_t)*CHAR_BIT < FLT_MAX_EXP,
"Unicorn: wide integer range");
// Better method
#define UNICODE_BW1(x) ((x) > 0x1u ? 2 : 1)
#define UNICODE_BW2(x) ((x) > 0x3u ? UNICODE_BW1((x)/0x4)+2 : UNICODE_BW1(x))
#define UNICODE_BW3(x) ((x) > 0xFu ? UNICODE_BW2((x)/0x10)+4 : UNICODE_BW2(x))
#define UNICODE_BW4(x) ((x) > 0xFFu ? UNICODE_BW3((x)/0x100)+8 : UNICODE_BW3(x))
#define UNICODE_BW5(x) ((x) > 0xFFFFu ? UNICODE_BW4((x)/0x10000)+16 : UNICODE_BW4(x))
#define UNICODE_BW6(x) ((x) > 0xFFFFFFFFu ? \
UNICODE_BW5((x)/0x100000000)+32 : UNICODE_BW5(x))
#define UNICODE_BW(x) ((x) > 0xFFFFFFFFFFFFFFFFu ? \
UNICODE_BW6((x)/0x100000000/0x100000000)+64 : UNICODE_BW6(x))
static_assert(FLT_RADIX == 2 && UNICODE_BW(UINTMAX_MAX) < FLT_MAX_EXP,
"Unicorn: wide integer range");
/*
* Insure size_t range > int
* Strange code when a `size_t` object promotes to an `int`.
*/
static_assert(INT_MAX < SIZE_MAX, "Unicorn: narrow size_t");
/*
* Recommended practice 7.19 4
*/
static_assert(PTRDIFF_MAX <= LONG_MAX, "Unicorn: ptrdiff_t wider than long");
static_assert(SIZE_MAX <= ULONG_MAX, "Unicorn: size_t wider thna unsigned long");
/*
* Insure range of integers within float
*/
static_assert(FLT_RADIX == 2 && sizeof(uintmax_t)*CHAR_BIT < FLT_MAX_EXP,
"Unicorn: wide integer range");
// Addition code could #undef the various UNICODE_BWn
#endif /* UNICORN_H_ */
Test driver
#include "unicorn.h"
#include <stdio.h>
int main(void) {
printf("Hello World!\n");
return 0;
}
1 C is very flexible, yet some of its features apply only to compilers that have not been in use for over 10 years. Compilers that use such out-of-favor features (non-2's complement, non-power-of-2 bit width "bytes", non-binary floating-point, etc.) I'll call dinosaurs.
2 C is very flexible for new platforms/compilers too. Some of these potential and theoretical compilers could employ very unusual features. I'll call these compilers unicorns. Should one appear, I'd rather have code fail to compile than compile into errantly functioning code.
4 Answers
I think that `static_assert((CHAR_BIT & (CHAR_BIT - 1)) == 0` can be pretty safely replaced by `CHAR_BIT==8`. There are various old DSP compilers that would fail the test, but they are indeed dinosaur systems.
`stdint.h` and constants like `SIZE_MAX`, `PTRDIFF_MAX` were added in C99. So by using such macros/constants, you'll essentially cause all C90 compilers to fail compilation.
Are C90 compilers dinosaurs per your definition? If not, then maybe do some checks if `__STDC_VERSION__` is defined and if so what version. Because most of the exotic ones are likely to follow C90.
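A minimal sketch of such a version check, assuming the header should still compile as C90: the C99-only assertions are fenced off behind `__STDC_VERSION__`, and a fallback macro stands in for `static_assert` before C11. `UNICORN_ASSERT`, `UNICORN_CAT` and `UNICORN_C99_CHECKS` are hypothetical names, not part of the posted header.

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: use the real static_assert (a keyword from C23 on). */
#define UNICORN_ASSERT(e, m) static_assert(e, m)
#else
/* Hypothetical pre-C11 fallback: a negative array size forces a diagnostic.
   __LINE__ keeps the typedef names distinct. */
#define UNICORN_CAT_(a, b) a##b
#define UNICORN_CAT(a, b) UNICORN_CAT_(a, b)
#define UNICORN_ASSERT(e, m) \
    typedef char UNICORN_CAT(unicorn_assert_line_, __LINE__)[(e) ? 1 : -1]
#endif

/* These need nothing beyond C90's <limits.h>. */
UNICORN_ASSERT(INT_MIN < -INT_MAX, "Dinosaur: non-2's complement int");
UNICORN_ASSERT(LONG_MIN < -LONG_MAX, "Dinosaur: non-2's complement long");

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* <stdint.h>, SIZE_MAX and long long arrived with C99. */
#include <stdint.h>
UNICORN_ASSERT(LLONG_MIN < -LLONG_MAX, "Dinosaur: non-2's complement long long");
UNICORN_ASSERT(INT_MAX < SIZE_MAX, "Unicorn: narrow size_t");
#define UNICORN_C99_CHECKS 1
#else
#define UNICORN_C99_CHECKS 0
#endif
```

With this layout a C90 compiler still gets the `<limits.h>` checks instead of failing on the first C99 macro.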
- I was under the impression that some modern DSPs were word-addressable and had `CHAR_BIT` = 16, 24, or 32. But I don't do embedded development so I might have read something old without realizing it. – Peter Cordes, Jul 15, 2021 at 18:53
I'm appalled! What kind of code are you writing that's so inflexible it needs all these tests? ;-p
Seriously, it ought to be possible to enable only the tests that the including code needs, perhaps by predefining macros that declare its non-portabilities:
#ifdef REQUIRE_BINARY_FP
static_assert(FLT_RADIX == 2, "Dinosaur: Non binary FP");
#endif
(to pick a simple example)
On a minor note, here's a typo:
static_assert(SIZE_MAX <= ULONG_MAX, "Unicorn: size_t wider thna unsigned long");
s/thna/than/
On an even minorer note, in the comments you've consistently written "insure" where you evidently mean "ensure".
Additional tests to consider:
- We might begin by testing `__STDC_VERSION__` - certainly I'd consider anything less than 201112L to be a dinosaur. If you want to be really clever, it's possible to write a compile-time test against the characters in `__DATE__`, to whinge about standards more than (say) ten years old.
- I've seen code that breaks if `'z' - 'a' != 25` and/or `'Z' - 'A' != 25`.
- We might care that `wchar_t` can represent ISO 10646/Unicode code-points up to U+10FFFF (N.B. I've heard that some platforms still in use fail this test). Standard C only requires `WCHAR_WIDTH >= 8`. Consider also testing `__STDC_ISO_10646__`.
- Some code requires the existence of optional exact-width integer types such as `uint32_t` and `intptr_t`.
- Perhaps some code requires `long double` to be bigger (in precision and/or range) than `double`?
- We could test for support for optional features such as `<stdatomic.h>`, `<complex.h>` and `<threads.h>` whose macros are listed in 6.10.10.4.
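A few of these could be sketched as opt-in checks, in the spirit of the `REQUIRE_BINARY_FP` example. The `REQUIRE_*` names are made up for illustration. The including code would normally define them itself; two are defined here only so the sketch does something. The `wchar_t` and `long double` checks are left switched off because they are expected to fail on some mainstream platforms.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdint.h>   /* also provides WCHAR_MAX */

/* Normally set by the including code; all REQUIRE_* names are hypothetical. */
#define REQUIRE_CONTIGUOUS_LETTERS
#define REQUIRE_UINT32

#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "Dinosaur: pre-C11 compiler"
#endif

#ifdef REQUIRE_CONTIGUOUS_LETTERS
static_assert('z' - 'a' == 25 && 'Z' - 'A' == 25,
              "Dinosaur: letters are not contiguous");
#endif

#ifdef REQUIRE_UNICODE_WCHAR
static_assert(WCHAR_MAX >= 0x10FFFF,
              "Dinosaur: wchar_t cannot hold U+10FFFF");
#endif

/* uintN_t is optional; UINTN_MAX is defined only when the type exists. */
#if defined(REQUIRE_UINT32) && !defined(UINT32_MAX)
#error "Unicorn: no uint32_t"
#endif

#ifdef REQUIRE_WIDE_LONG_DOUBLE
static_assert(LDBL_MANT_DIG > DBL_MANT_DIG || LDBL_MAX_EXP > DBL_MAX_EXP,
              "Unicorn: long double no bigger than double");
#endif

/* Optional library features announce their absence via __STDC_NO_*__. */
#if defined(REQUIRE_ATOMICS) && defined(__STDC_NO_ATOMICS__)
#error "Unicorn: no <stdatomic.h>"
#endif
#if defined(REQUIRE_COMPLEX) && defined(__STDC_NO_COMPLEX__)
#error "Unicorn: no <complex.h>"
#endif
#if defined(REQUIRE_THREADS) && defined(__STDC_NO_THREADS__)
#error "Unicorn: no <threads.h>"
#endif
```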
- A DeathStation9000 could choose not to provide `uint32_t` even if `CHAR_BIT=8`, even if `unsigned` would work as `uint32_t`. That's unlikely for real-world C99 implementations, though. – Peter Cordes, Jul 15, 2021 at 18:56
- @Peter, I think that would be non-compliant: If an implementation provides standard or extended integer types with a particular width and no padding bits, it shall define the corresponding typedef names. – Toby Speight, Apr 11, 2025 at 5:35
- @TobySpeight: I think you're misreading what I wrote. Your answer suggests that if `unsigned int` is 32 bits wide with no padding, the compiler will use it to provide `uint32_t`. But it could simply not define `uint32_t` at all in that case and still be compliant, because the type is optional. So testing stuff about `unsigned int` or `unsigned long` doesn't prove the existence of `uint32_t` if we don't limit ourselves to sane compilers that are trying to be useful. Of course, if you test stuff about `unsigned int`, you could just use `unsigned int` or your own typedef. – Peter Cordes, Apr 11, 2025 at 5:53
- No, I didn't misread you. But maybe I've misunderstood the standard? My reading is that in the case you describe, then `unsigned int` is a standard integer type that satisfies those conditions and therefore a definition of `uint32_t` is required. If you think I'm wrong, then we could ask for interpretation on Stack Overflow? – Toby Speight, Apr 11, 2025 at 5:59
- @TobySpeight Concerning: "Some code requires the existence of optional exact-width integer types such as `uint32_t` and `intptr_t`" --> With/without these detection tests, compilation fails. So far the detection tests were for things that a compilation would not fail "it is prudent to detect and fail such compilations", yet the code depended on certain things for correct or timely functionality - or so my thinking. This suggested `uintN_t` test takes the detection test a step further - hmmm. This applies to "test for support for optional features such as <stdatomic.h> ..." also. – chux, Apr 12, 2025 at 1:34
In addition to the fine answers by @Toby Speight and @Lundin, and a related FP question, I came up with some additional ideas and details.
Spelling
"Dinosuar" --> "Dinosaur".
ASCII or not
Could use a lengthy test of the execution character set C11 §5.2.1 3
A to Z
a to z
0 to 9
! " # % & ’ ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
space character,
and control characters representing horizontal tab, vertical tab, and form feed.
some way of indicating the end of each line of text
Note that `$`, `@`, grave accent, ASCII 127 and various control characters are not mentioned above.
static_assert(
'A' == 65 && 'B' == 66 && 'C' == 67 && 'D' == 68 && 'E' == 69 && 'F' == 70
&& 'G' == 71 && 'H' == 72 && 'I' == 73 && 'J' == 74 && 'K' == 75
&& 'L' == 76 && 'M' == 77 && 'N' == 78 && 'O' == 79 && 'P' == 80
&& 'Q' == 81 && 'R' == 82 && 'S' == 83 && 'T' == 84 && 'U' == 85
&& 'V' == 86 && 'W' == 87 && 'X' == 88 && 'Y' == 89 && 'Z' == 90,
"Dinosaur: not ASCII A-Z");
static_assert(
'a' == 97 && 'b' == 98 && 'c' == 99 && 'd' == 100 && 'e' == 101
&& 'f' == 102 && 'g' == 103 && 'h' == 104 && 'i' == 105 && 'j' == 106
&& 'k' == 107 && 'l' == 108 && 'm' == 109 && 'n' == 110 && 'o' == 111
&& 'p' == 112 && 'q' == 113 && 'r' == 114 && 's' == 115 && 't' == 116
&& 'u' == 117 && 'v' == 118 && 'w' == 119 && 'x' == 120 && 'y' == 121
&& 'z' == 122, "Dinosaur: not ASCII a-z");
static_assert('0' == 48, "Dinosaur: not ASCII 0-9"); // 1-9 follow 0 by spec.
static_assert(
'!' == 33 && '"' == 34 && '#' == 35 && '%' == 37 && '&' == 38
&& '\'' == 39 && '(' == 40 && ')' == 41 && '*' == 42 && '+' == 43
&& ',' == 44 && '-' == 45 && '.' == 46 && '/' == 47 && ':' == 58
&& ';' == 59 && '<' == 60 && '=' == 61 && '>' == 62 && '?' == 63
&& '[' == 91 && '\\' == 92 && ']' == 93 && '^' == 94 && '_' == 95
&& '{' == 123 && '|' == 124 && '}' == 125 && '~' == 126,
"Dinosaur: not ASCII punct");
static_assert(
' ' == 32 && '\t' == 9 && '\v' == 11 && '\f' == 12 && '\n' == 10,
"Dinosaur: not ASCII space, ctrl");
static_assert('\a' == 7 && '\b' == 8 && '\r' == 13,
"Dinosaur: not ASCII spaces");
// Not 100% confident safe to do the following test
static_assert('$' == 36 && '@' == 64 && '`' == 96,
"Dinosaur: not ASCII special");
[Edit 2019 Dec]
On review, incorporating @Deduplicator's idea: `CHAR_MAX <= INT_MAX` is not a strong enough test to avoid trouble with `fgetc()`; the test should use `UCHAR_MAX <= INT_MAX`. This makes certain that the number of possible characters returned from `fgetc()` is less than the positive `int` range - preventing a collision with `EOF`.
/*
* Insure char is sub-range of int
* When char values exceed int, makes for tough code using fgetc()
*/
// static_assert(CHAR_MAX <= INT_MAX, "Dinosaur: wide char");
static_assert(UCHAR_MAX <= INT_MAX, "Dinosaur: wide char");
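The reason for the stronger test shows up in the usual read loop. A sketch (the helper `count_bytes` is hypothetical): `fgetc()` returns each byte as an `unsigned char` converted to `int`, so with `UCHAR_MAX <= INT_MAX` a byte such as 0xFF comes back as a non-negative value and cannot be mistaken for the negative `EOF`.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

static_assert(UCHAR_MAX <= INT_MAX, "Dinosaur: wide char");

/* Hypothetical helper: counts bytes until end-of-file or a read error. */
static long count_bytes(FILE *f) {
    long n = 0;
    int c;  /* int, not char: it must hold every byte value and EOF */
    while ((c = fgetc(f)) != EOF) {
        n++;
    }
    return n;
}
```

Were `UCHAR_MAX` to exceed `INT_MAX`, some byte value could convert to the same `int` as `EOF` and the loop would stop early; code would then need `feof()`/`ferror()` to tell the cases apart.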
- I don't suppose there's any way to rewrite those ASCII range tests as `"ABCDEFGHI...XYZ"` with a check for `str[i+65] == i`. Probably not in a way compatible with static_assert, without C++ constexpr functions to allow a loop. – Peter Cordes, Jul 15, 2021 at 18:59
- @PeterCordes I do not see a way. Perhaps `static_assert('A' == 65 && 'B' == 'A' + 1 && 'C' == 'B' + 1 && 'D' == 'C' + 1 ...` for a more friendly/sane looking test? – chux, Jul 15, 2021 at 19:04
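On the loop idea: where a run-time check is acceptable, the letter and digit tests can be written as a loop over a string instead of one comparison per character. This is only a sketch, since string indexing is not an integer constant expression and so cannot go in a `static_assert`; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the characters of s have the codes first, first+1, ... */
static int is_consecutive_from(const char *s, int first) {
    size_t i;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] != first + (int)i) {
            return 0;
        }
    }
    return 1;
}

/* Run-time counterpart of the A-Z, a-z and 0-9 assertions. */
static int charset_looks_ascii(void) {
    return is_consecutive_from("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 65)
        && is_consecutive_from("abcdefghijklmnopqrstuvwxyz", 97)
        && is_consecutive_from("0123456789", 48);
}
```

A program could call `charset_looks_ascii()` once at start-up and refuse to run if it returns 0, at the cost of losing the compile-time failure.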
Instead of:
static_assert(SCHAR_MIN < -SCHAR_MAX && SHRT_MIN < -SHRT_MAX &&
INT_MIN < -INT_MAX && LONG_MIN < -LONG_MAX &&
LLONG_MIN < -LLONG_MAX && INTMAX_MIN < -INTMAX_MAX &&
INTPTR_MIN < -INTPTR_MAX && PTRDIFF_MIN < -PTRDIFF_MAX
, "Dinosuar: Non-2's complement.");
I prefer:
static_assert( SCHAR_MIN < -SCHAR_MAX, "Dinosaur: Non-2's complement.");
static_assert( SHRT_MIN < -SHRT_MAX, "Dinosaur: Non-2's complement.");
static_assert( INT_MIN < -INT_MAX, "Dinosaur: Non-2's complement.");
static_assert( LONG_MIN < -LONG_MAX, "Dinosaur: Non-2's complement.");
static_assert( LLONG_MIN < -LLONG_MAX, "Dinosaur: Non-2's complement.");
static_assert( INTMAX_MIN < -INTMAX_MAX, "Dinosaur: Non-2's complement.");
static_assert( INTPTR_MIN < -INTPTR_MAX, "Dinosaur: Non-2's complement.");
static_assert(PTRDIFF_MIN < -PTRDIFF_MAX, "Dinosaur: Non-2's complement.");
Granted, this code won't survive any automated code formatting, but it's much easier to grasp than the all-in-one assertion. Also, when one of the assertions fails, you know exactly which of these types is unusual.
On another topic: `UNICODE_BW1` is a typo, it should be `UNICORN_BW1`.
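If the repetition is a concern, a small macro can generate one assertion per type while putting the failing macro's name in the message. A sketch only: `UNICORN_TWOS_COMPLEMENT` is a made-up name, and the `#ifdef` merely shows how a single type could be skipped, which matters only when this is paired with a pre-C11 `static_assert` substitute.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one assertion per type; #min adds the macro name
   (e.g. "SCHAR_MIN") to the diagnostic text. */
#define UNICORN_TWOS_COMPLEMENT(min, max) \
    static_assert((min) < -(max), "Dinosaur: Non-2's complement: " #min)

UNICORN_TWOS_COMPLEMENT(SCHAR_MIN, SCHAR_MAX);
UNICORN_TWOS_COMPLEMENT(SHRT_MIN, SHRT_MAX);
UNICORN_TWOS_COMPLEMENT(INT_MIN, INT_MAX);
UNICORN_TWOS_COMPLEMENT(LONG_MIN, LONG_MAX);
#ifdef LLONG_MIN  /* absent before C99 */
UNICORN_TWOS_COMPLEMENT(LLONG_MIN, LLONG_MAX);
#endif
UNICORN_TWOS_COMPLEMENT(INTMAX_MIN, INTMAX_MAX);
UNICORN_TWOS_COMPLEMENT(INTPTR_MIN, INTPTR_MAX);
UNICORN_TWOS_COMPLEMENT(PTRDIFF_MIN, PTRDIFF_MAX);
```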
- This approach does have an advantage if needing to compile pre-C99 as it is cleaner to `#if defined(LLONG_MIN)` around `static_assert( LLONG_MIN < -LLONG_MAX, ...` and any select integer types. – chux, Jan 22, 2020 at 15:21
`unsigned char` is a sub-range of `int` instead? To wit, `EOF` being distinct is useful.