I am attempting to solve Question 10 of Chapter 6 of the C programming: A modern approach book.
The question asks to write a program that determines the earliest date from a sequence of user-inputted dates. The user will enter 0/0/0 to indicate that no more dates will be entered. An example of the program output is also given by the book where the user input is highlighted in italic.
Enter a date (mm/dd/yy): 3/6/08
Enter a date (mm/dd/yy): 5/17/07
Enter a date (mm/dd/yy): 6/3/07
Enter a date (mm/dd/yy): 0/0/0
5/17/07 is the earliest date.
Below is my initial attempt which seems to produce the desired output:
#include <stdio.h>
// Below is a program that accepts any number of user-entered dates and returns the earliest date
// The program keeps track of a running variable for earliest month, day and year, against which each entry is compared
int main(void)
{
int month, day, year;
int earliest_month, earliest_day, earliest_year;
// Initialisation
earliest_month = 0;
earliest_day = 0;
earliest_year = 0;
do
{
printf("Enter a date (mm/dd/yy): ");
scanf("%d/%d/%d", &month, &day, &year);
if (month == 0 && day == 0 && year == 0)
{
break;
}
if (earliest_month == 0 && earliest_day == 0 && earliest_year == 0)
{
earliest_month = month;
earliest_day = day;
earliest_year = year;
}
// First compare the years
if (year < earliest_year)
{
earliest_month = month;
earliest_day = day;
earliest_year = year;
}
else if (year == earliest_year)
{
// If years are equal, compare months
if (month < earliest_month)
{
earliest_month = month;
earliest_day = day;
}
else if (month == earliest_month)
{
// If years and months are the equal, compare days
if (day < earliest_day)
{
earliest_day = day;
}
}
}
} while (month != 0 && day != 0);
printf("%d/%d/%.2d is the earliest date", earliest_month, earliest_day, earliest_year);
return 0;
}
I now wish to get some feedback on this code and if possible, some recommendation on how to improve it. I already notice a double initialisation in my code precisely for the earliest month, day and year day variables (one outside the do and a second one inside the do statement). My idea was to have the earliest variables initially set to the first user output, but I still had to initialise them to 0 first (not sure whether that is required either but I also don't know how to make the program assign the first user input to these values).
Any feedback at all would be highly appreciated. Thanks in advance!
7 Answers 7
The first thing I would do would be to break the code up into some meaningful functions instead of having one long stream of code in main
.
Personally, I'd imagine main
looking something like this:
int main() {
Date current_date = init_date();
Date quit = init_date();
show_prompt();
Date earliest_date = read_date();
while (quit != (current_date = read_date()) {
if (less_than(current_date, earliest_date))
earliest_date = current_date;
}
print_date(earliest_date);
}
Then I'd implement read_date
, print_date
, and less_than
to allow main
to work about like that. This lets me quickly scan through main
and have a decent idea of what the whole program is supposed to do. It also lets me isolate problems fairly quickly. For example, if it's reading a date incorrectly, the problem is almost certainly in read_date
.
So to do that, we need a Date
type to work with:
struct Date {
// store mm/dd/yy in some form or other
};
Then we need to implement the functions we called. Those might depend on even smaller, simpler functions. Ideally, we'd like each to be so small and simple that it doesn't take much work to decide that assuming each function it calls does its thing correctly, this function will also do what it's supposed to.
else if (month == earliest_month)
Repeatedly testing each component is tedious and error prone.
Dealing with a 3-tuple is less convenient than a single scalar int.
Rather than offering a scan template, prefer to parse each date with
strptime.
With a tm
struct in hand, you can use
strftime
to produce easily compared
ISO-8601
timestamps that are human readable,
or you can use
mktime
to convert to time_t
integers,
which are also easily compared.
Or use (big endian) htonl to put y
, m
, d
into arays or zeroed structs, and use
memcmp
to compare the three fields all at once.
-
1\$\begingroup\$ I have not covered date data types in C yet but I will make sure to revisit your answer once I do - definitely a helpful tip when dealing with date types moving forward. Thank you! \$\endgroup\$Laika– Laika2023年11月22日 09:20:05 +00:00Commented Nov 22, 2023 at 9:20
-
2\$\begingroup\$
memcmp
on a struct ofint
members would depend on integer endianness, since it's giving you lexical order of byte-arrays. e.g.0x0001
is a smaller number than0x0100
, but on little-endian machines (x86, modern ARM), the first byte of1
will be0x01
but the first byte of0x100
will be 0. Also, C implementations are allowed to have padding between struct members which might hold different garbage for each. In practice real implementations won't pad when not needed for alignment (all members the same type). \$\endgroup\$Peter Cordes– Peter Cordes2023年11月22日 19:04:29 +00:00Commented Nov 22, 2023 at 19:04 -
\$\begingroup\$ @PeterCordes, granted, thank you. Amended to use network order. \$\endgroup\$J_H– J_H2023年11月22日 19:23:58 +00:00Commented Nov 22, 2023 at 19:23
-
\$\begingroup\$ Or instead of using memory, you could do
((long)y << 16) + (m<<8) + d
to construct an integer that monotonically increases with increasing day. (We could use smaller shifts if we wanted; month = 1..12 will fit in 4 bits, and d = 1..31 will fit in 5 bits. But sincelong
is at least 32 bits, that's 16 bits of space for a year. Hrm, does that work with negative years (BCE vs. CE)? I think it still does; march 1st in year -10 is later than jan 1st in year -10, so still adding the m, d components gives the right ordering. Also across years. Butmemcmp
compares asunsigned char
\$\endgroup\$Peter Cordes– Peter Cordes2023年11月22日 19:40:15 +00:00Commented Nov 22, 2023 at 19:40 -
\$\begingroup\$ Just noticed you linked the man page for
bzero
. As it says, that function is deprecated, and removed from the POSIX.2008 spec; usememset
in new code if you want to go that route. \$\endgroup\$Peter Cordes– Peter Cordes2023年11月22日 23:23:16 +00:00Commented Nov 22, 2023 at 23:23
Don't ignore runtime errors:
What happens if the user enters something that's not a valid date format (e.g. cnaisbciasvbiqe)? scanf()
will return 0, but we treat that exactly the same as if it returns 3.
#if 0
scanf("%d /%d/ %d", &month, &day, &year);
#else
if (scanf("%d/%d/%d", &d) != 3) {
fputs("That's not a valid date format!\n", stderr);
return EXIT_FAILURE;
}
#endif
Or consider reading a whole line with fgets()
, and then parsing it with sscanf()
.
Use standard status codes:
Prefer using EXIT_SUCCESS
and EXIT_FAILURE
from stdlib.h
to 0, 1, -1, et cetera.
Bug:
You do not deal with invalid dates. What happens when the day is negative, or the month is greater than 12, and so on?
Use unsigned
for small quantities that can not be negative:
// int month, day, year;
// int earliest_month, earliest_day, earliest_year;
unsigned int month, day, year;
unsigned int earliest_month, earliest_day, earliest_year;
Use a struct
for the date type:
It is tedious to deal with individual variables.
struct date {
unsigned int day;
unsigned int month;
unsigned int year;
};
// Or in some other form.
Initialize objects as you declare them:
#if 0
int month, day, year;
int earliest_month, earliest_day, earliest_year;
// Initialisation
earliest_month = 0;
earliest_day = 0;
earliest_year = 0;
#else
int month, day, year;
int earliest_month = 0, earliest_day = 0, earliest_year = 0;
// Or
int earliest_month = 0;
int earliest_day = 0;
int earliest_year = 0;
#endif
Don't comment the obvious.
// Initialisation
earliest_month = 0;
earliest_day = 0;
earliest_year = 0;
This also obscures things, because these are assignments. See: (https://stackoverflow.com/q/35662831/20017547)
// First compare the years
if (year < earliest_year)
...
// If years are equal, compare months
if (month < earliest_month)
...
// If years and months are the equal, compare days
if (day < earliest_day)
...
These should be removed too; they do not add any value to the code. The identifiers have been aptly chosen and it is very clear that we are comparing years, months, and days.
Simplify:
#if 0
else if (month == earliest_month)
{
// If years and months are the equal, compare days
if (day < earliest_day)
{
earliest_day = day;
}
}
#else
else if (month == earliest_month && day < earliest_day) {
earliest_day = day;
}
#endif
Ensure that the prompt is seen:
The stdout
stream is line-buffered by default (if it can be detected to refer to an interactive device), so it will only display the data in the buffer after it reaches a newline, or when it is flushed.
You could:
Call
fflush()
after the call toprintf()
:printf("Enter a date (mm/dd/yy): "); fflush(stdout);
Disable buffering with
setbuf()
/setvbuf()
:setbuf(stdout, NULL); // Or setvbuf(stdout, NULL, _IONBF, 0);
Reduce scope:
month
, day
, and year
need not be visible outside of the do-while
loop.
-
1\$\begingroup\$ Thank you very much for taking the time to write such a detailed explanation. Your answer brings several tips on how to improve the code and write it more efficiently, I have learned a lot from it. \$\endgroup\$Laika– Laika2023年11月22日 09:18:26 +00:00Commented Nov 22, 2023 at 9:18
-
1\$\begingroup\$ Minor,
"%d /%d /%d"
a tad more forgiving than""%d/%d/%d"
. \$\endgroup\$chux– chux2023年11月23日 05:36:35 +00:00Commented Nov 23, 2023 at 5:36 -
\$\begingroup\$ " it will only display the data in the buffer after it reaches a newline" --> depends on buffering mode and implementation details. ". only is not quite right. "or when it is flushed" --> Yes. \$\endgroup\$chux– chux2023年11月23日 05:43:30 +00:00Commented Nov 23, 2023 at 5:43
-
5\$\begingroup\$ Nitpick? I disagree with your suggestion/recommendation to use
size_t
for month, day, year, etc.size_t
should basically be used for sizes of things in memory and for indexing.int
is fine for month, day, year, etc. (and helps avoid sign issues if there are computations). If you suggest using "biggest possible", then that'sintmax_t
anduintmax_t
. \$\endgroup\$Pablo H– Pablo H2023年11月23日 13:53:50 +00:00Commented Nov 23, 2023 at 13:53 -
1\$\begingroup\$ I think the "Ensure that the prompt is seen:" issue is flawed as
printf("Enter a date (mm/dd/yy): "); scanf("%d/%d/%d", &month, &day, &year);
should flushstdout
first beforescanf()
on compliant implementations. Given the many implementation and spec details, I do yet have a good alternative. \$\endgroup\$chux– chux2023年11月24日 09:53:19 +00:00Commented Nov 24, 2023 at 9:53
Since you asked for suggestions / optimizations:
You can map y/m/d tuples to 32-bit integers that monotonically increase with increasing dates. This reduces the problem to finding the min
of a sequence of int32_t
/ int_least32_t
/ long
, or whatever type you prefer1.
I'm not sure I understood why we want to leave the loop on month == 0
or day == 0
but not other invalid input conditions, especially not after already updating earliest_...
from those values. Your do{}while
has an earlier if()break
for the documented exit condition of all-zeros, so I just kept that and made it a while(1)
.
I like do{}while()
loops, but using one in this case would require rotating the loop and partially peeling the first iteration so the scanf and exit condition could be written at the bottom, with the update using values from the previous iteration if the do{}while
condition was true. So perhaps scanf("...", &earliest_year, &earliest_month, &earliest_day)
and error-check ahead of the first iteration and initialize vars so those don't get overwritten. Do that manually if writing in asm; in C let the compiler do it unless there's a condition that naturally fits at the bottom of the loop. Writing a while(1){ ... }
loop hints human readers to look for a break;
somewhere, and it's usually easiest to reason about loops that have one exit point. (So all else equal prefer that, but don't go out of your way to avoid a loop condition and a break
.)
Prefer initializing variables when you declare them, and don't declare them until you're ready to use them. Note that long date = ...
is only declared inside the loop; it doesn't exist outside. (Unfortunately for do{}while()
syntax, loop variables aren't in scope in the while(expression)
so you need to declare them outside the loop if you want to use them there.)
long earliest_date = LONG_MAX; // no dates seen yet. Has invalid m=d=0
while (1) {
scanf ...
check for I/O errors and optionally for out-of-range months and days...
long date = ((long)y << 16) + (m<<8) + d;
if (date == 0) // also makes checking for all-zero easy
break;
earliest_date = date < earliest_date ? date : earliest_date;
}
if (earliest_date == LONG_MAX) {
puts("no valid dates were entered");
return EXIT_SUCCESS;
}
int earliest_year = earliest_date >> 16; // hopefully an arithmetic right shift so we can handle negative years, but ISO C leaves that implementation defined.
int earliest_month = (earliest_date >> 8) & 0xFF; // second byte
int earliest_day = earliest_date & 0xFF; // low byte
I used the term "byte" in a comment only to be concise; I actually meant "octet" but I figured "byte" would be more helpful to most readers. ISO C doesn't guarantee that a byte is 8 bits, but this code just uses shifts and integer constants without making any assumptions about CHAR_BIT
or sizeof(long)
, e.g. we used & 0xFF
instead of (unsigned char)earliest_date
to extract the low 8 bits, to match the shift by 8 we did when packing.
Note that this is just integer math, no byte addressing of memory. So endianness isn't a factor, unlike J_H's answer suggesting memcmp
.
It might be a good idea to make pack
and unpack
helper functions so the code using it can look like Date date = date_pack(y, m, d);
and date_unpack(earliest_date, &earliest_year, &earliest_month, &earliest_day)
. That separates the pack/unpack logic from the loop using it to find the earliest, so you tweak the packing format all in one place, not mixed with other logic. Making those functions static inline
will let them most easily optimize away.
#include <limits.h> // for LONG_MAX
// an integer type that allows < and == compares, and is all 0 for d=m=y=0
typedef long Date;
#define DATE_MIN LONG_MIN
#define DATE_MAX LONG_MAX
// and use Date earliest_date = DATE_MAX;
static inline
Date date_pack(int y, int m, int d){
return ((Date)y<<16) + (m<<8) + d;
}
static inline int date_year(Date date){
// hopefully an arithmetic right shift so we can handle negative years, but ISO C leaves that implementation defined.
return date >> 16;
}
static inline int date_month(Date date){
return (date >> 8) & 0xFF; // second byte
}
static inline int date_day(Date date){
return date & 0xFF; // low byte
}
static inline
void date_unpack(Date date, int *y, int *m, int *d){
*y = date_year(date);
*m = date_month(date);
*d = date_day(date);
}
These compile efficiently for x86-64 and AArch64 as expected (Godbolt); when inlining into a caller that passes the address of local vars, the actual stores would be optimized away. It's also significantly more source lines than just doing it directly inside the loop, so there's a tradeoff in terms of over-engineering if you're not actually going to reuse this pack/unpack code in other functions (yet).
We could use smaller shifts if we wanted; month = 1..12
will fit in 4 bits, and d = 1..31
will fit in 5 bits. But even with 8 bits for each, that's still 16 bits of space for a year in a 32-bit long
. (And much more in a 64-bit long
.)
If you want to support negative dates (e.g. 10 BCE as -10
, the current year 2023 CE as +2023
), this pack/unpack method would depend on long >>
being an arithmetic right shift (shifting in copies of the sign bit for 2's complement systems), rather than logical (shifting in zeros). ISO C unfortunately leaves that choice up to the implementation, and a few embedded compilers for CPUs without arithmetic right shift choose logical. (Fun fact: it's possible to portably emulate arithmetic right shift using bitwise operations, and mainstream compilers are even able to optimize that back into a single arithmetic right-shift instruction.)
This mapping is still monotonic for negative years, and across the year-zero boundary. For example March 1st in year -10 is later than Jan 1st in year -10, so still adding (rather than subtracting) the m
and d
components gives the right ordering.
If users enter nonsense numbers for y/m/d, for example day=1000000, that's big enough to "carry" into the month and year part of the integer. Your version allows full 32-bit integers for all 3 components while still maintaining ordering. But they're supposed to be dates so this is basically just a behaviour quirk on bad input, or something that would never come up if range-checking the input.
Footnote 1: what integer type to use for 32-bit integers
int
is often 32 bits, but ISO C only guarantees that it's at least 16-bit. Some real-world C implementations (for microcontrollers these days) have 16-bit int
, so that's not the most portable choice.
For a single local variable, long
is fine; ISO C requires it to be at least 32 bits wide. 64-bit systems often use 64-bit long
, but on such systems, compare and shift aren't slower than with 32-bit types. (Except a bit of extra machine-code size on x86-64 for REX prefixes). If you were making more use of this, e.g. in structs and/or arrays, you'd want to use something like int_least32_t
, or int32_t
if you don't mind requiring that optional type to exist. (Pretty much everything these days uses 2's complement integers with power-of-2 bit widths so do support int32_t
.) Unfortunately int_fast32_t
is not usable for use-cases like this because of bad choices by some major platforms.
If you were storing arrays of these, you'd want to put more effort into making sure you didn't get an unnecessarily-wide type that would over-align your structs and cost memory bandwidth / cache footprint.
You don't actually need 32 bits; 24 would be sufficient, especially if you pack m
and d
more tightly in 4 and 5 bits. Some DSPs have 24-bit int
. Especially using unsigned
would get you an extra bit for non-negative years which could be useful.
Perhaps you could check #ifdef INT_MAX > (1LL << 23)
and use int
in that case. In C++, you'd check std::numeric_limits<int>::digits
. The int_leastN_t
types only come in power-of-2 sizes so don't fit the bill here. In C23, there's _BitInt(24)
, but that would be less efficient than a 32-bit type on a normal 32 or 64-bit CPU.
16-bit Date
is only enough for 7 bits of year, which isn't enough for 4-decimal-digit dates. 2-digit dates have been obsolete since before the turn of the century, and writing new code that can't handle 4-digit dates would be a serious mistake. It's surprising that an assignment would ask you to write code using them. The current year, 2023, needs 11 bits (unsigned), and 2048 isn't too far away. That uses 12 bits (unsigned).
It's also bad that it asks for date input in a backwards order, not y/m/d (https://xkcd.com/1179/). Perhaps that's to stop you from requiring a strict text format with a leading zero for one-digit numbers, because that would produce a string you could compare with strcmp
or memcmp
; lexical sorting of date strings in ISO 8601 format (2023年11月23日
) orders them correctly. That's why that date order is superior.
-
\$\begingroup\$ This is certainly a speedy approach. \$\endgroup\$chux– chux2023年11月23日 05:37:58 +00:00Commented Nov 23, 2023 at 5:37
-
\$\begingroup\$ Corner: "BCE as -10, the current year 2023 CE as +202" --> Hmm, more like "BCE as -9 .." as the year after 1 BC is 1 AD. \$\endgroup\$chux– chux2023年11月23日 05:48:32 +00:00Commented Nov 23, 2023 at 5:48
-
1\$\begingroup\$ @chux-ReinstateMonica: Thanks for those. IDK how I missed the INT_MAX vs. LONG_MAX; I'd decided on
long
before I wrote that line. Indeed, with 16-bitint
,INT_MAX
would have a year of0
, serious bug! \$\endgroup\$Peter Cordes– Peter Cordes2023年11月23日 05:49:52 +00:00Commented Nov 23, 2023 at 5:49 -
\$\begingroup\$ @chux-ReinstateMonica: This mapping to integers isn't intended to preserve relative distances between dates in general, and doesn't for days or months across years. But yes, in terms of what users would type in, they might well prefer to choose
-9
instead of-10
when the date they're representing is in 10 BCE, if they want to use integer years-since-1BCE as the meaning of the year field. I was thinking of it as simply using the-
sign as a BCE indicator, instead of parsing suffixes like10 BC[E]
vs.10 (AD|CE)?
\$\endgroup\$Peter Cordes– Peter Cordes2023年11月23日 05:56:29 +00:00Commented Nov 23, 2023 at 5:56 -
1\$\begingroup\$ I was thinking of
((Date)y<<9) + (m<< 5) + d;
or((Date)y * 12 + m) * 32 + d;
and using a unsigned type forDate
, possible evenunsigned
to work over a wide range on 16-bit machines with wide math, yet your answer here presents the idea well keeping it simply. \$\endgroup\$chux– chux2023年11月23日 05:57:04 +00:00Commented Nov 23, 2023 at 5:57
I have only a few suggestions. Many other good ideas are in the other answers.
Rather than initializing the earliest_* family to 0, initialize them to an impossibly large value (like
INT_MAX
). This allows you to eliminate the zero test that is the secondif
in thedo while
loop. You might then want to consider just what should be output if the first input is0/0/0
.The current code has two distinct exit conditions. The first is for
0/0/0
. The second is for any entry with both the day and month, but not the year as 0 (like0/0/50
). This second is in the while condition, and allows the entry to be the earliest. You don't actually need that second test, unless you really meant for that possibility. Thus, change it to:do { ... } while (1);
or better:for (;;) { ... }
You might consider a single combined
if
, thus:
if ((year < earliest_year) ||
((year == earliest_year) &&
(month < earliest_month) ||
((month == earliest_month) && (day < earliest_day))))
{
earliest_month = month;
earliest_day = day;
earliest_year = year;
}
Many of the answers consider packing the year, month and day into a single integer. Usually when I do this, I use decimal values (year * 100 + month) * 100 + day
rather than bit shifts so that the date 24 November 2023 would be represented by the decimal integer 20231124
rather than 0x07e70b18
let alone 132582168
. It's unlikely that the difference between multiplying by 100 or shifting is a performance issue, and it makes debugging much easier if you can just read the date from the value.
You asked about optimising. I introduce to you: unions
struct date {
unsigned short year;
unsigned char month;
unsigned char day;
};
union date_union {
date parsed;
unsigned int raw;
}
date_union mindate;
date_union mydate;
mindate.raw = (unsigned int)-1;
mydate.raw = 0;
...
scanf("%d/%d/%d", &mydate.parsed.month, &mydate.parsed.day, &mydate.parsed.year);
if (mydate.raw < mindate.raw) {
mindate.raw = mydate.raw;
}
...
What this will do is store any date as (roughly) yyyymmdd in memory. Individual components can be accessed through .parsed, while the whole can be accessed through .raw. As we store year up front and month and day after that, a < comparison will always look at year first. Then month, then day. It reduces the amount of comparisons you'll need to write out by a lot.
It's basically like performing the bitshifting operations a lot of the other answers suggested, but then using language features and less math/thinking. I'm not entirely sure, but I think this should also be faster than bitshifting, as it's not involving any actual operations but memory access.
Disclaimers:
- U.B. in C++, this phenomenon is called type-punning (see https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior)
- A lot of people tend to find them scary, but they're common in e.g. embedded.
- I usually write C++, so do obviously check this code just like you'd check any other code you take from the internet.
-
3\$\begingroup\$ "What this will do is store any date as (roughly) yyyymmdd in memory." --> No. it depends on the endian of
unsigned
. May needstruct date { unsigned char day; unsigned char month; unsigned short year;};
. \$\endgroup\$chux– chux2023年11月24日 09:30:29 +00:00Commented Nov 24, 2023 at 9:30 -
\$\begingroup\$ "... but they're common in e.g. embedded." --> so are 16-bit
int
common in embedded. Better to not codeunsigned int raw;
withunsigned
and use a type that certainly matches the size ofstruct date
thanunsigned
.unsigned
might be 2, 4, ... \$\endgroup\$chux– chux2023年11月24日 09:38:05 +00:00Commented Nov 24, 2023 at 9:38
13/33/13
? And what about negative years - yes, they also exist! I assume you are still young, thus you never heard about the Year-2k-Bug \$\endgroup\$java.util.time
or Joda time, I don't know what the best practice is for C). Cheers :) \$\endgroup\$earliest_year
with a high value, likeINT_MAX
, you can completely get rid of theearliest_month == 0 && earliest_day == 0 && earliest_year == 0
check/block. \$\endgroup\$