BSD Buglets
Last week I ran into two wholly unrelated problems while researching the history of BSD-derived Unix systems on PCs. Both are classics in their category and merit a closer look.
Y2K Strikes Again
The first issue is a very typical Y2K bug found in 386BSD 0.0 and 0.1. When the system comes up (if it does—it’s not easy to bring up 386BSD 0.x on anything remotely modern!), it shows the system date as January 1, 1970, i.e. the beginning of the UNIX epoch. This is not merely a cosmetic issue.
For example, when rebuilding the 386BSD kernel, or indeed any software which uses the make utility, the source files will be timestamped 1992 or later, but the object files will be timestamped 1970. As a consequence, the object files will always be out of date and make will be forced to rebuild them. It gets much worse if the system is networked. It is possible to correct the date manually, but it will be reset to 1970 every time the system boots, which is rather unsatisfactory.
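The effect on make can be pictured with a minimal sketch in C. The helper name needs_rebuild is hypothetical, and real make implementations handle more cases (missing targets, multiple prerequisites); this only shows the timestamp comparison at the heart of the problem.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper, not taken from any make implementation:
 * a target is considered out of date when it is older than the
 * source it was built from.
 */
bool needs_rebuild(time_t source_mtime, time_t target_mtime)
{
    return target_mtime < source_mtime;
}
```

With a source file stamped in 1992 and an object file stamped shortly after the 1970 epoch, the comparison is true on every run, so nothing is ever considered up to date.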
Luckily, fixing the problem is not difficult, especially if one has indexed source code at hand. The 386BSD kernel must read the initial date from the RTC CMOS non-volatile memory, and a search for “CMOS” brings us to /usr/src/sys.386bsd/i386/isa/rtc.h. The RTC_YEAR macro corresponds to the year byte in the RTC and it is only used in a single function, inittodr() in /usr/src/sys.386bsd/i386/isa/clock.c. The function was obviously written with the assumption that the year is roughly in the 70-99 range and corresponds to 1970-1999.
To get a sensible result in the third millennium, simply assume that year values lower than, say, 80 must correspond to the 2000s rather than the 1900s. The following rough patch is intended for 386BSD 0.1:
--- clock.old
+++ clock.c
@@ -138,6 +138,7 @@
 	sec = bcd(rtcin(RTC_YEAR));
+	if (sec < 80) sec += 100;
 	leap = !(sec % 4); sec += ytos(sec); /* year */
 	yd = mtos(bcd(rtcin(RTC_MONTH)),leap); sec += yd; /* month */
With this fix in place, 386BSD boots up with a clock time that’s reasonably close to reality.
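The same windowing idea can be shown as a standalone sketch, outside the kernel. The names bcd_to_bin, rtc_year_to_full and PIVOT_YEAR are made up for illustration and do not appear in 386BSD; the cutoff of 80 simply mirrors the patch above.

```c
#include <stdint.h>

/* Same cutoff as the patch above; the exact value is a judgment call. */
#define PIVOT_YEAR 80u

/* Convert one packed BCD byte (e.g. 0x25) to binary (25). */
unsigned bcd_to_bin(uint8_t v)
{
    return (unsigned)(v >> 4) * 10u + (unsigned)(v & 0x0Fu);
}

/*
 * Map the two-digit BCD year from the RTC to a four-digit year.
 * Values below the pivot are taken to be 20xx, the rest 19xx.
 */
unsigned rtc_year_to_full(uint8_t bcd_year)
{
    unsigned yy = bcd_to_bin(bcd_year);

    return (yy < PIVOT_YEAR) ? 2000u + yy : 1900u + yy;
}
```

A year byte of 0x92 maps to 1992 and 0x25 maps to 2025. Like the patch, this sketch will start misbehaving once the real year reaches the pivot in the next century.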
The 386BSD clock initialization code obviously has more bugs, such as the leap year detection, or incorrectly adding the ytos() result to sec rather than just assigning it, which results in the clock being off by a few minutes. This (three separate bugs in a tiny section of code) is par for the course in the early 386BSD releases, which were really quite buggy where the PC-specific support was concerned.
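For comparison, here is a rough sketch of how the year portion of the calculation could look with the result assigned rather than accumulated, and with the full Gregorian leap year rule. It is not the 386BSD ytos() routine; is_leap_year and seconds_to_year_start are hypothetical names, and the function takes a four-digit year rather than the two-digit value the kernel works with.

```c
#include <stdbool.h>

#define SECS_PER_DAY 86400LL

/* Full Gregorian rule: every 4th year, except centuries not divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Seconds from 1970-01-01 00:00:00 to January 1 of the given
 * four-digit year. The caller assigns this value instead of adding
 * it to a variable that still holds the raw year number.
 */
long long seconds_to_year_start(int year)
{
    long long days = 0;

    for (int y = 1970; y < year; y++)
        days += is_leap_year(y) ? 366 : 365;

    return days * SECS_PER_DAY;
}
```

Adding such a result to a variable that already contains the year number, as the original code does, leaves the clock ahead by that many seconds, which is consistent with the drift of a couple of minutes described above.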
Note that the two-digit to four-digit year conversion cutoff is by necessity somewhat arbitrary. 1970, 1980, or 1990 would all have made sense; the UNIX epoch starts in 1970, the PC wasn’t released before 1980, and 386BSD wasn’t released before 1990. None of these is a real solution, which would require reading the century from the RTC as well.
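A century-aware conversion might look like the sketch below. It assumes the century is available as a packed BCD byte; on many AT-compatible machines that byte lives at CMOS offset 0x32, but this is not universal and is treated here purely as an assumption. The function names are hypothetical and nothing like this exists in 386BSD 0.x.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if both nibbles of the byte are decimal digits. */
bool is_valid_bcd(uint8_t v)
{
    return (v >> 4) <= 9 && (v & 0x0F) <= 9;
}

/* Convert one packed BCD byte (e.g. 0x20) to binary (20). */
unsigned bcd_to_bin(uint8_t v)
{
    return (unsigned)(v >> 4) * 10u + (unsigned)(v & 0x0Fu);
}

/*
 * Combine the century and year bytes into a four-digit year.
 * Returns 0 if either byte is not valid BCD, so the caller can
 * fall back to a pivot-based guess.
 */
unsigned rtc_full_year(uint8_t bcd_century, uint8_t bcd_year)
{
    if (!is_valid_bcd(bcd_century) || !is_valid_bcd(bcd_year))
        return 0;

    return bcd_to_bin(bcd_century) * 100u + bcd_to_bin(bcd_year);
}
```

The validity check matters because not every machine maintains a century byte, so a fallback to the cutoff approach would still be needed.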
As with many other Y2K bugs, this one slipped through because it was undetectable in a normal usage scenario. It just couldn’t happen before January 1st, 2000—not unless someone deliberately set their PC’s date into the future.
Just how fast are interrupts?
NetBSD 1.0 shares much code with 386BSD but there’s a difference of about two years between 386BSD 0.1 (1992) and NetBSD 1.0 (1994). NetBSD 1.0 has no trouble guessing the current year correctly (it uses 1970 as the cutoff). But it has another somewhat common problem which is related to hardware interrupt processing.
NetBSD 1.0 for the i386 architecture came with two boot floppies, one with Adaptec AHA-154x SCSI HBA support and the other with support for BusLogic BT-742 and compatibles. One reason for separate floppies was probably the fact that the BusLogic adapters were compatible with the AHA-154x and both drivers might load on a system equipped with a BusLogic HBA, with predictably unpleasant consequences.
If one boots from the BusLogic floppy and simply mounts a disk attached to /dev/sd0a or similar, after about 10 seconds there might be a message on the console along the lines of sd0(bt0:0:0): timed out. After a further two seconds, the system might panic and die. Ironically, before the timeout message, the system had no trouble accessing the disk and all I/O requests had been completed.
The timeout is in fact bogus. This bug is an example of an incorrect assumption which is easily made and often difficult to detect. The assumption is that when a hardware device is asked to perform some action whose completion is signaled by an interrupt, it will always take a certain non-negligible amount of time before the interrupt arrives.
The BusLogic driver in NetBSD 1.0 submits a SCSI controller command and then sets up a 10-second timeout. The timeout is canceled when the command is completed, normally in an interrupt service routine. There is an obvious race: if the device signals an interrupt before the timeout is set up, the timeout won’t be canceled and will incorrectly trigger after the timeout period has elapsed.
Interestingly, the AHA-154x driver shipped with NetBSD 1.0 is extremely similar but does not have this problem. The architecture of the two HBAs is the same; the crucial difference is that the AHA-154x only has a 24-bit (16MB) address space whereas the BusLogic additionally supports 32-bit (4GB) addressing. The similarity of the hardware architecture naturally lends itself to very similar drivers.
The key difference is that the NetBSD 1.0 AHA-154x driver sets up the timeout before submitting a SCSI command, and additionally protects the command submission by raising the priority level via splbio(). That way, the race condition is doubly prevented.
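The two orderings can be contrasted in a small user-space simulation. Everything here is hypothetical (struct sim, sim_isr, sim_submit and the two driver-like functions); it is not the NetBSD driver code, only a model of a device so fast that its completion interrupt is handled before the submit call returns.

```c
#include <stdbool.h>

/* Simulated driver state; all names are made up for illustration. */
struct sim {
    bool timeout_armed;  /* a timeout is pending and will eventually fire */
    bool cmd_done;       /* the completion interrupt has been handled */
};

/* Completion handler: marks the command done and cancels the timeout. */
void sim_isr(struct sim *s)
{
    s->cmd_done = true;
    s->timeout_armed = false;  /* a no-op if nothing is armed yet */
}

/* An "instant" device: the interrupt is handled before submit returns. */
void sim_submit(struct sim *s)
{
    sim_isr(s);
}

/* Ordering described for the BusLogic driver: submit, then arm. */
bool submit_then_arm(void)
{
    struct sim s = { false, false };

    sim_submit(&s);
    s.timeout_armed = true;    /* armed after the command already finished */
    return s.timeout_armed;    /* true: a stale timeout is left pending */
}

/* Ordering described for the AHA-154x driver: arm, then submit. */
bool arm_then_submit(void)
{
    struct sim s = { false, false };

    s.timeout_armed = true;
    sim_submit(&s);            /* the handler cancels the armed timeout */
    return s.timeout_armed;    /* false: nothing left to fire */
}
```

In the first ordering the timeout outlives a command that has already completed, which matches the bogus "timed out" message appearing after all I/O had finished.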
This class of problems is somewhat common with older operating systems and often shows up in virtualized environments. Virtualized devices tend to have extremely fast response time and incorrect assumptions about the time it takes for an interrupt to be processed will be exposed. However, physical systems can also trigger similar problems, only much less frequently. It is possible for the CPU to be held up by some external event—perhaps a higher priority interrupt, perhaps an SMI—with the same end result of an interrupt arriving “too fast”. Such races are then near-impossible to debug and fix.
How to get around this problem? Modifying the /usr/sys/arch/i386/isa/bt742a.c driver module would be easy, but it is difficult to do without installing the system first. If a real or correctly emulated BusLogic HBA is at hand, the AHA-154x boot floppy can be simply used instead. The driver will not be as efficient with more than 16MB RAM in the system, but it will do. As explained above, despite the high degree of similarity the AHA-154x driver does not suffer from the race condition.
NetBSD 1.1 includes a fixed BusLogic driver. The timeout setup is still performed after submitting a SCSI command, but the entire sequence is protected by raising the priority level. Thus the interrupt service routine can no longer be executed before the timeout is set up, even if the hardware signals an interrupt more or less immediately after a command is submitted.
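Why raising the priority level is sufficient can be modeled in a few lines of C. This is a simulation under stated assumptions, not the NetBSD 1.1 driver: struct spl_sim and the spl_sim_* functions are hypothetical stand-ins for the kernel's priority-level mechanism, where an interrupt signaled while blocked stays pending until the level is lowered again.

```c
#include <stdbool.h>

/* Simulated interrupt and driver state; all names are hypothetical. */
struct spl_sim {
    bool blocked;        /* priority level raised, handler may not run */
    bool irq_pending;    /* device has signaled, handler not yet run */
    bool timeout_armed;  /* a timeout is pending and will eventually fire */
    bool cmd_done;       /* the completion handler has run */
};

/* Completion handler: marks the command done and cancels the timeout. */
void spl_sim_isr(struct spl_sim *s)
{
    s->cmd_done = true;
    s->timeout_armed = false;
}

/* The device signals completion; the handler runs now unless blocked. */
void spl_sim_raise_irq(struct spl_sim *s)
{
    if (s->blocked)
        s->irq_pending = true;
    else
        spl_sim_isr(s);
}

/* Stand-in for raising the priority level. */
void spl_sim_block(struct spl_sim *s)
{
    s->blocked = true;
}

/* Stand-in for restoring the level; pending interrupts are delivered. */
void spl_sim_unblock(struct spl_sim *s)
{
    s->blocked = false;
    if (s->irq_pending) {
        s->irq_pending = false;
        spl_sim_isr(s);
    }
}

/* Submit first, arm afterwards, but with the whole sequence protected. */
bool protected_submit_then_arm(void)
{
    struct spl_sim s = { false, false, false, false };

    spl_sim_block(&s);
    spl_sim_raise_irq(&s);     /* instant device: interrupt is only recorded */
    s.timeout_armed = true;
    spl_sim_unblock(&s);       /* handler runs now and cancels the timeout */
    return s.timeout_armed;    /* false: no stale timeout remains */
}
```

The order of submit and arm is unchanged from the buggy version; only the deferral of the handler makes the difference.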
Sharing much of the BusLogic-specific driver code, FreeBSD 1.0 and 2.0 suffer from the same bug. With FreeBSD 2.0, the workaround with using AHA-154x drivers is not viable, as the installation kernel is smart enough to probe for BusLogic HBAs first and only then try the Adaptec driver. FreeBSD 3.0 comes with a heavily rewritten and fixed driver.
11 Responses to BSD Buglets
I hope you will write an article on the history of BIOS memory detection calls.
Where the lack of leadership from IBM after the PC AT particularly shows.
There was no lack of leadership, but there was a strong lack of willingness to follow. That’s not the same thing.
Well the Compaq DeskPro 386 was released before the IBM PS/2. Did they support more than 16MB of RAM?
I highly doubt that… the technology just wasn’t there in 1986. EISA machines did support >16M RAM, but that was considerably later and there was a corresponding BIOS interface.
I don’t have anything planned currently but it’s a good idea. Quite a lot has changed in that area since the original IBM PC 🙂
One of those things still on my todo list for the 386BSD on bochs tutorial.[1] Installing the patch kits results in a lot of files with an unrealistic 1970 datestamp. Setting a previous century time with clock: time0= will probably do the trick as well.
[1] http://gunkies.org/wiki/Installing_386BSD_on_BOCHS#TODO
Yeah, I suppose you could install with the time set to 1992-ish, patch the kernel, then use real time. Though I’m not sure 1992 timestamps would be any more realistic than 1970 🙂
Or you could build a patched kernel with a RTC year fix and use that for installing.
The added realism is only that you don’t create new files predating the release date and maintain some chronological order.
What happens if you freeze the date-time to, say, 01-01-2015-12:00?
Will one be able to build, and will it be time-proof?
>Virtualized devices tend to have extremely fast response time
Like how a VirtualBox VM reads floppy images essentially instantly, while it takes several seconds (or occasionally even minutes) to read a physical floppy?