GW-BASIC Source Notes
When I learned that Microsoft released the GW-BASIC source code, I was mildly curious to find out what is or isn’t there. The short answer is that there’s a whole lot, but a lot is also missing. Spelling note: Both “GW-BASIC” and “GW BASIC” can be found in the source code. The hyphenated spelling will be used here for consistency.
The first question is: When is the source code from? Microsoft marked the source files February 10, 1983, but that’s almost guaranteed to be wrong. The date comes from comments in the code: “This translation created 10-Feb-83 by Version 4.3”. That reflects running some sort of master BASIC source code through a translator generating 8086 code. The source code was almost certainly modified after that date.
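The translation stamp can be collected from each source file with a small script. This is only a sketch: the comment wording is taken from the quote above, while the exact spacing inside the files and the helper name `translation_stamp` are assumptions.

```python
import re

# Wording taken from the comment quoted above; whitespace is matched loosely
# because the exact spacing inside the source files is an assumption.
STAMP = re.compile(
    r"This\s+translation\s+created\s+(\S+)\s+by\s+Version\s+(\S+)"
)


def translation_stamp(text):
    """Return (date, translator_version) from the first stamp found, or None."""
    match = STAMP.search(text)
    return match.groups() if match else None
```

Running it over every .ASM file would show whether all modules carry the same stamp, but it says nothing about edits made after translation.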
My current best guess is that the source code is roughly from mid-1983. But that’s only a guess.
Assembling the Source
The next order of business was figuring out how to assemble the source code. The Microsoft source release provides absolutely no clues on this front. There is no makefile (although perhaps it’s too old for one), no batch file, no build notes, nothing.
The GW-BASIC source code makes several mentions of Intel’s ASM86, but the source uses far too many MASM specifics. It is likely that some older version used ASM86, but not the released source.
Armed with a collection of MASM versions, I tried assembling the source. It did not go well. Nothing could be assembled. MASM 5.1 seemed to get the furthest, which was odd because it’s really far too new (1988); moreover, MASM 5.1 has a built-in INSTR operator which clashes with an INSTR symbol in the GW-BASIC source code.
It turned out that MASM 5.1 was merely more tolerant of UNIX line endings. Old MASM versions require DOS style (CR/LF) line endings and get very upset otherwise, spitting out confusing errors.
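One way to do the line-ending conversion on a modern host is sketched below. The function names are hypothetical, and the sketch only handles LF and CR/LF endings; anything else the old assemblers might object to is not covered.

```python
from pathlib import Path


def to_dos_line_endings(data: bytes) -> bytes:
    """Return data with every LF or CR/LF line ending rewritten as CR/LF."""
    # Collapse existing CR/LF pairs first so they are not turned into CR/CR/LF.
    unix = data.replace(b"\r\n", b"\n")
    return unix.replace(b"\n", b"\r\n")


def convert_file(path: str) -> None:
    """Rewrite one source file in place with DOS line endings."""
    source = Path(path)
    source.write_bytes(to_dos_line_endings(source.read_bytes()))
```

Working on bytes rather than text avoids guessing at the character encoding of the old source files.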
After massaging the source files to make them more palatable to MASM, things got more interesting. Long story short, almost all the files can be assembled with Microsoft MASM 1.00 or 1.10, as well as IBM MASM 1.0. There are known problems with the very old MASM versions that can be avoided by reducing conventional memory size to 512 KB.
Most files cannot be assembled with Microsoft MASM 1.12 or later, or IBM MASM 2.0. The problem is generally better diagnostics in newer MASM versions which refuse questionable constructs in the GW-BASIC source code.
These are the kinds of statements that MASM 1.12 and later refuses:
MOV DX, OFFSET 256*100+OPCNT
MOVS ?CSLAB,WORD PTR ?CSLAB
The exception is the GWMAIN module. MASM 1.x versions fail to assemble it because they run out of memory. The module can be successfully assembled with IBM MASM 2.0 or Microsoft MASM 3.0. No amount of pleading convinced MASM 1.x to work.
This raises some question marks. IBM MASM 2.0/MS MASM 3.0 are really too new (1984) for the GW-BASIC source code. It is possible that Microsoft used development versions of MASM; it is known (see page 337) that Microsoft shipped the bulk of GW-BASIC to OEMs in object code form and OEMs needed to supply glue code required for GW-BASIC to interface with their platform. It is thus possible that the code could not actually be assembled with a generally available off-the-shelf tool.
There is also some possibility that Microsoft did use MASM 1.0 or 1.1 but not hosted on DOS. At any rate, IBM MASM 1.0 plus IBM MASM 2.0 can be used to assemble the source code, and so can Microsoft MASM 1.10 plus MASM 3.0.
There was also an easily resolved mystery related to the GW-BASIC math package. There are two source files, MATH1.ASM and MATH2.ASM. Neither can be assembled. But if they are merged together, e.g. by including both from a master source file, assembly succeeds. The MATH module may have been split because the source code is almost 180KB and certainly would not fit on a 160KB floppy.
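A master file of the kind mentioned above could be generated as follows. This is a sketch under assumptions: the helper name is hypothetical, the output uses MASM's INCLUDE directive with CR/LF line endings, and whether the master file needs anything else (such as its own END directive) depends on the sources and is not addressed here.

```python
from __future__ import annotations


def master_source(parts: list[str]) -> bytes:
    """Build a master file that pulls in each part with an INCLUDE directive."""
    lines = [f"\tINCLUDE\t{name}" for name in parts]
    # Old MASM versions want DOS (CR/LF) line endings.
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```

Writing the result of `master_source(["MATH1.ASM", "MATH2.ASM"])` to MATH.ASM would give a single module to assemble while leaving the two original files untouched.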
Update: Shortly after writing the above, I hit paydirt. MASM 1.06, ostensibly from 1982, can cleanly assemble all of the GW-BASIC source files, with no syntax errors and no running out of memory. A copy can be found here (as MACRO86.EXE) and here; the two executables have different date stamps but are in fact bit for bit identical. Why both older and newer MASM versions run out of memory on GWMAIN.ASM remains a mystery for now, but we now know that there was at least one MASM version that could assemble everything on a PC.
Comparing with a Binary
The next todo item was finding a GW-BASIC binary that’s close to the released source code. It quickly turned out that most GW-BASIC binaries are either older or newer. The right ones show
(C) Copyright Microsoft 1982
but may display various version numbers. They may or may not mention GW-BASIC. In the end I zeroed in on two binaries. One was GWBASIC.EXE dated Nov 11, 1983, file size 56,832 bytes, showing the following:
EAGLE GWBASIC Version 1.20 11/11/83
(C) Copyright Microsoft 1982
The other was BASICA.EXE dated May 13, 1983, file size 54,272 bytes. The sign-on message was:
The COMPAQ Personal Computer BASIC
Version 1.13
(C) Copyright COMPAQ Computer Corp. 1983
(C) Copyright Microsoft 1982
Both of these are a very good but not perfect match for the released source code. I am almost certain that the Compaq version is slightly older than the source code (because there are a few bits missing), while the Eagle version is slightly newer (because there are a few extra bits). That implies the released source code is older than November 1983 but possibly newer than May ’83.
Mapping Out the Binary
I concentrated on the Eagle Computers GWBASIC.EXE since it seemed to be a slightly better match for the source code. I was able to match all of the source code with the binary and arrived at the following sequence of source modules (note that BI stands for BASIC Interpreter):
GWDATA.ASM GWMAIN.ASM OEM.ASM GWEVAL.ASM GWLIST.ASM IBMRES.ASM BIMISC.ASM DSKCOM.ASM BIPTRG.ASM BIPRTU.ASM BISTRS.ASM FIVEO.ASM GENGRP.ASM ADVGRP.ASM MACLNG.ASM GWSTS.ASM GIO86.ASM GIODSK.ASM GIOKYB.ASM GIOSCN.ASM GIOLPT.ASM GIOCOM.ASM GIOCON.ASM GIOTBL.ASM SCNEDT.ASM SCNDRV.ASM CALL86.ASM NEXT86.ASM MATH.ASM (MATH1.ASM + MATH2.ASM) KANJ86.ASM GIOCAS.ASM ITSA86.ASM GWRAM.ASM GWINIT.ASM BIBOOT.ASM
OEM.ASM is a hypothesized OEM-supplied module which is not part of the GW-BASIC source code distribution. It is not a trivial piece of code and accounts for over 6,000 bytes of object code in the Eagle GWBASIC.EXE (more than 10% of the total).
It is likely that other GW-BASIC implementations order the modules differently, although the order of some of the modules at the beginning and end may be fixed (for example GWDATA.ASM needs to be first).
Code Commentary
Reading the source code is fascinating. The code clearly has a long history:
--------- ---- -- ---- ----- --- ---- -----
COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
--------- ---- -- ---- ----- --- ---- -----
ORIGINALLY WRITTEN ON THE PDP-10 FROM FEBRUARY 9 TO APRIL 9 1975
BILL GATES WROTE A LOT OF STUFF.
PAUL ALLEN WROTE A LOT OF OTHER STUFF AND FAST CODE.
MONTE DAVIDOFF WROTE THE MATH PACKAGE (F4I.MAC).
Paul Allen was clearly involved for a while:
FIVEO 5.0 Features -WHILE/WEND, CALL, CHAIN, WRITE /P. Allen
There is no indication that Bill Gates or Paul Allen were involved by the time the product became GW-BASIC.
The source code is written, as was then common, in ALL CAPS (although not completely).
One of the most jarring things is that, as was also common in the bad old days, identifiers are limited to six characters. That leads to ugly, cramped, and hard to decipher identifiers like FRMQNT or SKPMRF or LEVFRE or XCESDS. The 6-character limitation is also applied to file names.
The code is generally quite unstructured and very hard to follow. The PROC keyword is not used at all. Procedures are used, but rather loosely. Code very frequently jumps into the middle of another routine or returns from a routine by using a JMP rather than RET. As a consequence, there are only minimal attempts to keep values in registers and almost all data is kept in memory. The jumpy programming style also makes it impossible to use local variables on the stack. No doubt the code is written that way because it was originally targeting the Intel 8080.
The code contains a nice collection of “what not to do” Intel recommendations. To be fair, those recommendations don’t really apply to the 8086. The style violations include mixing of code and data and jumping into the middle of an instruction.
For example, calls to the SYNCHR routine are followed by one byte of data (excerpt from FIVEO.ASM):
CALL SYNCHR
DB OFFSET 54O ;Must be comma
CMP AL,LOW 54O ;Ommit line # (Use ALL for instance)
The byte is not code, it is data. SYNCHR pops the return address off the stack, processes the data and increments the address, then pushes it back.
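The return-address manipulation can be modeled in a few lines. This is a toy sketch, not a transcription of SYNCHR: the memory layout is hypothetical, and treating "processes the data" as a comparison against the next input character is an assumption of the sketch.

```python
from __future__ import annotations


def synchr(memory: bytes, stack: list[int], text: str, pos: int) -> int:
    """Toy model: check text[pos] against the data byte stored after the CALL.

    Returns the next text position; raises ValueError on a mismatch.
    """
    ret = stack.pop()        # return address points at the inline data byte
    expected = memory[ret]   # the DB that follows the CALL
    stack.append(ret + 1)    # execution resumes after the data byte
    if pos >= len(text) or ord(text[pos]) != expected:
        raise ValueError(f"syntax error: expected {chr(expected)!r}")
    return pos + 1


# Hypothetical layout: CALL (3 bytes), DB 54O, then CMP AL,54O.
MEMORY = bytes([0xE8, 0x00, 0x00, 0o54, 0x3C, 0o54])
```

With the CALL at offset 0, the pushed return address is 3, the offset of the data byte. After `synchr` runs, the stack holds 4, so the "return" lands on the CMP instruction instead of trying to execute the comma.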
The other type of abuse is even more interesting (excerpt from GWMAIN.ASM):
PUBLIC SNERR
SNERR: MOV DL,LOW OFFSET ERRSN ;"SYNTAX ERROR"
DB 271O ; SKIP ;"LXI B," OVER THE NEXT 2
PUBLIC DV0ERR
DV0ERR: MOV DL,LOW OFFSET ERRDV0 ;DIVISION BY ZERO
DB 271O ; SKIP ;"LXI B," OVER THE NEXT 2
PUBLIC NFERR
NFERR: MOV DL,LOW OFFSET ERRNF ;"NEXT WITHOUT FOR" ERROR
DB 271O ; SKIP ;"LXI B," OVER THE NEXT TWO BYTES
Note that LXI is an 8080 instruction, clearly revealing where the idea had come from. The byte 271O (B9h) is the opcode of MOV CX with a 16-bit immediate, so the two bytes of the following MOV DL instruction are swallowed as its operand. When the caller jumps to one of the labels, it will execute a MOV DL followed by a sequence of MOV CX instructions. The CX value is ignored and only the contents of DL are used.
Both of these techniques make disassembly somewhat difficult and confusing, although only very slightly so when one is armed with the source code.
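The skip-over trick from GWMAIN.ASM can be traced with a toy decoder that knows only the two opcodes involved. The byte layout, the error code values, and the terminating RET are placeholders chosen for illustration, not taken from the source.

```python
from __future__ import annotations

MOV_DL_IMM8 = 0xB2    # MOV DL,imm8
MOV_CX_IMM16 = 0o271  # MOV CX,imm16 (B9h), standing in for the 8080 "LXI B,"
STOP = 0xC3           # RET, used here only to end the toy trace

# Placeholder layout: three entry points at offsets 0, 3 and 6, then one
# more MOV DL at offset 9 that falls through to the end.
CODE = bytes([0xB2, 2, 0xB9, 0xB2, 11, 0xB9, 0xB2, 1, 0xB9, 0xB2, 4, 0xC3])


def run(code: bytes, entry: int):
    """Decode from entry until STOP; return the final DL and the trace."""
    dl, ip, trace = None, entry, []
    while code[ip] != STOP:
        if code[ip] == MOV_DL_IMM8:
            dl = code[ip + 1]
            trace.append(f"MOV DL,{dl}")
            ip += 2
        elif code[ip] == MOV_CX_IMM16:
            # The immediate is really the next MOV DL instruction.
            cx = code[ip + 1] | (code[ip + 2] << 8)
            trace.append(f"MOV CX,{cx:04X}h")
            ip += 3
        else:
            raise ValueError(f"unexpected opcode {code[ip]:02X}h")
    return dl, trace
```

Entering at offset 0, 3 or 6 leaves DL holding the value loaded at that entry point, because every later MOV DL is consumed as the operand of a MOV CX. A disassembler that starts at offset 0 and decodes linearly never sees the later entry points as instructions, which is what makes the listing confusing.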
Memory Management
Understanding how GW-BASIC manages memory takes a bit of effort. As was common and necessary in the old days, GW-BASIC discards initialization code and uses the recovered memory for other purposes. The label CSEND indicates the end of resident code with the following comment: “All code loaded after this label is resident only until routine MAPINI initializes the new memory map.”
It should be noted that GW-BASIC effectively uses the small memory model. The CS segment register points to code and DS/ES/SS all have the same value pointing to the data segment. The data segment size is variable and depends on the available memory (but can’t be more than 64K). There is no attempt at exploiting the segmented nature of the 8086 architecture; that makes sense given the 8-bit heritage and the fact that early PCs did not have all that much RAM in the first place.
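The 64K ceiling follows from how the 8086 forms addresses: a 16-bit offset is added to the segment value shifted left by four bits. A minimal sketch, with a made-up segment value:

```python
def linear(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


# Hypothetical value shared by DS, ES and SS, as in the layout described above.
DATA_SEG = 0x2000

# Offsets are 16 bits wide, so one segment register reaches at most 64K.
SEGMENT_SPAN = linear(DATA_SEG, 0xFFFF) - linear(DATA_SEG, 0x0000) + 1
```

Because DS, ES and SS hold the same value, a given offset names the same byte regardless of which of the three registers an instruction uses, so 16-bit pointers carried over from 8-bit code keep working unchanged.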
Within the BASIC data segment, memory is subdivided into several areas. The basic layout is documented in the file GWINIT.ASM (see comment “Memory map for GW-BASIC”). There is stack overflow checking which is invoked for all larger memory allocations; as mentioned above, GW-BASIC does not use local stack variables, which means its stack usage is otherwise very minimal.
Further Directions
It would be handy to find an existing GW-BASIC executable which is an exact match for the released source code. So far I’ve not been successful and in fact the vast majority of Microsoft BASIC interpreters are either older (BASIC 5.x) or newer (GW-BASIC 3.x) versions.
It should also be possible to reverse engineer/disassemble/reconstruct the missing OEM source module (or modules) required to produce a complete GW-BASIC executable. That is likely to be a fair amount of work.
20 Responses to GW-BASIC Source Notes
Another possibility is that Microsoft built it using MASM running MS-DOS on a S-100 system which didn’t have the PC 640KB limit. I remember reading that prior to DOS extenders Microsoft used a S-100 system to link the linker because so much memory was required to do so.
No, that does not make sense. MASM makes no attempt to use all available conventional memory. Also, MASM 1.06 has no trouble assembling the GWMAIN module with ~130K free conventional memory (it seems to need a bit over 120K free).
>It should also be possible to reverse engineer/disassemble/reconstruct the missing
>OEM source module (or modules) required to produce a complete GW-BASIC
>executable. That is likely to be a fair amount of work.
Nevertheless, it’s already (mostly) done. See https://github.com/tkchia/GW-BASIC
Sure, if you just want to steal the code from an existing binary, it’s not that hard. In fact I’m surprised that part isn’t complete yet.
About Gates vs. Allen in “a lot of stuff”, the original 8080 4K source said:
00560 PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580 BILL GATES WROTE THE RUNTIME STUFF.
00600 MONTE DAVIDOFF WROTE THE MATH PACKAGE.
In a web comment that MAY be from the horse’s mouth, that commenter says “When it says Paul Allen wrote the non-runtime stuff that means the development environment which was an amazing piece of work he did on the PDP-10 that made development work very productive including simulation and symbolic debugging.”
The comment comes from a discussion about the easter egg in Commodore’s 6502 Basic at https://www.pagetable.com/?p=43. Pagetable also goes into the original 6502 source and the MACRO-10 language it’s written in: https://www.pagetable.com/?p=774
Excellent blog post, as usual. Thanks for the archeology — I was curious myself which version of GW-BASIC this source drop was supposed to be for. (I’ve also written a blog post about GW-BASIC this weekend, where I describe my ongoing effort to port it back to the Z80: https://tia.mat.br/posts/2020/06/21/converting-gwbasic-to-z80.html — so any missing puzzle pieces, like this blog post, are appreciated.)
I’m not even sure about the version number. My impression is that it’s “the first GW-BASIC”, which may have been called GW-BASIC 1.0 except IBM and Compaq didn’t. It’s definitely newer than MS BASIC 5.x and a superset of it. The only clue in the code is this:
FIVEO=1 ;GENERATE VERSION WITH RELEASE 5.0 FEATURES
GWLEV2=0 ;Version 2.0 of GW BASIC-86
GWLEV2=0 ;GW BASIC version 2.0 features
(The GWLEV2 define can be found in two different files with the different comments.)
I take that to mean that it’s not GW-BASIC 2.0. Note that the above defines are not referenced anywhere in the source code, they appear to have been set for the mysterious translator which produced the 8086 source code.
From my research GW-BASIC 2.0 should have DOS directory support, and the released source does not — CHDIR, RMDIR, etc. is there but stubbed out.
I would think the OEM layer also had specific graphics routines as well. The Canon AS-100 computer was an 8086 non-IBM compatible computer that had some neat graphics for its day.
It came out with MS-DOS 1.1 so I think it was GWBASIC 1. The manual was done by Canon for A size pages and tries to be as helpful as possible. It was not as cool as the IBM Documentation.
http://www.minuszerodegrees.net/manuals/Canon/Canon%20AS-100%20-%20GW-BASIC%20Users%20Guide.pdf
The core graphics logic was all generic, but yes, OEMs of course needed to supply code to set graphics modes, draw pixels, and the like.
I seem to have Compaq BASIC version 1.14, if you want to have a look. It’s from Compaq MS-DOS v1.12g. File is 54304 bytes and it is dated November 28th, 1983.
Definitely worth checking out. The older Compaq BASICA.EXE I’ve been looking at is in some areas disturbingly different from the published source. How can I get hold of the newer BASIC executable?
Cool, I wasn’t aware that some of the older 8-bit MS BASIC source code was out there. It’s definitely closely related.
The Compaq Personal Computer DOS 1.12 is now available at archive.org:
https://archive.org/details/compaq-dos-1.12g
Have fun 🙂
Thanks! At first glance I’m skeptical, because it says “(C) Copyright Microsoft 1982, 1983″… but I will take a closer look.
I got these versions in my files,
if you need something just drop me an email, thanks for interesting reading.
basic compiler-5.31(ibm 1.00).rar
basic compiler-5.36.rar
basic compiler-5.60(ibm 2.00).rar
basic-5.21.rar
basic-5.27.rar
basic-5.28.rar
basica-1.00.rar
basica-1.10.rar
basica-1.13-compaq.rar
basica-1.14-compaq.rar
basica-2.00.rar
basica-2.10.rar
basica-3.0-mitsubishi.rar
basica-3.00.rar
basica-3.10.rar
basica-3.21.rar
basica-3.30.rar
basica-3.31-compaq.rar
basica-3.31.rar
basica-3.40.rar
basica-4.00.rar
basica.rar
gwbasic-1.12.03(corona).rar
gwbasic-2.00-olivetti.rar
gwbasic-2.01-olivetti.rar
gwbasic-2.01-televideo.rar
gwbasic-2.02-bondwell.rar
gwbasic-2.02-commodore.rar
gwbasic-2.02-tandy.rar
gwbasic-2.02.rar
gwbasic-3.11-apricot.rar
gwbasic-3.16-olivetti.rar
gwbasic-3.20-hyundai.rar
gwbasic-3.20-monocrome graphics (mbasic).rar
gwbasic-3.20-tandy.rar
gwbasic-3.20.rar
gwbasic-3.21-ibm.rar
gwbasic-3.22(spanish).rar
gwbasic-3.22.rar
gwbasic-3.23.rar
gwbasic-3.27-olivetti.rar
msdos basic-2.00.rar
Leandro Pereira, I’m afraid that, while educational, your effort may be wasted. If you want a 8080/Z80 version, the source to Microsoft Basic-80 5.2 has already been leaked and is not too hard to find.
I think the released GW-BASIC would have been an archived copy of the final MS-DOS 1.x compatible code base. March 83 has the release of the XT and the DOS 2 compatible BASIC which would be followed by a GW-BASIC offering to match. The BASIC code was too large to be kept universal; GW-BASIC needed multiple code segments while MSX BASIC needed to be able to page split ROMs.
This form of BASIC was nearing its end. GW-BASIC had only a few minor updates (Tandy sound and EGA) after DOS 2. Coco BASIC updates were done by Microware. The only other major revision of MS classic BASIC after 1983 was the offshoot of Handheld BASIC for 8088 portables which implemented slightly fewer features to fit in a very tight ROM budget. The GW-BASIC code would have needed extensive redesign to handle the concept of sub-functions as introduced in DOS 3.
Based on what I see with the disassemblies, the source code was not the final GW-BASIC 1.x. In fact most of the GW-BASIC 1.x binaries I looked at (Compaq 2x, Corona, Eagle) are newer than the published source (there’s slightly more code/functionality). In fact the oldest(?) of those executables, Compaq BASICA.EXE version 1.12, appears to be the closest match in several areas.
Hello dear os2museum,
I tried to assemble advgrp.asm with MACRO86.EXE (MASM 1.06) and it didn’t succeed!
Must I specify a special or how does it work ?
Best greets,
Martin
I don’t know, maybe it simply doesn’t work. Microsoft MASM 1.00 and 1.10 are known to work, and so is IBM MASM 1.0; newer MASM versions generally won’t work. If you want to use something else and it’s not working, it’s up to you to figure out why. If you don’t want to figure it out, use one of the three MASM versions known to work.