Overview
If you have any questions, see the MikeOS website
for contact details and mailing list information.
Starting in the 1960s, a programmer named Charles Moore developed Forth, which he considered to be a fourth-generation programming language (4GL). Forth can be considered an application, a compiler, a language or an operating system, depending upon how it is being used at any particular time.
A Forth system consists of a dictionary of words that can be compiled or executed depending on the machine's current operating state. The dictionary is further divided into vocabularies. The same word may be present in more than one vocabulary and perform differently in each one. The machine's current context determines which definition is used. The primary vocabulary where most words are linked is FORTH.
Forth is modeled as a two-stack virtual machine: one stack for passing parameters and one to follow the execution of a program (a list of pointers to the currently executing 'words'). It is tempting to think of each word as a subroutine and of a program as a collection of subroutines; however, Forth is defined as a direct threaded language. Languages that are subroutine threaded may be considered Forth-like, but not true Forth.
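The two-stack model can be illustrated with a small Python sketch. It is only a toy: word names stand in for the addresses a real threaded system would store, the sample words SQUARE and SUMSQ are hypothetical, and nothing here is taken from the MikeOS Forth source.

```python
def make_dictionary(stack):
    """Build a toy dictionary: primitives are callables, colon words are lists."""
    def dup():
        stack.append(stack[-1])

    def swap():
        stack[-1], stack[-2] = stack[-2], stack[-1]

    def times():
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)

    def plus():
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)

    return {
        "DUP": dup, "SWAP": swap, "*": times, "+": plus,
        "SQUARE": ["DUP", "*"],                       # like : SQUARE DUP * ;
        "SUMSQ": ["SQUARE", "SWAP", "SQUARE", "+"],   # sum of two squares
    }


def run(word, dictionary, stack):
    """Execute `word`; the return stack remembers where to resume each caller."""
    entry = dictionary[word]
    if callable(entry):              # a primitive runs directly
        entry()
        return stack
    return_stack = []                # saved (definition, position) pairs
    thread, ip = entry, 0
    while True:
        if ip == len(thread):        # end of this definition
            if not return_stack:
                return stack
            thread, ip = return_stack.pop()
            continue
        item = thread[ip]
        ip += 1
        target = dictionary[item]
        if callable(target):
            target()
        else:                        # nested colon word: save our place first
            return_stack.append((thread, ip))
            thread, ip = target, 0


stack = [3, 4]
print(run("SUMSQ", make_dictionary(stack), stack))   # prints [25]
```

The parameter stack carries the data (3 and 4 in, 25 out), while the return stack only ever holds resume positions, which is why the two never interfere.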
Over the years several standards have been developed that describe the meaning of the words that comprise the kernel of a Forth system. This document relies heavily on three of them: Forth-79, Forth Interest Group (FIG) and Forth-83. In 1994 ANSI published an all-encompassing standard (ANS Forth, called Forth-94 in this document); it is referenced occasionally. The standards are very similar, but they differ significantly in places. Where they conflict, the Forth-79 standard is generally preferred.
Forth has the reputation of being 'write-once' code, i.e. code that cannot be understood, changed or debugged once it has been written. This is the original programmer's problem and not inherent in the language. If the code is laid out well and documented using the built-in tools, it will be no harder to change than any other well-written code.
This document is a brief description of a Forth system that has been developed to work with MikeOS and is not meant to 'stand alone.' Much information concerning Forth is available online. Two books that are very helpful are Starting Forth by Leo Brodie (a tutorial that takes one from basic concepts to advanced use of the language) and FORTH: A Text and Reference by Mahlon Kelly and Nicholas Spies (a textbook in a tutorial style with an exceptional glossary of most common Forth words).
This version of Forth is set up to use 7-bit ASCII characters for input and output.
Although strings are input as null terminated (ASCII-Z), they are stored in the dictionary as 'counted' strings: a one-byte count followed by that number of characters. The maximum size of a counted string is therefore 255 characters. Names are further limited to 31 characters (a name field of at most 32 bytes, including the count byte) to keep the word list from becoming unwieldy.
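The difference between the two string layouts can be sketched in Python. The helper names to_counted and from_counted are invented for this illustration and are not words in this Forth.

```python
MAX_COUNTED = 255   # a single count byte cannot describe a longer string


def to_counted(asciiz: bytes, limit: int = MAX_COUNTED) -> bytes:
    """Convert a null-terminated (ASCII-Z) string into a counted string."""
    text = asciiz.split(b"\x00", 1)[0]      # keep only the bytes before the first null
    if len(text) > limit:
        raise ValueError("string too long for its count byte")
    if any(byte > 0x7F for byte in text):
        raise ValueError("only 7-bit ASCII is expected")
    return bytes([len(text)]) + text        # count byte first, then the characters


def from_counted(buffer: bytes) -> bytes:
    """Read a counted string from the start of `buffer`, ignoring what follows."""
    return buffer[1:1 + buffer[0]]


print(to_counted(b"HELLO\x00"))             # prints b'\x05HELLO'
print(from_counted(b"\x03DUPxyz"))          # prints b'DUP'
```

A counted string needs no scan to find its end, which is convenient when the same bytes are compared against many dictionary names.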
Logic values: false is numerically equal to zero. Any non-zero value tests as true. Logic functions return 1 for true in Forth-79.
Forth uses post-fix (sometimes called 'reverse Polish') notation, e.g. 2 3 * leaves 6 on the stack. Note that the operator comes after its operands, not before or between them.
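As a rough illustration of how post-fix input is evaluated, here is a minimal Python sketch. It handles only integers and three operators and ignores the 16-bit limits of a real cell; eval_postfix is a name made up for this example.

```python
def eval_postfix(source: str) -> list:
    """Evaluate space-separated post-fix input and return the final stack."""
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for token in source.split():
        if token in operators:
            b = stack.pop()              # top of stack is the second operand
            a = stack.pop()
            stack.append(operators[token](a, b))
        else:
            stack.append(int(token))     # anything else is treated as a number
    return stack


print(eval_postfix("2 3 *"))       # prints [6]
print(eval_postfix("2 3 4 + *"))   # prints [14]
```

Because every operator finds its operands already on the stack, no parentheses or precedence rules are needed: 2 3 4 + * means 2 * (3 + 4).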
The primary vocabulary is called FORTH. This is usually where a program is developed; however, other vocabularies, including custom ones, are possible. Other common vocabularies are ASSEMBLER and EDITOR. Because the host OS normally handles these functions, they are not included in this implementation.
[Custom] Forth usually divides a disk into logical blocks of 1 KB (1024 bytes); except at the driver level, the hardware disk structure is ignored. This implementation uses the host's built-in file functions to access information from the host file system rather than reading and writing raw disk data.
[Custom] For faster searching during compilation the dictionary is split into 16 hash chains. The current vocabulary modifies the hash function so that only words in the vocabulary(ies) of interest are found. Typically, a dictionary search starts in the current vocabulary and continues in the FORTH vocabulary if the word is not found in the current one.
Traditionally, Forth words are documented by showing the before and after parameter stack condition inside a comment. In-line comments start with '(' and end with ')'; before and after are separated by two dashes. E.g. addition could be documented as '( n1 n2 -- n3 )'. 'n' represents a signed 16-bit value, 'd' a 32-bit value, 'c' an 8-bit character or byte value, 'u' unsigned value, 'a' a 16-bit address and 'f' a logic flag. Additional information may be placed after the final stack value to describe the operation in more detail. The format of the additional information is highly dependent on the actual programmer. Consult the recommended references and the included source code for details of the behavior of individual words.
Note: the author's grouping of words is somewhat different from that of either the standards or the textbooks mentioned above.
Word          Standard         Notes
*             Forth-79         'times'
*/            Forth-79         'times divide'
*/MOD         Forth-79         (star slant) 'times divide mod'
+             Forth-79         'plus'
+!            Forth-79         'plus store'
-             Forth-79         'minus'
/             Forth-79         'divide'
/MOD          Forth-79         (slant) 'divide mod'
1+            Forth-79
1+!           79 uncontrolled  Increment memory contents
1-            Forth-79
1-!           79 uncontrolled  Decrement memory contents
2*            79 reserved      'two times'
2**           Custom           2^n
2+            Forth-79
2-            Forth-79
2/            Forth-83
4*            Custom
4/            Custom
@+            Custom           ( n1 a -- n )
@-            Custom           ( n1 a -- n )
ABS           Forth-79         (absolute)
AND           Forth-79
C+!           Custom           ( c a -- )
COM           79 uncontrolled  One's complement
D+            Forth-79
D+!           Custom           ( d a -- )
D-            79 double
DABS          79 double        'd-abs'
DMAX          79 double
DMIN          79 double
DNEGATE       Forth-79         Two's complement
LSHIFT        Forth-94         ( n c -- n<<c )
M*            Forth-94         ( n1 n2 -- d )
M*/           94 double        ( d1 n1 n2 -- d )
M+            94 double        ( d1 n -- d )
M/            Custom           ( d n1 -- n )
MAX           Forth-79
MIN           Forth-79
MOD           Forth-79
NEGATE        Forth-79         Two's complement
OR            Forth-79
RSHIFT        Forth-94         ( n c -- n>>c )
S>D           Forth-94         's to d' ( n -- d ) sign extend
SQRT          Custom           ( d -- n )
T*            Custom           ( d n -- t )
T/            Custom           ( t n -- d )
U*            Forth-79
U/MOD         Forth-79
XOR           Forth-79
Use U* and U/MOD in place of Forth-83 UM* and UM/MOD.
Word          Standard         Notes
!             Forth-79         'store'
+@            Custom           ( a1 a2 -- n ) 'a' may be offset
-LEADING      Custom           ( addr cnt -- addr' cnt' )
-ROT          Custom           ( n1 n2 n3 -- n3 n1 n2 )
-TEXT         79 uncontrolled  ( a1 n a2 -- f,t=different )
-TRAILING     Forth-79
2!            79 double        'two store'
2@            79 double        'two fetch'
2>R           Forth-94         'two to r'
2DROP         79 double        'two drop'
2DUP          79 double        'two dup' (duplicate)
2OVER         79 double        'two over'
2R>           Forth-94         'two r from'
2SWAP         79 double        'two swap'
3DUP          Custom           ( n1 n2 n3 -- n1 n2 n3 n1 n2 n3 )
<CMOVE        79 reserved      (backwards) 'reverse c-move'
><            79 uncontrolled  'interchange bytes'
>R            Forth-79         'to r'
?DUP          Forth-79         'question dup'
@             Forth-79         'fetch'
BLANK         79 reserved
C!            Forth-79         (byte) 'c-store'
C@            Forth-79         'c-fetch'
CMOVE         Forth-79         'c-move'
COUNT         Forth-79
DEPTH         Forth-79         Parameter stack depth
DROP          Forth-79
DUP           Forth-79         (duplicate)
ERASE         Forth-79
FILL          Forth-79
L!            Custom           ( n seg off -- ) long, intersegment
L@            Custom           ( seg off -- n )
LC!           Custom           ( c seg off -- )
LC@           Custom           ( seg off -- c >> zero extended byte )
LOWER>UPPER   Custom           ( c -- c' )
OVER          Forth-79
PICK          Forth-79
R>            Forth-79         'r from'
ROLL          Forth-79
ROT           Forth-79         (rotate)
S0            79 uncontrolled  Report TOS
SEGMOVE       Custom           ( fs fa ts ta #byte -- )
SP@           79 reserved
SWAP          Forth-79
XFER          Custom           ( a1 a2 -- >> transfers contents of 1 to 2 )
2ROT, MOVE (use CMOVE) and R@ (use I) are not included in this implementation.
Word          Standard         Notes
#             Forth-79         'sharp'
#>            Forth-79         End number conversion
#S            Forth-79         Convert numbers
#TIB          Forth-83         System variable => characters left in current input stream
$             Custom           Temporary (next number input only) base 16
%             Custom           Temporary base 2
(D.)          Custom           Format a signed double
.             Forth-79         'dot'
.BASE         Custom           'dot base' = BASE @ .
.R            79 reserved      Right-justified number
.S            Forth-94         Show parameter stack
0             Custom           System constant for speed and size
0.            Custom           System constant
1             Custom           System constant
1.            Custom           System constant
2             Custom           System constant
<#            Forth-79         Begin number conversion
>IN           Forth-79         System variable => offset into input stream
?             Forth-79         'question' = @ .
BASE          Forth-79         System variable
BASE!         Custom           'base store'
BINARY        Custom
BELL          79 uncontrolled
BL            79 reserved      System constant = space = 32
CONVERT       Forth-79
CR            Forth-79
D.            79 double        'd-dot'
D.R           79 double        'd dot r'
DECIMAL       Forth-79
EMIT          Forth-79
EXPECT        Forth-79
HEX           79 reserved
HLD           Custom           System variable; address for HOLD
HOLD          Forth-79
KEY           Forth-79
NUMBER        79 reserved      Counted string to double
OCTAL         79 reserved
OK            Custom           Say 'ok'
Q             Custom           Temporary base 8
SIGN          Forth-79
SPACE         Forth-79
SPACES        Forth-79
SPAN          Forth-83         System variable
TIB           Forth-83         Returns address of input buffer (text or disk)
TYPE          Forth-79
U.            Forth-79         (unsigned) 'u dot'
U.R           79 reserved      'u dot r'
IF ... ELSE ... THEN
C@SWITCH ... ENDSWITCH
SWITCH ... ENDSWITCH

Word          Standard         Notes
0<            Forth-79         'zero less than'
0=            Forth-79         'zero equal'
0>            Forth-79         'zero greater than'
<             Forth-79
=             Forth-79         'equal'
>             Forth-79
?CELL         Custom           ( n -- n f,t=word )
?PRINTABLE    Custom           ( c -- f,t=printable )
D0=           79 double
D<            Forth-79
D=            79 double
DU<           79 double        'd u less than'
FALSE         Forth-94         ( -- 0 )
FALSE!        Custom           ( a -- >> stores 0 in address )
NOT           Forth-79         Alias for 0=
STAY          Custom           ( f -- >> exit if false )
TRUE          Forth-94         ( -- t )
U<            Forth-79
WITHIN        Forth-94         ( n n2 n3 -- f >> true if n2 <= n < n3 )
{             Custom           Start option compile
}             Custom           End option compile
NOT may also be used for 0= .
BEGIN ... AGAIN [Custom] Only an abort terminates the loop.
BEGIN ... UNTIL
BEGIN ... WHILE ... REPEAT
DO ... LOOP
DO ... +LOOP
DO ... /LOOP [Custom] Unsigned limit test.
Word          Standard         Notes
2LEAVE-EXIT   Custom           Leave 2 loops and exit word
I             Forth-79         'eye'
I'            79 reserved      'I-prime'
J             Forth-79         'jay'
J'            Custom           'j-prime'
K             79 reserved      'kay'
LEAVE         Forth-79
LEAVE-EXIT    Custom           Leave loop and exit word
Word          Standard         Notes
'             Modified 79      'tick'; returns CFA, state smart
(             Forth-79         Start comment
)             Not actual word  End comment
,             Forth-79         'comma'
."            Modified 79      'dot quote', state smart
2CONSTANT     79 double
2VARIABLE     79 double
:             Forth-79
;             Forth-79
;CODE         79 assembler
;code         Forth-94         Run-time header for development
>BODY         Forth-83         CFA → PFA
ABORT"        Forth-83         'abort quote', state smart
ALLOT         Forth-79
ARRAY         Custom           Array of bytes
ASSEMBLER     79 assembler     Vocabulary
C,            79 reserved      'c comma' (compile)
CFA           FIG              PFA → CFA
CVARIABLE     Custom           Byte variable
COMPILE       Forth-79
CONSTANT      Forth-79
CONTEXT       Forth-79         System [double] variable
CREATE        Forth-79
CURRENT       Forth-79         System [double] variable
DCLIT         Custom           ( c1 c2 -- )
DEFINITIONS   Forth-79
DOES>         Forth-79         'does'
EDITOR        79 reserved      Vocabulary
EMPTY         Custom           Go back to last protected dictionary
FENCE         Custom           System variable
FORGET        Forth-79
FORTH         Forth-79
H             Custom           System variable contains 'here'
H-LIST        Custom           Print dictionary hash chain
HEADS         Custom           System array of hash pointers
HERE          Forth-79
ID.           Custom           ( lfa -- >> prints name of word at link addr )
IMMEDIATE     Forth-79
L>CFA         Custom           LFA → CFA
L>NFA         Custom           LFA → NFA
L>PFA         Custom           LFA → PFA
LAST          79 uncontrolled  System variable = last word created
LITERAL       Forth-79
PAD           Forth-79
PROTECT       Custom
SMUDGE        FIG
STATE         Forth-79         System variable
UNSMUDGE      FIG
VARIABLE      Forth-79
VLIST         FIG              (vocabulary) 'v list'
VOCABULARY    Forth-79
[             Forth-79         'left bracket' stop compiling
]             Forth-79         'right bracket' restart compiling
Since ' and ." are state smart, ['] and .( are not needed.
VLIST replaces Forth-83 WORDS.
Word          Standard         Notes
'ABORT        Custom           Vectored abort address
@EXECUTE      Custom           For vectored execute
ABORT         Forth-79
EXECUTE       Forth-79
EXIT          Forth-79
INTERPRET     79 uncontrolled
QUERY         Forth-79
QUIT          Forth-79
WORD          Forth-79
Word          Standard         Notes
!CURSOR       Custom           (set) 'store cursor'
.AZ           Custom           Print a null terminated string (ASCII-Z)
/0            Custom           Divide 0 interrupt
?MEM          Custom           Amount of memory left for new dictionary entries
@CURSOR       Custom           (get) 'fetch cursor'
ASCII         79 uncontrolled  Numerical value of next word; state smart
CLS           Custom           Clear screen
FNAME         Custom           System byte array: the file name used for file access
FORTHSEG      Custom           ( -- seg ), Intel segment system currently resides in
FIRSTSEG      Custom           ( -- seg ), first available full segment
ROWS          Custom           Rows available in display
SYSTEM        Custom           Return to host system
VERSION       Custom           Print version string
\             Custom           Comment to end of line
Forth is a very 'lean' system. There is no terminal prompt, only the default blinking cursor. When the system starts, the version string is displayed, followed by 'ok', the message that indicates a command completed satisfactorily.
Start the system and press the <Enter> key a couple of times. Each time the system should say 'ok' and the cursor should move to the next line for more input. To return to the host system at any time, type SYSTEM<Enter>.
The system can do much in its interpretive mode. Try the following (pay particular attention to the spaces between each 'word'):
5 2 + .<Enter>
Remember, Forth uses post-fix notation. The system should have responded with '7 ok'. The 'dot' tells Forth to print the top number on the parameter stack.
Now, let's create the traditional first program; type:
: HW ." Hello World!" ;<Enter>
Execute the program (word) by typing HW<Enter>. ':' creates a new dictionary entry. 'HW' is the name we gave this word that was compiled into the Forth vocabulary; you could have used any other legal name. ' ." ' compiles a literal string that will be displayed when the word is executed. Finally, ';' completes the dictionary entry and makes it findable.
Here's another short 'program' to try.
: LP 5 0 DO I . LOOP ; <Enter>
Execute the word by typing LP<Enter>. Did the system respond with '0 1 2 3 4 ok'?
Alternately you could use : LP [ 5 0 ] DCLIT DO I . LOOP ; <Enter>
Forth uses a two-stack virtual machine model. 'I' retrieves the working loop counter of the innermost loop and places it on the parameter stack, and 'dot' prints it as a signed number. Loop counters and limits are stored on the return stack so that the parameter stack can still be easily accessed inside loops.
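The way a DO ... LOOP keeps its bookkeeping off the parameter stack can be sketched in Python. The layout used here (limit underneath, index on top of the return stack) is an assumption made for the illustration, and run_do_loop and word_i are hypothetical names rather than kernel routines.

```python
def word_i(return_stack):
    """Model of I: read the index of the loop that is currently running."""
    return return_stack[-1]


def run_do_loop(limit, start, body):
    """Model of `limit start DO ... LOOP` with the counters on a return stack."""
    return_stack = [limit, start]        # DO moves both values off the parameter stack
    while True:
        body(return_stack)               # the loop body always runs at least once
        return_stack[-1] += 1            # LOOP adds one to the index...
        if return_stack[-1] >= return_stack[-2]:
            break                        # ...and stops when it reaches the limit
    del return_stack[-2:]                # leaving the loop discards limit and index


printed = []
run_do_loop(5, 0, lambda rs: printed.append(word_i(rs)))   # like 5 0 DO I . LOOP
print(printed)   # prints [0, 1, 2, 3, 4]
```

Since the limit and index never touch the parameter stack, the loop body is free to push and pop its own values there.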
DCLIT requires that the 2 literals (constants) each fit into a signed byte. The '[' stops compilation and allows the numbers to be placed on the parameter stack. The ']' resumes compilation and the definition completes as before. This construct is a little quicker and saves a little space in the dictionary.
Experiment and enjoy.
Register usage (generally):
The system follows the Forth-79 standard as much as possible. Although it is permissible to specify the system as, "FORTH-79 Standard Subset," the author has chosen not to do so.
Bytes have a numeric range of -128 to +127; cells (16-bit words), -32,768 to +32,767; and double words (32 bits), -2,147,483,648 to +2,147,483,647.
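These ranges follow from two's complement arithmetic at each width. A short Python sketch (to_signed is a made-up helper name) shows how a bit pattern of a given width maps onto its signed value, including what happens when a result runs past the top of the range.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    mask = (1 << bits) - 1
    value &= mask                        # keep only the bits that fit
    if value >> (bits - 1):              # sign bit set: the value is negative
        return value - (1 << bits)
    return value


print(to_signed(0xFF, 8))          # prints -1
print(to_signed(32767 + 1, 16))    # prints -32768
print(to_signed(0x7FFFFFFF, 32))   # prints 2147483647
```

The second line shows the usual wrap-around: adding 1 to the largest cell value lands on the smallest one.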
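Fixed point math can be remarkably precise. E.g. 355 113 */ is an excellent approximation of multiplying by π (355/113 ≈ 3.1415929), because, as Forth-79 specifies, */ keeps a double-length (32-bit) intermediate product before dividing.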
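Forth-79 defines */ with a double-length intermediate product, and the Python sketch below shows why that matters. It is restricted to non-negative operands so that rounding rules for negative quotients do not come into play; star_slash and star_slash_16bit are names invented for this comparison.

```python
def star_slash(n1: int, n2: int, n3: int) -> int:
    """Model of `n1 n2 n3 */` for non-negative operands: full-width product, then divide."""
    product = n1 * n2                    # held as a 32-bit double in a real system
    if not 0 <= product < 2**31:
        raise OverflowError("intermediate product exceeds a signed double")
    return product // n3


def star_slash_16bit(n1: int, n2: int, n3: int) -> int:
    """What would happen if the product were squeezed into one 16-bit cell first."""
    wrapped = (n1 * n2) & 0xFFFF         # high bits are lost
    if wrapped >= 0x8000:
        wrapped -= 0x10000               # reinterpret as a signed cell
    return wrapped // n3


print(star_slash(1000, 355, 113))        # prints 3141
print(star_slash_16bit(1000, 355, 113))  # prints 241
```

Scaling 1000 by 355/113 gives 3141, i.e. π to three decimal places in thousandths, whereas the truncated product gives a meaningless result.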
Forth words may perform 2 different operations: one during word compilation and one during execution. In the "Getting Started" section it was seen how ' ." ' compiled a literal string into the dictionary during compilation and then printed that string during execution.
This system uses the following dictionary entry format:
LFA = Link Field Address: a pointer to the previous definition in this hash list
NFA = Name Field Address: a counted string that represents the name and flags. The maximum number of characters in a name is 31 (bits 4 to 0). Bit 5 is used to 'smudge' an entry so that later definitions can replace early ones. Bit 6 is reserved. And bit 7 indicates immediate execution rather than compilation. The name field may be 2 to 32 bytes long.
CFA = Code Field Address: pointer to actual code to execute. For assembler (code) definitions this is usually the beginning of the next cell. (The mathematical operators at the start of the dictionary are typical.) For colon definitions this points to the colon run-time code.
PFA = Parameter Field Address: the parameters needed by this definition. For code definitions this would be machine code. For colon definitions this would be a list of CFAs that make up this word's definition (description). For constants and variables this would be actual data.
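The bit layout of the first byte of the name field, as described above, can be sketched in Python. The constant and function names are chosen for this illustration only, and the sketch assumes the flag bits share the count byte exactly as the description states.

```python
LENGTH_MASK = 0x1F     # bits 4..0: number of characters in the name (1 to 31)
SMUDGE_BIT = 0x20      # bit 5: entry is smudged
RESERVED_BIT = 0x40    # bit 6: reserved
IMMEDIATE_BIT = 0x80   # bit 7: execute immediately instead of compiling


def make_name_field(name: str, immediate: bool = False, smudged: bool = False) -> bytes:
    """Build a name field: one count-and-flags byte followed by the name."""
    if not 1 <= len(name) <= LENGTH_MASK:
        raise ValueError("a name must be 1 to 31 characters long")
    head = len(name)
    if immediate:
        head |= IMMEDIATE_BIT
    if smudged:
        head |= SMUDGE_BIT
    return bytes([head]) + name.encode("ascii")


def parse_name_field(field: bytes) -> dict:
    """Split a name field back into its name and flags."""
    head = field[0]
    length = head & LENGTH_MASK
    return {
        "name": field[1:1 + length].decode("ascii"),
        "smudged": bool(head & SMUDGE_BIT),
        "immediate": bool(head & IMMEDIATE_BIT),
    }


print(make_name_field("IF", immediate=True))   # prints b'\x82IF'
print(parse_name_field(b"\x03DUP"))
```

With one byte of count and flags plus 1 to 31 characters, the field is always between 2 and 32 bytes long.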
When using an external compiler the link and name fields may be omitted (entry is considered headerless) to conserve space. Words using these headerless entries will still execute, but the name will not be found for use in future definitions. Alternatively, to save space, the headers may be placed in a separate data segment during compilation: only the final word needs to be able to be found to execute an intricate program.
In the original specification of Forth, higher level definitions consisted of either code or colon types. There are Forth-like systems that provide for inline code in colon definitions. This construct has rarely been advantageous in a true Forth system; the increased size and complexity outweigh any speed gains.
CONTEXT and CURRENT are double variables (32-bits) that contain lists of VOCABULARIES. A vocabulary is designated by a nibble, 1-15, with null being 'none.' A 32-bit variable has 8 nibbles and thus may designate up to 8 vocabularies. The dictionary hash function uses the vocabulary designator and the 7-bit ASCII value of the first letter of the word to reduce the search to only one of the 16 chains. The least significant nibble of CURRENT specifies where new words are to be compiled. CONTEXT specifies which vocabularies and in what order they are to be searched for words making up the current word being compiled.
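The packing of vocabulary designators into a 32-bit value can be sketched in Python. Two points are assumptions made only for this illustration: that the least significant nibble is searched first, and the particular formula in chain_index, which stands in for the real hash function (its details are not given here).

```python
def vocabularies(value: int) -> list:
    """Unpack a 32-bit CONTEXT or CURRENT value into vocabulary designators.

    Each nibble holds a designator from 1 to 15; a zero nibble means 'none'.
    Assumption: the least significant nibble comes first.
    """
    found = []
    for shift in range(0, 32, 4):
        nibble = (value >> shift) & 0xF
        if nibble:
            found.append(nibble)
    return found


def chain_index(vocabulary: int, name: str) -> int:
    """Hypothetical hash: mix the designator with the first letter, pick 1 of 16 chains."""
    first_letter = ord(name[0]) & 0x7F   # 7-bit ASCII value of the first character
    return (vocabulary + first_letter) & 0xF


print(vocabularies(0x00000031))   # prints [1, 3]
print(chain_index(1, "DUP"))      # prints 5
```

Whatever the real formula is, the effect is the same: a word with a given first letter in a given vocabulary can live on only one chain, so a search never has to look at the other fifteen.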
This implementation follows the extended FIG model for preventing inadvertent tampering with the kernel. There are two arrays that describe the dictionary. One, GOLDEN, contains the variables that map the protected portion of the dictionary. The second array, HEADS, contains the variables that map the working dictionary. The dictionary may be returned to its golden state by using the word EMPTY. Alternately, the FENCE can be moved to the current working position with the word PROTECT.
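The relationship between the protected and working maps can be modeled with a short Python sketch. It assumes that the golden state amounts to a saved copy of the chain heads and of the next free address; the class and attribute names are hypothetical and the real arrays may hold more than this.

```python
class DictionaryState:
    """Toy model of a working dictionary map with a protected snapshot."""

    def __init__(self, heads, here):
        self.heads = list(heads)             # working chain heads (the HEADS role)
        self.here = here                     # next free dictionary address
        self.golden = (list(heads), here)    # protected snapshot (the GOLDEN role)

    def protect(self):
        """Model of PROTECT: the current state becomes the protected one."""
        self.golden = (list(self.heads), self.here)

    def empty(self):
        """Model of EMPTY: throw away everything defined since the snapshot."""
        heads, here = self.golden
        self.heads, self.here = list(heads), here


state = DictionaryState([0] * 16, 1000)
state.heads[3], state.here = 1200, 1210      # pretend a new word was compiled
state.empty()
print(state.heads[3], state.here)            # prints 0 1000
```

In this model EMPTY costs only a copy of the small map, no matter how many words were defined since the last PROTECT.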
This implementation was written to take advantage of many of the input and output functions available in an IBM compatible BIOS. All keyboard entry and screen output, specifically, goes through the BIOS. The main OS (PC DOS or MikeOS) is used to gain access to operating files on the disk.
Two of the most basic and important operations in the kernel are separating a word from the input stream and finding a word in the dictionary. Many other operations cannot be completed unless these two function properly. To separate a word the system needs a delimiter. The most common delimiter is space (or BL). Generally, with the input stream pointer set, 'BL WORD' will separate the next word from the stream and transfer the string to HERE + 2. This sets it up to compile a new word -- link field goes at HERE -- or place a literal string in the dictionary -- CFA of defining word goes at HERE. Although there are standards, many programmers (the author included) prefer non-standard stack conditions for dictionary searches. In this implementation 'FIND' is left headerless to prevent confusion.
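The separation step can be sketched in Python. This models the general behavior of WORD (skip leading delimiters, collect characters up to the next delimiter, return a counted string) rather than the kernel's actual code; parse_word is a hypothetical name, and the offset plays the role of the input stream pointer.

```python
def parse_word(stream: str, offset: int, delimiter: str = " "):
    """Return (counted_string, new_offset) for the next word in `stream`."""
    end = len(stream)
    while offset < end and stream[offset] == delimiter:
        offset += 1                          # skip leading delimiters
    start = offset
    while offset < end and stream[offset] != delimiter:
        offset += 1                          # collect the word itself
    token = stream[start:offset]
    if offset < end:
        offset += 1                          # step past the closing delimiter
    return bytes([len(token)]) + token.encode("ascii"), offset


line = "5 2 + ."
position = 0
while position < len(line):
    counted, position = parse_word(line, position)
    print(counted, position)
```

Each call hands back one counted string and the position where the next call should start, so repeated calls walk through the whole input line.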
Most macro processors would have difficulty building the dictionary hash lists during the assembly of the code. This system uses a slightly different approach: the NASM macro processor builds one long chain during the assembly, then the Forth start-up code splits the single chain into the desired hash lists. This can take a significant amount of time on an 8-bit, 1 MHz microprocessor, but is not noticeable on modern processors. All words in the initial code must be in the FORTH vocabulary. The start-up penalty can be avoided by saving the code, as modified after start-up, as an appropriate '.bin' or '.com' executable. The Forth word 'write_exec <file-name>' will do this for the user. Note that 'write_exec' is one of the few Forth words in lower case; this helps prevent inadvertent writes to the disk.
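The start-up split can be pictured with a Python sketch. It is only a model: names stand in for dictionary addresses, the hash formula in chain_index is hypothetical, and the FORTH vocabulary is assumed to have designator 1.

```python
def chain_index(name: str, vocabulary: int = 1) -> int:
    """Hypothetical hash over the vocabulary designator and the first letter."""
    return (vocabulary + (ord(name[0]) & 0x7F)) & 0xF


def split_chain(names_oldest_first, chains: int = 16):
    """Relink one long chain of definitions into `chains` separate hash lists."""
    heads = [None] * chains              # newest entry on each chain
    links = {}                           # entry -> previous entry on the same chain
    for name in names_oldest_first:
        index = chain_index(name)
        links[name] = heads[index]       # link field points at the older entry
        heads[index] = name
    return heads, links


def search_steps(name, heads, links):
    """Count the entries visited before `name` is found (None if it is absent)."""
    current, steps = heads[chain_index(name)], 0
    while current is not None:
        steps += 1
        if current == name:
            return steps
        current = links[current]
    return None


heads, links = split_chain(["DUP", "DROP", "SWAP", "OVER", "ROT"])
print(search_steps("DUP", heads, links))    # prints 2
print(search_steps("PICK", heads, links))   # prints None
```

After the split, a search visits only the entries that share a chain with the name being sought instead of walking every definition in the system.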
To do any serious work with a language it should be possible to develop source as a text file and load it into the system. Preferably, there would also be a way to store the updated system. The word 'INCLUDE <file name>' meets the first need. The 'write_exec' word meets the second in a unique way: once the source has been loaded and compiled by the Forth kernel, a new executable that includes the newly compiled code can be written to disk. To 'seal' the code it is only necessary to tell the start-up code to go to a word that will neither exit nor abort. As an example, GEN.4TH is included in this package. At the Forth blinking cursor type INCLUDE GEN.4TH<Enter>. GEN.4TH looks like a normal text file and may be opened with any text editor. It contains the 'write_exec' word, so the new executable is generated as well. [INCLUDE cannot currently use nested disk access, i.e. an INCLUDEd file cannot itself contain an INCLUDE.]
If you have any questions about MikeOS, or you're developing a similar OS and want to share code and ideas, go to the MikeOS website and join the mailing list as described.
MikeOS is open source and released under a BSD-like license (see doc/LICENSE.TXT in the MikeOS .zip file). Essentially, it means you can do anything you like with the code, including basing your own project on it, providing you retain the license file and give credit to the MikeOS developers for their work.