I implemented a Simpletron simulator in C for learning C.
Simpletron is a virtual machine invented by Deitel for his books. Simpletron runs programs written in the Simpletron Machine Language, an simple Machine Language.
An instruction (and data) in the Simpletron Machine Language is a signed four-digit integer, like +1009
. The first two digits are the opcode, and the last two digits are the operand.
I wrote a manual for the Simpletron Simulator in troff, it contains the instructions of the Simpletron Machine Language and some example programs.
Here's the manual, read it using the command man(1):
simpletron(6) Games Manual simpletron(6)
NAME
simpletron - simulates a simpletron computer
SYNOPSIS
simpletron [-c] [-m memsize] file
DESCRIPTION
simpletron simulates a simpletron machine running a program con‐
tained in file and written in SML, the Simpletron Machine Language.
The options are as follows:
-c Do a computer dump at the end of the simulation. A core dump
prints the name and contents of each register as well as the
complete contents of memory.
-m memsize
Set the size of the memory of the Simpletron simulator. The
memory must be big enough to hold the instructions and the
data.
The input have the same format as instruction (see the section THE
SIMPLETRON MACHINE LANGUAGE for information on the instruction syn‐
tax).
THE SIMPLETRON SIMULATOR
For information on how to implementate a Simpletron simulator, read
the README file provided with the code.
The memory
All information in the Simpletron is handled in terms of words. A
word is a signed four-digit decimal number such as +3364, -1293,
+0007, -0001, and so on.
The Simpletron is equipped with a 100-word memory by default (but it
can be expanded with the -m option). Each word in the memory is
referenced by their two-digit location numbers 00, 01, ..., 99. The
location 00 is the location of the first word, 01 is the location of
the second word, and so on.
Before running an SML program, the Simpletron Simulator loads the
programinto memory. The first instruction of every program is al‐
ways placed in location 00. Each location in the Simpletron's mem‐
ory may contain either an instruction, a data value used by a pro‐
gram, or an unused (and hence undefined) area of memory.
The registers
The Simpletron has a single "general purpose" register known as the
accumulator. Information must be put on the accumulator before the
Simpletron uses that information in calculations or examines it in
various ways.
The Simpletron also has "special purpose" registers used to manage
the instruction execution cycle. These registers cannot be changed
directly.
counter
The instruction counter keep track of the locationin memory
that contains the instruction being performed.
instruction register
The instruction register is a word containing the instruction
currently being performed.
opcode The opcode indicates the operation currently being performed.
It is the leftmost two digits of the instruction currently
being performed.
operand
The operand indicates the memory location or the immediate
value on which the current instruction operates. It is the
rightmost two digits of the instruction currently being per‐
formed.
The instruction execution cycle
After the SML program has been loaded into the memory, the Sim‐
pletron simulator executes it. It begins with the instruction in
location 00 and continues sequentially, unless directed to some
other part of the program by a transfer of control.
The instruction execution cycle do as the following.
The instruction counter tell the location of the next in‐
struction to be performed.
The contents of that location is fetched from memory into the
instruction register.
The operation code and the operand are extracted from the in‐
struction register.
The simpletron determines the operation to be executed.
At this point, the simulation of a instruction is completed.
All that remains is to prepare the Simpletron to execute the
next instruction. So the Simpletron ajust the instruction
counter accordingly.
THE SIMPLETRON MACHINE LANGUAGE
Each instruction written in the Simpletron Machine Language (SML)
occupies one word of the Simpletron's memory, so instructions are
signed four-digit decimal numbers. We assume that the sign of an
SML instruction is always plus, but the sign of a data word may be
either plus or minus. An instruction is a plus-signed 4-digit word
composed of two parts: the 2-digit operation code (aka "opcode") and
the 2-digit operand.
The first two digits of each SML instruction are the operation code,
which specifies the operation to be performed. SML operation codes
are summarized in the following sections between parentheses.
The last two digits of an SML instruction are the operand, which is
either the address of the memory location containing the word to
which the operation indirectly applies, or a value to which the op‐
eration directly applies.
In a SML file, each line is a instruction, a instruction begins with
a plus or minus sign followed by four decimal digits. The remaining
of the line is ignored.
Input/output operations
READ (10)
Read a word from the terminal into a specific location in
memory.
WRITE (11)
Write a word from a specific location in memory to the termi‐
nal.
Memory loading/storing
LOAD (20)
Loada word from a specific location in memory into the accu‐
mulator.
STORE (21)
Store a word from the accumulator into a specific location in
memory.
Memory arithmetic operations
Note that all the results are left in accumulator.
ADD (30)
Add a word from a specific location in memory to the word in
the accumulator.
SUBTRACT (31)
Subtract a word from a specific location in memory from the
word in the accumulator.
DIVIDE (32)
Divide a word from a specific location in memory into the
word in the accumulator.
MULTIPLY (33)
Multiply a word from a specific location in memory by the
word in the accumulator.
Immediate arithmetic operations
Note that all the results are left in accumulator.
ADD_I (40)
Add a the value in operand to the word in the accumulator.
SUBTRACT_I (41)
Subtract the value in operand from the word in the accumula‐
tor.
DIVIDE_I (42)
Divide the value in operand into the word in the accumulator.
MULTIPLY_I (43)
Multiply the value in operand by the word in the accumulator.
Transfer-of-control operations
BRANCH (50)
Branch to a specific location in memory.
BRANCHNEG (51)
Branch to a specific location in memory if the accumulator is
negative.
BRANCHZERO (52)
Branch to a specific location in memory if the accumulator is
zero.
HALT (53)
Halt (i'e', the program has completed its task).
EXAMPLES
The following are example of programs in the Simpletron Machine Lan‐
guage (SML).
adder.sml
The following SML program reads two numbers from the keyboard and
computes and prints their sum.
+1007 READ A
+1008 READ B
+2007 LOAD A
+3008 ADD B
+2109 STORE C
+1109 WRITE C
+5300 HALT
+0000 A
+0000 B
+0000 C
(1) The instruction +1007 reads the first number from the keyboard
and places it into location 07 (which has been initialized to zero).
(2) Then +1008 reads the next number into location 08.
(3) The load instruction (+2007) puts the first number into the ac‐
cumulator.
(4) The add instruction (+3008) adds the second number to the number
in theaccumulator. All SML aritmetic instructions leave their re‐
sults in the accumulator.
(5) The store instruction (+2109) placesthe result back into memory
location 09.
(6) From the location 09, the write instruction (+1109) takes the
number and prints it (as a signed four-digit decimal number).
(7) The halt instruction (+4300) terminates the execution.
larger.sml
The following SML program reads two numbers from the keyboard, and
determines and prints the larger value. Note the use of the in‐
struction +5107 as a conditional transfer of control, much the same
as C's if statement.
+1009 READ A
+1010 READ B
+2009 LOAD A
+3110 SUBTRACT B
+5107 BRANCHNEG 07
+1109 WRITE A
+5300 HALT
+1110 WRITE B
+5300 HALT
sum.sml
The following program uses a sentinel-controlled loop to read posi‐
tive integers and compute and printe their sum.
+1008 READ A
+2008 LOAD A
+5206 BRANCHZERO 06
+3009 SUM B
+2109 STORE B
+5000 BRANCH 00
+1109 WRITE B
+5300 HALT
+0000 A
+0000 B
average7.sml
The following program uses a counter-controlled loop to read seven
numbers, some positive and some negative, and compute and print
their average.
+2015 LOAD N
+5210 BRANCHZERO 10
+1016 READ A
+2016 LOAD A
+3017 ADD B
+2117 STORE B
+2015 LOAD N
+4101 SUB_I 01
+2115 STORE N
+5000 BRANCH 00
+2017 LOAD B
+4207 DIV_I 07
+2117 STORE B
+1117 WRITE B
+5300 HALT
+0007 N
+0000 A
+0000 B
EXIT STATUS
0 Success.
>0 Error occurred.
HISTORY
This version of simpletron, the Simpletron Simulator, is based on
the exercises 7.27~7.29 from the [Deitel & Deitel] book.
The immediate operations are unique to this implementation, since
the exercise does not mention them.
SEE ALSO
[Deitel & Deitel]
C: How to Program (8th edition), Paul Deitel and Harvey Dei‐
tel
simpletron(6)
And here is the Simpletron Simulator:
#include <err.h>
#include <errno.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>
#define DEFMEMSIZE 100
#define MEM_MAX 9999
#define MEM_MIN -9999
#define INSTRUCTIONSIZE 4
#define OPSIZE 2
enum operation {
READ = 10,
WRITE = 11,
LOAD = 20,
STORE = 21,
ADD = 30,
SUBTRACT = 31,
DIVIDE = 32,
MULTIPLY = 33,
REMINDER = 34,
ADD_I = 40,
SUBTRACT_I = 41,
DIVIDE_I = 42,
MULTIPLY_I = 43,
REMINDER_I = 44,
BRANCH = 50,
BRANCHNEG = 51,
BRANCHZERO = 52,
HALT = 53
};
/* Simpletron's memory is simulated with a one-dimensional array */
static int *memory;
static int memsize = DEFMEMSIZE;
/* Simpletron's registers are simulated with the following variables */
static int acc; /* accumulator register (value being processed) */
static int ireg; /* instruction register (current instruction) */
static int simpletron(void);
static void load(FILE *);
static void dump(void);
static int getinstruction(FILE *, int *);
static int getmemsize(const char *s);
static void usage(void);
/* load a program in the Simpletron Machine Language into memory and execute it*/
int
main(int argc, char *argv[])
{
int c, exitval, coredump;
FILE *fp;
coredump = 0;
while ((c = getopt(argc, argv, "cm:")) != -1) {
switch (c) {
case 'm':
if ((memsize = getmemsize(optarg)) < 1)
errx(EXIT_FAILURE, "%s: improper memory size", optarg);
break;
case 'c':
coredump = 1;
break;
default:
usage();
break;
}
}
argc -= optind;
argv += optind;
if (argc != 1)
usage();
if ((memory = calloc(memsize, sizeof *memory)) == NULL)
err(EXIT_FAILURE, NULL);
if ((fp = fopen(*argv, "r")) == NULL)
err(EXIT_FAILURE, "%s", *argv);
load(fp); /* load program into memory */
exitval = simpletron(); /* execute instructions in memory */
if (coredump)
dump(); /* do coredump, if needed */
free(memory);
return exitval;
}
/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
static int
simpletron(void)
{
static int count;
static int opcode;
static int operand;
/* memory location of next instruction */
/* simulation begins with the instruction in the location 00 and continues sequentially */
count = 0;
/* this loop implements the "instruction execution cycle" */
while (count < memsize) {
ireg = memory[count];
opcode = ireg / 100; /* opcode is the leftmost two digits of instruction register*/
operand = ireg % 100; /* operand is the rightmost two digits of instruction register*/
/* this switch statement determine the operation to be performed */
/* each case set the counter for next instruction accordingly */
switch (opcode) {
case READ:
if (getinstruction(stdin, &memory[operand]) == 0) {
warnx("improper input");
return 1;
}
count++;
break;
case WRITE:
printf("%+05d\n", memory[operand]);
count++;
break;
case LOAD:
acc = memory[operand];
count++;
break;
case STORE:
memory[operand] = acc;
count++;
break;
case ADD:
if ((memory[operand] > 0 && acc > MEM_MAX - memory[operand]) ||
(memory[operand] < 0 && acc < MEM_MIN - memory[operand])) {
warnx("integer overflow");
return 1;
}
else
acc += memory[operand];
count++;
break;
case SUBTRACT:
if ((memory[operand] > 0 && acc < MEM_MIN + memory[operand]) ||
(memory[operand] < 0 && acc > MEM_MAX + memory[operand])) {
warnx("integer overflow");
return 1;
}
else
acc -= memory[operand];
count++;
break;
case DIVIDE:
if (memory[operand] == 0) {
warnx("division by zero");
return 1;
} else if ((acc == MEM_MIN) && (memory[operand] == -1)) {
warnx("signed integer overflow");
return 1;
} else {
acc /= memory[operand];
}
count++;
break;
case MULTIPLY:
acc *= memory[operand];
if (acc < MEM_MIN || acc > MEM_MAX) {
warnx("integer overflow");
return 1;
}
count++;
break;
case REMINDER:
if (memory[operand] == 0) {
warnx("remainder by zero");
return 1;
} else if ((acc == MEM_MIN) && (memory[operand] == -1)) {
warnx("signed integer overflow");
return 1;
} else {
acc %= memory[operand];
}
count++;
break;
case ADD_I:
if ((operand > 0 && acc > MEM_MAX - operand) ||
(operand < 0 && acc < MEM_MIN - operand)) {
warnx("integer overflow");
return 1;
} else {
acc += operand;
}
count++;
break;
case SUBTRACT_I:
if ((operand > 0 && acc < MEM_MIN + operand) ||
(operand < 0 && acc > MEM_MAX + operand)) {
warnx("integer overflow");
return 1;
} else {
acc -= operand;
}
count++;
break;
case DIVIDE_I:
if (operand == 0) {
warnx("division by zero");
return 1;
} else if ((acc == MEM_MIN) && (operand == -1)) {
warnx("signed integer overflow");
return 1;
} else {
acc /= operand;
}
count++;
break;
case MULTIPLY_I:
acc *= operand;
if (acc < MEM_MIN || acc > MEM_MAX) {
warnx("integer overflow");
return 1;
}
count++;
break;
case REMINDER_I:
if (operand == 0) {
warnx("remainder by zero");
return 1;
} else if ((acc == MEM_MIN) && (operand == -1)){
warnx("signed integer overflow");
return 1;
} else {
acc %= operand;
}
count++;
break;
case BRANCH:
count = operand;
break;
case BRANCHNEG:
if (acc < 0)
count = operand;
else
count++;
break;
case BRANCHZERO:
if (acc == 0)
count = operand;
else
count++;
break;
case HALT:
return 0;
default:
warnx("%+05d: invalid instruction", ireg);
return 1;
}
}
warnx("execution reached end of memory without halting");
return 1;
}
/* load memory from file */
static void
load(FILE *fp)
{
size_t i;
int instruction;
i = 0;
while(getinstruction(fp, &instruction) && i < memsize)
memory[i++] = instruction;
}
/* write a core dump of memory and registers into stdout */
static void
dump(void)
{
size_t i, j;
fprintf(stderr, "\nREGISTERS:\n"
"accumulator %+05d\n"
"instruction register %+05d\n",
acc, ireg);
fprintf(stderr, "\nMEMORY:\n"
" 0 1 2 3 4 5 6 7 8 9\n");
for (i = 0; i < memsize / 10; i++) {
fprintf(stderr, "%2lu ", i * 10);
for (j = 0; j < memsize / 10; j++)
fprintf(stderr, "%+05d%s", memory[(i*10)+j],
(j == memsize / 10 - 1) ? "\n" : " ");
}
}
/* get instruction from fp; return 0 if instruction is improper */
static int
getinstruction(FILE *fp, int *instruction)
{
size_t i;
int c, num, sign;
num = 0;
/* get initial blank */
while (isblank(c = getc(fp)))
;
/* get instruction/data sign */
sign = (c == '-') ? -1 : 1;
if (c != '+' && c != '-')
return 0;
else
c = getc(fp);
/* get instruction/data number */
for (i = 0; i < INSTRUCTIONSIZE; i++) {
if (!isdigit(c))
return 0;
num = num * 10 + c - '0';
c = getc(fp);
}
/* get remaining of command line */
while (c != '\n' && c != EOF)
c = getc(fp);
*instruction = sign * num;
return 1;
}
/* get an integer from s to be used as the memory size */
static int
getmemsize(const char *s)
{
long n;
char *endp;
n = strtol(s, &endp, 10);
if (errno == ERANGE || n > INT_MAX || n < INT_MIN || endp == s || *endp != '0円')
return -1;
return (int) n;
}
static void
usage(void)
{
(void) fprintf(stderr, "usage: simpletron [-c] [-m memsize] file\n");
exit(EXIT_FAILURE);
}
Here is a sample program in the Simpletron Machine Language, average7.sml, it receives 7 values from input and calculates the average between them.
+1008
+2008
+5206
+3009
+2109
+5000
+1109
+5300
+0000
+0000
The input of a Simpletron program must be a signed four-digit integer, like +0007
or -0001
.
Is there any way I can make the code more elegant and portable?
2 Answers 2
In addition to the review you already have, I have a few more suggestions.
Fix the bug
As was already pointed out, the assignment-in-condition practice is problematic. In this particular case, the problem is in main
. The current code has this:
if ((memory = calloc(memsize, sizeof *memory)) == NULL)
err(EXIT_FAILURE, NULL);
if ((fp = fopen(*argv, "r")) == NULL)
err(EXIT_FAILURE, "%s", *argv);
The problem is that if the file doesn't exist, the memory just allocated will not be freed. For that reason and for the fact that it's generally better to define variables when they are declared, I'd write that sequence like this instead:
FILE *fp = fopen(*argv, "r");
if (fp == NULL) {
free(memory);
err(EXIT_FAILURE, "%s", *argv);
}
Think carefully about signed vs. unsigned numbers
What would it mean for the memsize
to be a negative number? I can't think of a rational interpretation for such a thing, so I'd recommend making that size_t
which is unsigned.
Eliminate global variables
In this case there lot of global variables such as memory
and memsize
which are probably better gathered up into a structure and made part of main
instead of global. Then for each of the relevant functions such as load
or dump
, pass a pointer to the structure as one of the arguments.
struct Simpletron {
/* Simpletron's memory is simulated with a one-dimensional array */
int *memory;
size_t memsize;
/* Simpletron's registers are simulated with the following variables */
int acc; /* accumulator register (value being processed) */
size_t pc; /* program counter points to current instruction */
int opcode; /* current opcode */
int operand; /* current operand */
};
Note that I've also changed from ireg
to pc
. More on that later.
Make the program data driven
Instead of the operation
enum
, a large switch
statement, etc. I think it would be much neater to have a struct
for instructions. Here's how I'd define it:
struct Instruction {
int opcode;
const char *mnemonic;
const char *printstr;
int (*exec)(struct Simpletron* s);
};
Now we can create an array of instructions. Here's an example of one:
{ 52,"BRANCHZERO"," %2u", simple_instr_branchzero },
Now all that remains is to write the code that performs the instruction:
static int simple_instr_branchzero(struct Simpletron *s) {
if (s->acc == 0) {
s->pc = s->operand;
} else {
++s->pc;
}
return WARN_NONE;
}
Make error messages and numbers neater
You may have noticed that the function above returns WARN_NONE
. This is somewhat easier for a programmer to read and understand than something like return 0
and also has the advantage that we now have both a code and a message (which might be translated to other languages, for instance). So instead of this inside the large switch
:
case ADD_I:
if ((operand > 0 && acc > MEM_MAX - operand) ||
(operand < 0 && acc < MEM_MIN - operand)) {
warnx("integer overflow");
return 1;
} else {
acc += operand;
}
count++;
break;
We can write this:
static int simple_instr_add_i(struct Simpletron *s) {
int result = s->acc + s->operand;
if (isOutOfRange(result)) {
return WARN_OVERFLOW;
}
s->acc = result;
++s->pc;
return WARN_NONE;
}
This is enabled using this code:
enum warning { WARN_NONE, WARN_HALT, WARN_INPUT, WARN_OVERFLOW, WARN_DIVZERO, WARN_SIGNEDOVERFLOW, WARN_REMAINZERO, WARN_COUNT };
static const struct Error {
enum warning value;
const char *text;
} simpletron_errors[WARN_COUNT] = {
{ WARN_NONE, "ok" },
{ WARN_HALT, "halt" },
{ WARN_INPUT, "improper input" },
{ WARN_OVERFLOW, "integer overflow" },
{ WARN_DIVZERO, "division by zero" },
{ WARN_SIGNEDOVERFLOW, "signed integer overflow"},
{ WARN_REMAINZERO, "remainder by zero"},
};
Note that WARN_COUNT
is not a real warning, but rather a marker to define the size of the array and also for us with error checking on access to that array.
Use helper functions to clarify code
The code above uses isOutOfRange
which simplifies the code and makes it clear to the reader. The content is this:
static bool isOutOfRange(int n) {
return n < MEM_MIN || n > MEM_MAX;
}
Use action words for functions
The functions load
and dump
are named in a way that suggests their function, but I think simpletron
is not as good. Since they are all dealing with the same underlying machine, I'd suggest naming them as simpletron_load
, simpletron_dump
and simpletron_run
.
Separate interface from implementation
I'd suggest splitting the program into three pieces: main.c
which would contain main
and functions only needed by it, a simpletron.h
file that defines the interface to the virtual machine and simpletron.c
which would contain the implementation. Here's how I would define simpletron.h
:
#ifndef SIMPLETRON_H
#define SIMPLETRON_H
#include <stdio.h>
#include <stdbool.h>
struct Simpletron {
/* Simpletron's memory is simulated with a one-dimensional array */
int *memory;
size_t memsize;
/* Simpletron's registers are simulated with the following variables */
int acc; /* accumulator register (value being processed) */
size_t pc; /* program counter points to current instruction */
int opcode; /* current opcode */
int operand; /* current operand */
};
int simpletron_run(struct Simpletron *s, bool trace, bool verbose);
int simpletron_load(struct Simpletron *s, FILE *fp);
void simpletron_dump(struct Simpletron *s);
#endif // SIMPLETRON_H
Only the minimal information to use the interface is here. All of the other details are encapsulated in simpletron.c
.
Prefer const
to #define
Since C99, it's generally better to use const
rather than #define
for numerical constants. For instance, I'd put these inside simpletron.c
:
static const int MEM_MAX = 9999;
static const int MEM_MIN = -9999;
static const int INSTRUCTIONSIZE = 4;
This way, we get the benefit of type checking and also limiting scope.
Consider adding features
I thought it would be nice to be able to trace the program and also, optionally, to dump the contents of the machine after each instruction. This heavily modified version of your original simpletron
function does just that.
/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
int simpletron_run(struct Simpletron *s, bool trace, bool verbose) {
/* memory location of next instruction */
/* simulation begins with the instruction in the location 00 and continues sequentially */
s->pc = 0;
/* this loop implements the "instruction execution cycle" */
while (s->pc < s->memsize) {
/* opcode is the leftmost two digits of instruction register*/
s->opcode = s->memory[s->pc] / 100;
/* operand is the rightmost two digits of instruction register*/
s->operand = s->memory[s->pc] % 100;
/* simple linear scan for opcode */
const struct Instruction *op = findop(s->opcode);
if (op == NULL) {
warnx("%+05d: invalid instruction", s->memory[s->pc]);
return 1;
}
if (trace) {
fprintf(stderr, "%05lu: %+05d\t", s->pc, s->memory[s->pc]);
fprintf(stderr, op->mnemonic);
fprintf(stderr, op->printstr, s->operand);
fprintf(stderr, "\n");
}
int result = op->exec(s);
if (verbose) {
simpletron_dump(s);
}
if (result == WARN_HALT) {
return 0;
}
if (result != WARN_NONE && result < WARN_COUNT) {
warnx(simpletron_errors[result].text);
return 1;
}
}
warnx("execution reached end of memory without halting");
return 1;
}
Using these features was a simple matter of adding the appropriate arguments for main
and passing two boolean values. Much of this functionality is enabled by the use of the data-driven design, but there's still more.
Fully use data structures to simplify features
The posted example code purports to take an average of seven numbers, but it does no such thing. In fact, it computes a sum of a list of numbers terminated by a sentinel value of zero. A program that computes an average might look like this in source code form:
READ [13] ; read a number from the uset
LOAD [13] ; acc = number
ADD [15] ; add to running sum
STORE [15] ; store sum
LOAD [14] ; fetch counter
ADD_I 1 ; increment by one
STORE [14] ; save updated count
BRANCHNEG 0 ; if <0, we're not done yet
LOAD [15] ; fetch the running sum
DIVIDE_I 7 ; divide by seven
STORE [13] ; store the updated value
WRITE [13] ; write it to stdout
HALT
+0000 ; this is location 13 used as a scratchpad for input
-0007 ; this is the value -n (number of numbers to avg)
+0000 ; this is location 15 that holds the running sum
It was certainly not obvious from a raw list of numbers what the original code actually did until I added the trace function mentioned above. It's a relatively simple task to allow the code to accept either this nice source code version or the original raw number version. Here's an enhanced simpletron_load
function that does just that:
int simpletron_load(struct Simpletron *s, FILE *fp) {
unsigned linenum = 1;
char inst[13];
inst[12] = '0円'; // assure it's terminated
size_t i = 0;
const char* error = NULL;
while (!error && (fscanf(fp, "%12s", inst) == 1)) {
// is it a number
if (inst[0] == '+' || inst[0] == '-') {
int arg;
if (sscanf(inst, "%5d", &arg) == 1) {
s->memory[i++] = arg;
} else {
error = "reading number";
}
} else {
const struct Instruction *in = findmnemonic(inst);
if (in) {
if (strlen(in->printstr)) {
int arg = parsearg(in->printstr, fp);
if (arg >= 0) {
s->memory[i++] = in->opcode*100 + arg;
} else {
error = "reading instruction";
}
} else {
s->memory[i++] = in->opcode*100;
}
} else {
error = "reading instruction";
}
}
// ignore the rest of the line
for (int c = getc(fp); c != '\n' && c != EOF; c = getc(fp)) {
}
++linenum;
if (i >= s->memsize) {
error = "memory full";
}
}
if (error) {
printf("ERROR:%s:%d\n", error, linenum);
return 1;
}
return 0;
}
Make the machine do the counting
Rather than fiddling with tedious parsing of formatted output, I generally prefer to let the machine do the counting for me. To that end, the rewritten simpletron_dump
function now looks like this:
/* write a core dump of memory and registers into stdout */
void simpletron_dump(struct Simpletron *s) {
const unsigned linelen = 10;
fprintf(stderr, "\nREGISTERS:\n"
"accumulator %+05d\n"
"instruction pointer +%04lu\n"
"\nMEMORY:\n ",
s->acc, s->pc);
for (unsigned i = 0; i < linelen; ++i) {
fprintf(stderr, "%7u", i);
}
unsigned dumpcount = 0;
for (size_t i = 0; i < s->memsize; ++i, --dumpcount) {
if (dumpcount == 0) {
fprintf(stderr, "\n%2lu ", i );
dumpcount = linelen;
}
fprintf(stderr, "%+05d ", s->memory[i]);
}
fprintf(stderr, "\n");
}
The code uses the const unsigned linelen
to keep track of how many values to print out per line for both the header and for the memory dump. That also fixes another bug in the original which did not print memory correctly.
Understand real CPUs
I know this is all a learning exercise, but one thing that may be useful is to understand a bit more about real CPU architecture. For example, rather than throwing an error in the event of overflow, real CPUs typically have a carry flag which indicates this and an overflow flag to indicate signed overflow. Also, it is more typical to have an instruction pointer (sometimes called a program counter) rather than an instruction register that actually holds the current instruction. Of course internal to the machine, something eventually does fetch and parse the value of the instruction, but it's quite rare for that to be directly accessible from the outside. This was one reason I changed from ireg
to pc
as mentioned above. It keeps things neater and more clearly mimics how real machines work.
Results
Here's the revised version of simpletron.c
:
#include "simpletron.h"
#include <err.h>
#include <errno.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>
static const int MEM_MAX = 9999;
static const int MEM_MIN = -9999;
static const int INSTRUCTIONSIZE = 4;
enum warning { WARN_NONE, WARN_HALT, WARN_INPUT, WARN_OVERFLOW, WARN_DIVZERO, WARN_SIGNEDOVERFLOW, WARN_REMAINZERO, WARN_COUNT };
static const struct Error {
enum warning value;
const char *text;
} simpletron_errors[WARN_COUNT] = {
{ WARN_NONE, "ok" },
{ WARN_HALT, "halt" },
{ WARN_INPUT, "improper input" },
{ WARN_OVERFLOW, "integer overflow" },
{ WARN_DIVZERO, "division by zero" },
{ WARN_SIGNEDOVERFLOW, "signed integer overflow"},
{ WARN_REMAINZERO, "remainder by zero"},
};
static bool isOutOfRange(int n) {
return n < MEM_MIN || n > MEM_MAX;
}
/* get instruction from fp; return 0 if instruction is improper */
static int fetch_number(FILE *fp, int *instruction) {
int num = 0;
int c;
int sign = 1;
/* get initial blank */
while (isblank(c = getc(fp)))
;
/* get instruction/data sign */
switch (c) {
case '-':
sign = -1;
// fall through
case '+':
c = getc(fp);
break;
default: // error condition
return 0;
}
/* get instruction/data number */
for (int i = INSTRUCTIONSIZE; i; --i) {
if (!isdigit(c)) { // error
return 0;
}
num = num * 10 + c - '0';
c = getc(fp);
}
/* get remaining of command line */
while (c != '\n' && c != EOF) {
c = getc(fp);
}
*instruction = sign * num;
return 1;
}
static int simple_instr_read(struct Simpletron *s) {
if (fetch_number(stdin, &s->memory[s->operand]) == 0) {
return WARN_INPUT;
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_write(struct Simpletron *s) {
printf("%+05d\n", s->memory[s->operand]);
++s->pc;
return WARN_NONE;
}
static int simple_instr_load(struct Simpletron *s) {
s->acc = s->memory[s->operand];
++s->pc;
return WARN_NONE;
}
static int simple_instr_store(struct Simpletron *s) {
s->memory[s->operand] = s->acc;
++s->pc;
return WARN_NONE;
}
static int simple_instr_add(struct Simpletron *s) {
int result = s->acc + s->memory[s->operand];
if (isOutOfRange(result)) {
return WARN_OVERFLOW;
}
s->acc = result;
++s->pc;
return WARN_NONE;
}
static int simple_instr_subtract(struct Simpletron *s) {
int result = s->acc - s->memory[s->operand];
if (isOutOfRange(result)) {
return WARN_OVERFLOW;
}
s->acc = result;
++s->pc;
return WARN_NONE;
}
static int simple_instr_divide(struct Simpletron *s) {
if (s->memory[s->operand] == 0) {
return WARN_DIVZERO;
} else if ((s->acc == MEM_MIN) && (s->memory[s->operand] == -1)) {
return WARN_SIGNEDOVERFLOW;
} else {
s->acc /= s->memory[s->operand];
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_multiply(struct Simpletron *s) {
s->acc *= s->memory[s->operand];
if (isOutOfRange(s->acc)) {
return WARN_OVERFLOW;
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_remainder(struct Simpletron *s) {
if (s->memory[s->operand] == 0) {
return WARN_REMAINZERO;
} else if ((s->acc == MEM_MIN) && (s->memory[s->operand] == -1)) {
return WARN_SIGNEDOVERFLOW;
} else {
s->acc %= s->memory[s->operand];
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_add_i(struct Simpletron *s) {
int result = s->acc + s->operand;
if (isOutOfRange(result)) {
return WARN_OVERFLOW;
}
s->acc = result;
++s->pc;
return WARN_NONE;
}
static int simple_instr_subtract_i(struct Simpletron *s) {
int result = s->acc - s->operand;
if (isOutOfRange(result)) {
return WARN_OVERFLOW;
}
s->acc = result;
++s->pc;
return WARN_NONE;
}
static int simple_instr_divide_i(struct Simpletron *s) {
if (s->operand == 0) {
return WARN_DIVZERO;
} else if ((s->acc == MEM_MIN) && (s->operand == -1)) {
return WARN_SIGNEDOVERFLOW;
} else {
s->acc /= s->operand;
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_multiply_i(struct Simpletron *s) {
s->acc *= s->operand;
if (isOutOfRange(s->acc)) {
return WARN_OVERFLOW;
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_remainder_i(struct Simpletron *s) {
if (s->operand == 0) {
return WARN_REMAINZERO;
} else if ((s->acc == MEM_MIN) && (s->operand == -1)){
return WARN_SIGNEDOVERFLOW;
} else {
s->acc %= s->operand;
}
++s->pc;
return WARN_NONE;
}
static int simple_instr_branch(struct Simpletron *s) {
s->pc = s->operand;
return WARN_NONE;
}
static int simple_instr_branchneg(struct Simpletron *s) {
if (s->acc < 0) {
s->pc = s->operand;
} else {
++s->pc;
}
return WARN_NONE;
}
static int simple_instr_branchzero(struct Simpletron *s) {
if (s->acc == 0) {
s->pc = s->operand;
} else {
++s->pc;
}
return WARN_NONE;
}
static int simple_instr_halt(struct Simpletron *s) {
s=s;
return WARN_HALT;
}
static const struct Instruction {
int opcode;
const char *mnemonic;
const char *printstr;
int (*exec)(struct Simpletron* s);
} instructions[] = {
{ 10,"READ"," [%2u]", simple_instr_read },
{ 11,"WRITE"," [%2u]", simple_instr_write },
{ 20,"LOAD"," [%2u]", simple_instr_load },
{ 21,"STORE"," [%2u]", simple_instr_store },
{ 30,"ADD"," [%2u]", simple_instr_add },
{ 31,"SUBTRACT"," [%2u]", simple_instr_subtract },
{ 32,"DIVIDE"," [%2u]", simple_instr_divide },
{ 33,"MULTIPLY"," [%2u]", simple_instr_multiply },
{ 34,"REMAINDER"," [%2u]", simple_instr_remainder },
{ 40,"ADD_I"," %2u", simple_instr_add_i },
{ 41,"SUBTRACT_I"," %2u", simple_instr_subtract_i },
{ 42,"DIVIDE_I"," %2u", simple_instr_divide_i },
{ 43,"MULTIPLY_I"," %2u", simple_instr_multiply_i },
{ 44,"REMAINDER_I"," %2u", simple_instr_remainder_i },
{ 50,"BRANCH"," %2u", simple_instr_branch },
{ 51,"BRANCHNEG"," %2u", simple_instr_branchneg },
{ 52,"BRANCHZERO"," %2u", simple_instr_branchzero },
{ 53,"HALT","" , simple_instr_halt },
};
static const struct Instruction *findop(int opcode) {
for (size_t i=0; i < sizeof(instructions)/sizeof(instructions[0]); ++i) {
if (opcode == instructions[i].opcode) {
return &instructions[i];
}
}
return NULL;
}
static const struct Instruction *findmnemonic(const char *mnemonic) {
for (size_t i=0; i < sizeof(instructions)/sizeof(instructions[0]); ++i) {
if (strcmp(mnemonic, instructions[i].mnemonic) == 0) {
return &instructions[i];
}
}
return NULL;
}
/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
int
simpletron_run(struct Simpletron *s, bool trace, bool verbose)
{
/* memory location of next instruction */
/* simulation begins with the instruction in the location 00 and continues sequentially */
s->pc = 0;
/* this loop implements the "instruction execution cycle" */
while (s->pc < s->memsize) {
/* opcode is the leftmost two digits of instruction register*/
s->opcode = s->memory[s->pc] / 100;
/* operand is the rightmost two digits of instruction register*/
s->operand = s->memory[s->pc] % 100;
/* simple linear scan for opcode */
const struct Instruction *op = findop(s->opcode);
if (op == NULL) {
warnx("%+05d: invalid instruction", s->memory[s->pc]);
return 1;
}
if (trace) {
fprintf(stderr, "%05lu: %+05d\t", s->pc, s->memory[s->pc]);
fprintf(stderr, op->mnemonic);
fprintf(stderr, op->printstr, s->operand);
fprintf(stderr, "\n");
}
int result = op->exec(s);
if (verbose) {
simpletron_dump(s);
}
if (result == WARN_HALT) {
return 0;
}
if (result != WARN_NONE && result < WARN_COUNT) {
warnx(simpletron_errors[result].text);
return 1;
}
}
warnx("execution reached end of memory without halting");
return 1;
}
static int parsearg(const char *fmt, FILE *fp) {
unsigned arg = 0;
int result = fscanf(fp, fmt, &arg);
return (result == 1) ? (int)arg : -1;
}
int simpletron_load(struct Simpletron *s, FILE *fp) {
unsigned linenum = 1;
char inst[13];
inst[12] = '0円'; // assure it's terminated
size_t i = 0;
const char* error = NULL;
while (!error && (fscanf(fp, "%12s", inst) == 1)) {
// is it a number
if (inst[0] == '+' || inst[0] == '-') {
int arg;
if (sscanf(inst, "%5d", &arg) == 1) {
s->memory[i++] = arg;
} else {
error = "reading number";
}
} else {
const struct Instruction *in = findmnemonic(inst);
if (in) {
if (strlen(in->printstr)) {
int arg = parsearg(in->printstr, fp);
if (arg >= 0) {
s->memory[i++] = in->opcode*100 + arg;
} else {
error = "reading instruction";
}
} else {
s->memory[i++] = in->opcode*100;
}
} else {
error = "reading instruction";
}
}
// ignore the rest of the line
for (int c = getc(fp); c != '\n' && c != EOF; c = getc(fp)) {
}
++linenum;
if (i >= s->memsize) {
error = "memory full";
}
}
if (error) {
printf("ERROR:%s:%d\n", error, linenum);
return 1;
}
return 0;
}
/* write a core dump of memory and registers into stdout */
void simpletron_dump(struct Simpletron *s) {
fprintf(stderr, "\nREGISTERS:\n"
"accumulator %+05d\n"
"instruction pointer +%04lu\n",
s->acc, s->pc);
fprintf(stderr, "\nMEMORY:\n ");
const unsigned linelen = 10;
for (unsigned i = 0; i < linelen; ++i) {
fprintf(stderr, "%7u", i);
}
unsigned dumpcount = 0;
for (size_t i = 0; i < s->memsize; ++i, --dumpcount) {
if (dumpcount == 0) {
fprintf(stderr, "\n%2lu ", i );
dumpcount = linelen;
}
fprintf(stderr, "%+05d ", s->memory[i]);
}
fprintf(stderr, "\n");
}
```
-
\$\begingroup\$ I am in fact too much adapted with old idioms I learned from K&R and manpages. Do you recommend any book I can read to make me adapt to the modern programming practices and idioms? \$\endgroup\$phillbush– phillbush2020年03月26日 22:25:39 +00:00Commented Mar 26, 2020 at 22:25
-
\$\begingroup\$ I would recommend "Modern C" by Jens Gustedt. It is a very recent book published by Manning and looks like an excellent resource for learning the modern version of the language. \$\endgroup\$Edward– Edward2020年03月27日 11:50:04 +00:00Commented Mar 27, 2020 at 11:50
-
\$\begingroup\$ define variables when they are defined - you mean initialize when defined? \$\endgroup\$Reinderien– Reinderien2020年03月27日 14:12:03 +00:00Commented Mar 27, 2020 at 14:12
-
\$\begingroup\$ Yes, I meant define variables when they are declared. Corrected now, thanks! \$\endgroup\$Edward– Edward2020年03月27日 14:14:13 +00:00Commented Mar 27, 2020 at 14:14
-
\$\begingroup\$ For that conversion, I opened the file in two windows and cut and pasted the body of each
case
statement into each separate function. Then just do minimal modifications for converting to use thestruct Simpletron
and you're done. Tedious, perhaps, but not hard. Let me know if I can answer any questions or help. \$\endgroup\$Edward– Edward2020年03月27日 20:49:02 +00:00Commented Mar 27, 2020 at 20:49
Re-entrance
These:
static int *memory;
static int acc; /* accumulator register (value being processed) */
static int ireg; /* instruction register (current instruction) */
// ...
static int count;
static int opcode;
static int operand;
force a user to start a new program if they want a new instance of the calculator. If you want to offer an API that allows the co-existence of multiple calculators, pass around a struct instead.
C99
These:
int c, exitval, coredump;
FILE *fp;
haven't needed declaration at the beginning of the function for 20-ish years. It's more legible for them to be declared and initialized closer to where they're actually being used in the function.
Assignment-in-condition
About these various statements -
while ((c = getopt(argc, argv, "cm:")) != -1) {
if ((memory = calloc(memsize, sizeof *memory)) == NULL)
if ((fp = fopen(*argv, "r")) == NULL)
while (isblank(c = getc(fp)))
Don't, please. Expand this out so that the variable is assigned in its own statement. The above is confusing and error-prone, and has no performance gains. The only thing it's good for is code golf, which you aren't currently playing.
Addition efficiency
if ((memory[operand] > 0 && acc > MEM_MAX - memory[operand]) ||
(memory[operand] < 0 && acc < MEM_MIN - memory[operand])) {
warnx("integer overflow");
return 1;
}
else
acc += memory[operand];
can become something like
int sum = memory[operand] + acc;
if (sum > MEM_MAX || sum < MEM_MIN) {
warnx("integer overflow");
return 1;
}
acc = sum;
In other words: don't do the addition three times; do it once. The same applies to SUBTRACT
.
Order of operations
((acc == MEM_MIN) && (memory[operand] == -1))
doesn't require inner parens, due to operator precedence.
Typo
REMINDER
should be REMAINDER
.
Loop sanity
size_t i;
i = 0;
while(getinstruction(fp, &instruction) && i < memsize)
memory[i++] = instruction;
is better represented by
for (size_t i = 0; i < memsize; i++) {
if (!getinstruction(fp, &instruction))
break;
memory[i] = instruction;
}
Memory efficiency
Currently you're storing integers in 32 bits that, since they have values of less than 10,000, could fit in 16. Depending on your constraints - whether you're optimizing for execution speed or memory efficiency - you might want to change this. 16 bits might actually be slower on your architecture, but to be sure you'd want to profile. Also, if you ever plan on serializing the state of the machine into a file, you should use 16 bits (int16_t
from stdint.h
).
-
\$\begingroup\$ For the addition, I was using SEI's CERT INT32-C to ensure integer overflow. The assignment-in-condition is a hint that survived from when I read K&R book, they use assignment-in-condition all the time. Declaration at the beginning of function was also from K&R. I am reading Deitel's C:How to Program after have read K&R, is this a good path choice? I think Deitel's codes are much less elegant than K&R's... \$\endgroup\$phillbush– phillbush2020年03月26日 09:12:27 +00:00Commented Mar 26, 2020 at 9:12
-
\$\begingroup\$ Also, the assignments-in-condition (particularly the one for getopt(3)) are idioms I've learnt from OpenBSD's manpages. I think assignment-in-conditions are not out of fashion in operating system development cycles. \$\endgroup\$phillbush– phillbush2020年03月26日 09:16:18 +00:00Commented Mar 26, 2020 at 9:16
-
\$\begingroup\$ Re. K&R - the most recent (2nd) edition is from 1988. C99 should be used by everyone at this point. The classics are classic, but they need to be taken with a grain of salt and some understanding of the modern world. \$\endgroup\$Reinderien– Reinderien2020年03月26日 14:38:56 +00:00Commented Mar 26, 2020 at 14:38
-
1\$\begingroup\$ I was using SEI's CERT INT32-C to ensure integer overflow - it's excellent for you to be thinking about those details, but I fear that the situation is more complicated. Modern optimizing compilers are able to move terms across the sides of an inequality, for instance. That aside, since you're currently using a 32-bit representation, the data range is significantly narrower than the register range, so you do not have to worry about these details. Even if you switched to 16-bit, you would still be safe, because 2*10^4 < 2^15. \$\endgroup\$Reinderien– Reinderien2020年03月26日 14:48:49 +00:00Commented Mar 26, 2020 at 14:48
-
1\$\begingroup\$ Also, it does not surprise me that man pages and operating system code both show signs of predating C99, since the BSD kernel certainly happened first, and it's entirely likely that it hasn't switched styles. \$\endgroup\$Reinderien– Reinderien2020年03月26日 14:53:24 +00:00Commented Mar 26, 2020 at 14:53
man
and pasted the formatted output here. It seems like an unnecessary hoop to jump through for reviewers otherwise. Unless the code of the man page is explicitly included in what you want reviewed, of course. \$\endgroup\$Assignment-in-condition
is addressing code readability and maintainability, and it section is correct, your code is not as readable as it could be and someone could definitely miss the assignment statements. \$\endgroup\$