8
\$\begingroup\$

I implemented a Simpletron simulator in C for learning C.
Simpletron is a virtual machine invented by Deitel for his books. Simpletron runs programs written in the Simpletron Machine Language, an simple Machine Language.

An instruction (and data) in the Simpletron Machine Language is a signed four-digit integer, like +1009. The first two digits are the opcode, and the last two digits are the operand.

I wrote a manual for the Simpletron Simulator in troff, it contains the instructions of the Simpletron Machine Language and some example programs.

Here's the manual, read it using the command man(1):

simpletron(6) Games Manual simpletron(6)
NAME
 simpletron - simulates a simpletron computer
SYNOPSIS
 simpletron [-c] [-m memsize] file
DESCRIPTION
 simpletron simulates a simpletron machine running a program con‐
 tained in file and written in SML, the Simpletron Machine Language.
 The options are as follows:
 -c Do a computer dump at the end of the simulation. A core dump
 prints the name and contents of each register as well as the
 complete contents of memory.
 -m memsize
 Set the size of the memory of the Simpletron simulator. The
 memory must be big enough to hold the instructions and the
 data.
 The input have the same format as instruction (see the section THE
 SIMPLETRON MACHINE LANGUAGE for information on the instruction syn‐
 tax).
THE SIMPLETRON SIMULATOR
 For information on how to implementate a Simpletron simulator, read
 the README file provided with the code.
 The memory
 All information in the Simpletron is handled in terms of words. A
 word is a signed four-digit decimal number such as +3364, -1293,
 +0007, -0001, and so on.
 The Simpletron is equipped with a 100-word memory by default (but it
 can be expanded with the -m option). Each word in the memory is
 referenced by their two-digit location numbers 00, 01, ..., 99. The
 location 00 is the location of the first word, 01 is the location of
 the second word, and so on.
 Before running an SML program, the Simpletron Simulator loads the
 programinto memory. The first instruction of every program is al‐
 ways placed in location 00. Each location in the Simpletron's mem‐
 ory may contain either an instruction, a data value used by a pro‐
 gram, or an unused (and hence undefined) area of memory.
 The registers
 The Simpletron has a single "general purpose" register known as the
 accumulator. Information must be put on the accumulator before the
 Simpletron uses that information in calculations or examines it in
 various ways.
 The Simpletron also has "special purpose" registers used to manage
 the instruction execution cycle. These registers cannot be changed
 directly.
 counter
 The instruction counter keep track of the locationin memory
 that contains the instruction being performed.
 instruction register
 The instruction register is a word containing the instruction
 currently being performed.
 opcode The opcode indicates the operation currently being performed.
 It is the leftmost two digits of the instruction currently
 being performed.
 operand
 The operand indicates the memory location or the immediate
 value on which the current instruction operates. It is the
 rightmost two digits of the instruction currently being per‐
 formed.
 The instruction execution cycle
 After the SML program has been loaded into the memory, the Sim‐
 pletron simulator executes it. It begins with the instruction in
 location 00 and continues sequentially, unless directed to some
 other part of the program by a transfer of control.
 The instruction execution cycle do as the following.
 The instruction counter tell the location of the next in‐
 struction to be performed.
 The contents of that location is fetched from memory into the
 instruction register.
 The operation code and the operand are extracted from the in‐
 struction register.
 The simpletron determines the operation to be executed.
 At this point, the simulation of a instruction is completed.
 All that remains is to prepare the Simpletron to execute the
 next instruction. So the Simpletron ajust the instruction
 counter accordingly.
THE SIMPLETRON MACHINE LANGUAGE
 Each instruction written in the Simpletron Machine Language (SML)
 occupies one word of the Simpletron's memory, so instructions are
 signed four-digit decimal numbers. We assume that the sign of an
 SML instruction is always plus, but the sign of a data word may be
 either plus or minus. An instruction is a plus-signed 4-digit word
 composed of two parts: the 2-digit operation code (aka "opcode") and
 the 2-digit operand.
 The first two digits of each SML instruction are the operation code,
 which specifies the operation to be performed. SML operation codes
 are summarized in the following sections between parentheses.
 The last two digits of an SML instruction are the operand, which is
 either the address of the memory location containing the word to
 which the operation indirectly applies, or a value to which the op‐
 eration directly applies.
 In a SML file, each line is a instruction, a instruction begins with
 a plus or minus sign followed by four decimal digits. The remaining
 of the line is ignored.
 Input/output operations
 READ (10)
 Read a word from the terminal into a specific location in
 memory.
 WRITE (11)
 Write a word from a specific location in memory to the termi‐
 nal.
 Memory loading/storing
 LOAD (20)
 Loada word from a specific location in memory into the accu‐
 mulator.
 STORE (21)
 Store a word from the accumulator into a specific location in
 memory.
 Memory arithmetic operations
 Note that all the results are left in accumulator.
 ADD (30)
 Add a word from a specific location in memory to the word in
 the accumulator.
 SUBTRACT (31)
 Subtract a word from a specific location in memory from the
 word in the accumulator.
 DIVIDE (32)
 Divide a word from a specific location in memory into the
 word in the accumulator.
 MULTIPLY (33)
 Multiply a word from a specific location in memory by the
 word in the accumulator.
 Immediate arithmetic operations
 Note that all the results are left in accumulator.
 ADD_I (40)
 Add a the value in operand to the word in the accumulator.
 SUBTRACT_I (41)
 Subtract the value in operand from the word in the accumula‐
 tor.
 DIVIDE_I (42)
 Divide the value in operand into the word in the accumulator.
 MULTIPLY_I (43)
 Multiply the value in operand by the word in the accumulator.
 Transfer-of-control operations
 BRANCH (50)
 Branch to a specific location in memory.
 BRANCHNEG (51)
 Branch to a specific location in memory if the accumulator is
 negative.
 BRANCHZERO (52)
 Branch to a specific location in memory if the accumulator is
 zero.
 HALT (53)
 Halt (i'e', the program has completed its task).
EXAMPLES
 The following are example of programs in the Simpletron Machine Lan‐
 guage (SML).
 adder.sml
 The following SML program reads two numbers from the keyboard and
 computes and prints their sum.
 +1007 READ A
 +1008 READ B
 +2007 LOAD A
 +3008 ADD B
 +2109 STORE C
 +1109 WRITE C
 +5300 HALT
 +0000 A
 +0000 B
 +0000 C
 (1) The instruction +1007 reads the first number from the keyboard
 and places it into location 07 (which has been initialized to zero).
 (2) Then +1008 reads the next number into location 08.
 (3) The load instruction (+2007) puts the first number into the ac‐
 cumulator.
 (4) The add instruction (+3008) adds the second number to the number
 in theaccumulator. All SML aritmetic instructions leave their re‐
 sults in the accumulator.
 (5) The store instruction (+2109) placesthe result back into memory
 location 09.
 (6) From the location 09, the write instruction (+1109) takes the
 number and prints it (as a signed four-digit decimal number).
 (7) The halt instruction (+4300) terminates the execution.
 larger.sml
 The following SML program reads two numbers from the keyboard, and
 determines and prints the larger value. Note the use of the in‐
 struction +5107 as a conditional transfer of control, much the same
 as C's if statement.
 +1009 READ A
 +1010 READ B
 +2009 LOAD A
 +3110 SUBTRACT B
 +5107 BRANCHNEG 07
 +1109 WRITE A
 +5300 HALT
 +1110 WRITE B
 +5300 HALT
 sum.sml
 The following program uses a sentinel-controlled loop to read posi‐
 tive integers and compute and printe their sum.
 +1008 READ A
 +2008 LOAD A
 +5206 BRANCHZERO 06
 +3009 SUM B
 +2109 STORE B
 +5000 BRANCH 00
 +1109 WRITE B
 +5300 HALT
 +0000 A
 +0000 B
 average7.sml
 The following program uses a counter-controlled loop to read seven
 numbers, some positive and some negative, and compute and print
 their average.
 +2015 LOAD N
 +5210 BRANCHZERO 10
 +1016 READ A
 +2016 LOAD A
 +3017 ADD B
 +2117 STORE B
 +2015 LOAD N
 +4101 SUB_I 01
 +2115 STORE N
 +5000 BRANCH 00
 +2017 LOAD B
 +4207 DIV_I 07
 +2117 STORE B
 +1117 WRITE B
 +5300 HALT
 +0007 N
 +0000 A
 +0000 B
EXIT STATUS
 0 Success.
 >0 Error occurred.
HISTORY
 This version of simpletron, the Simpletron Simulator, is based on
 the exercises 7.27~7.29 from the [Deitel & Deitel] book.
 The immediate operations are unique to this implementation, since
 the exercise does not mention them.
SEE ALSO
 [Deitel & Deitel]
 C: How to Program (8th edition), Paul Deitel and Harvey Dei‐
 tel
 simpletron(6)

And here is the Simpletron Simulator:

#include <err.h>
#include <errno.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>
#define DEFMEMSIZE 100
#define MEM_MAX 9999
#define MEM_MIN -9999
#define INSTRUCTIONSIZE 4
#define OPSIZE 2
enum operation {
 READ = 10,
 WRITE = 11,
 LOAD = 20,
 STORE = 21,
 ADD = 30,
 SUBTRACT = 31,
 DIVIDE = 32,
 MULTIPLY = 33,
 REMINDER = 34,
 ADD_I = 40,
 SUBTRACT_I = 41,
 DIVIDE_I = 42,
 MULTIPLY_I = 43,
 REMINDER_I = 44,
 BRANCH = 50,
 BRANCHNEG = 51,
 BRANCHZERO = 52,
 HALT = 53
};
/* Simpletron's memory is simulated with a one-dimensional array */
static int *memory;
static int memsize = DEFMEMSIZE;
/* Simpletron's registers are simulated with the following variables */
static int acc; /* accumulator register (value being processed) */
static int ireg; /* instruction register (current instruction) */
static int simpletron(void);
static void load(FILE *);
static void dump(void);
static int getinstruction(FILE *, int *);
static int getmemsize(const char *s);
static void usage(void);
/* load a program in the Simpletron Machine Language into memory and execute it*/
int
main(int argc, char *argv[])
{
 int c, exitval, coredump;
 FILE *fp;
 coredump = 0;
 while ((c = getopt(argc, argv, "cm:")) != -1) {
 switch (c) {
 case 'm':
 if ((memsize = getmemsize(optarg)) < 1)
 errx(EXIT_FAILURE, "%s: improper memory size", optarg);
 break;
 case 'c':
 coredump = 1;
 break;
 default:
 usage();
 break;
 }
 }
 argc -= optind;
 argv += optind;
 if (argc != 1)
 usage();
 if ((memory = calloc(memsize, sizeof *memory)) == NULL)
 err(EXIT_FAILURE, NULL);
 if ((fp = fopen(*argv, "r")) == NULL)
 err(EXIT_FAILURE, "%s", *argv);
 load(fp); /* load program into memory */
 exitval = simpletron(); /* execute instructions in memory */
 if (coredump)
 dump(); /* do coredump, if needed */
 free(memory);
 return exitval;
}
/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
static int
simpletron(void)
{
 static int count;
 static int opcode;
 static int operand;
 /* memory location of next instruction */
 /* simulation begins with the instruction in the location 00 and continues sequentially */
 count = 0;
 /* this loop implements the "instruction execution cycle" */
 while (count < memsize) {
 ireg = memory[count];
 opcode = ireg / 100; /* opcode is the leftmost two digits of instruction register*/
 operand = ireg % 100; /* operand is the rightmost two digits of instruction register*/
 /* this switch statement determine the operation to be performed */
 /* each case set the counter for next instruction accordingly */
 switch (opcode) {
 case READ:
 if (getinstruction(stdin, &memory[operand]) == 0) {
 warnx("improper input");
 return 1;
 }
 count++;
 break;
 case WRITE:
 printf("%+05d\n", memory[operand]);
 count++;
 break;
 case LOAD:
 acc = memory[operand];
 count++;
 break;
 case STORE:
 memory[operand] = acc;
 count++;
 break;
 case ADD:
 if ((memory[operand] > 0 && acc > MEM_MAX - memory[operand]) ||
 (memory[operand] < 0 && acc < MEM_MIN - memory[operand])) {
 warnx("integer overflow");
 return 1;
 }
 else
 acc += memory[operand];
 count++;
 break;
 case SUBTRACT:
 if ((memory[operand] > 0 && acc < MEM_MIN + memory[operand]) ||
 (memory[operand] < 0 && acc > MEM_MAX + memory[operand])) {
 warnx("integer overflow");
 return 1;
 }
 else
 acc -= memory[operand];
 count++;
 break;
 case DIVIDE:
 if (memory[operand] == 0) {
 warnx("division by zero");
 return 1;
 } else if ((acc == MEM_MIN) && (memory[operand] == -1)) {
 warnx("signed integer overflow");
 return 1;
 } else {
 acc /= memory[operand];
 }
 count++;
 break;
 case MULTIPLY:
 acc *= memory[operand];
 if (acc < MEM_MIN || acc > MEM_MAX) {
 warnx("integer overflow");
 return 1;
 }
 count++;
 break;
 case REMINDER:
 if (memory[operand] == 0) {
 warnx("remainder by zero");
 return 1;
 } else if ((acc == MEM_MIN) && (memory[operand] == -1)) {
 warnx("signed integer overflow");
 return 1;
 } else {
 acc %= memory[operand];
 }
 count++;
 break;
 case ADD_I:
 if ((operand > 0 && acc > MEM_MAX - operand) ||
 (operand < 0 && acc < MEM_MIN - operand)) {
 warnx("integer overflow");
 return 1;
 } else {
 acc += operand;
 }
 count++;
 break;
 case SUBTRACT_I:
 if ((operand > 0 && acc < MEM_MIN + operand) ||
 (operand < 0 && acc > MEM_MAX + operand)) {
 warnx("integer overflow");
 return 1;
 } else {
 acc -= operand;
 }
 count++;
 break;
 case DIVIDE_I:
 if (operand == 0) {
 warnx("division by zero");
 return 1;
 } else if ((acc == MEM_MIN) && (operand == -1)) {
 warnx("signed integer overflow");
 return 1;
 } else {
 acc /= operand;
 }
 count++;
 break;
 case MULTIPLY_I:
 acc *= operand;
 if (acc < MEM_MIN || acc > MEM_MAX) {
 warnx("integer overflow");
 return 1;
 }
 count++;
 break;
 case REMINDER_I:
 if (operand == 0) {
 warnx("remainder by zero");
 return 1;
 } else if ((acc == MEM_MIN) && (operand == -1)){
 warnx("signed integer overflow");
 return 1;
 } else {
 acc %= operand;
 }
 count++;
 break;
 case BRANCH:
 count = operand;
 break;
 case BRANCHNEG:
 if (acc < 0)
 count = operand;
 else
 count++;
 break;
 case BRANCHZERO:
 if (acc == 0)
 count = operand;
 else
 count++;
 break;
 case HALT:
 return 0;
 default:
 warnx("%+05d: invalid instruction", ireg);
 return 1;
 }
 }
 warnx("execution reached end of memory without halting");
 return 1;
}
/* load memory from file */
static void
load(FILE *fp)
{
 size_t i;
 int instruction;
 i = 0;
 while(getinstruction(fp, &instruction) && i < memsize)
 memory[i++] = instruction;
}
/* write a core dump of memory and registers into stdout */
static void
dump(void)
{
 size_t i, j;
 fprintf(stderr, "\nREGISTERS:\n"
 "accumulator %+05d\n"
 "instruction register %+05d\n",
 acc, ireg);
 fprintf(stderr, "\nMEMORY:\n"
 " 0 1 2 3 4 5 6 7 8 9\n");
 for (i = 0; i < memsize / 10; i++) {
 fprintf(stderr, "%2lu ", i * 10);
 for (j = 0; j < memsize / 10; j++)
 fprintf(stderr, "%+05d%s", memory[(i*10)+j],
 (j == memsize / 10 - 1) ? "\n" : " ");
 }
}
/* get instruction from fp; return 0 if instruction is improper */
static int
getinstruction(FILE *fp, int *instruction)
{
 size_t i;
 int c, num, sign;
 num = 0;
 /* get initial blank */
 while (isblank(c = getc(fp)))
 ;
 /* get instruction/data sign */
 sign = (c == '-') ? -1 : 1;
 if (c != '+' && c != '-')
 return 0;
 else
 c = getc(fp);
 /* get instruction/data number */
 for (i = 0; i < INSTRUCTIONSIZE; i++) {
 if (!isdigit(c))
 return 0;
 num = num * 10 + c - '0';
 c = getc(fp);
 }
 /* get remaining of command line */
 while (c != '\n' && c != EOF)
 c = getc(fp);
 *instruction = sign * num;
 return 1;
}
/* get an integer from s to be used as the memory size */
static int
getmemsize(const char *s)
{
 long n;
 char *endp;
 n = strtol(s, &endp, 10);
 if (errno == ERANGE || n > INT_MAX || n < INT_MIN || endp == s || *endp != '0円')
 return -1;
 return (int) n;
}
static void
usage(void)
{
 (void) fprintf(stderr, "usage: simpletron [-c] [-m memsize] file\n");
 exit(EXIT_FAILURE);
}

Here is a sample program in the Simpletron Machine Language, average7.sml, it receives 7 values from input and calculates the average between them.

+1008
+2008
+5206
+3009
+2109
+5000
+1109
+5300
+0000
+0000

The input of a Simpletron program must be a signed four-digit integer, like +0007 or -0001.

Is there any way I can make the code more elegant and portable?

Reinderien
71k5 gold badges76 silver badges256 bronze badges
asked Mar 26, 2020 at 1:52
\$\endgroup\$
3
  • \$\begingroup\$ (I had a look at the troff "source" without syntax colouring - did not find it much improved.) \$\endgroup\$ Commented Mar 26, 2020 at 5:11
  • 1
    \$\begingroup\$ It might be more helpful if you displayed that help text in your console using man and pasted the formatted output here. It seems like an unnecessary hoop to jump through for reviewers otherwise. Unless the code of the man page is explicitly included in what you want reviewed, of course. \$\endgroup\$ Commented Mar 26, 2020 at 8:20
  • \$\begingroup\$ In the review below the section Assignment-in-condition is addressing code readability and maintainability, and it section is correct, your code is not as readable as it could be and someone could definitely miss the assignment statements. \$\endgroup\$ Commented Mar 26, 2020 at 15:47

2 Answers 2

5
\$\begingroup\$

In addition to the review you already have, I have a few more suggestions.

Fix the bug

As was already pointed out, the assignment-in-condition practice is problematic. In this particular case, the problem is in main. The current code has this:

if ((memory = calloc(memsize, sizeof *memory)) == NULL)
 err(EXIT_FAILURE, NULL);
if ((fp = fopen(*argv, "r")) == NULL)
 err(EXIT_FAILURE, "%s", *argv);

The problem is that if the file doesn't exist, the memory just allocated will not be freed. For that reason and for the fact that it's generally better to define variables when they are declared, I'd write that sequence like this instead:

FILE *fp = fopen(*argv, "r");
if (fp == NULL) {
 free(memory);
 err(EXIT_FAILURE, "%s", *argv);
}

Think carefully about signed vs. unsigned numbers

What would it mean for the memsize to be a negative number? I can't think of a rational interpretation for such a thing, so I'd recommend making that size_t which is unsigned.

Eliminate global variables

In this case there lot of global variables such as memory and memsize which are probably better gathered up into a structure and made part of main instead of global. Then for each of the relevant functions such as load or dump, pass a pointer to the structure as one of the arguments.

struct Simpletron {
 /* Simpletron's memory is simulated with a one-dimensional array */
 int *memory;
 size_t memsize;
 /* Simpletron's registers are simulated with the following variables */
 int acc; /* accumulator register (value being processed) */
 size_t pc; /* program counter points to current instruction */
 int opcode; /* current opcode */
 int operand; /* current operand */
};

Note that I've also changed from ireg to pc. More on that later.

Make the program data driven

Instead of the operation enum, a large switch statement, etc. I think it would be much neater to have a struct for instructions. Here's how I'd define it:

struct Instruction {
 int opcode;
 const char *mnemonic;
 const char *printstr;
 int (*exec)(struct Simpletron* s);
};

Now we can create an array of instructions. Here's an example of one:

{ 52,"BRANCHZERO"," %2u", simple_instr_branchzero },

Now all that remains is to write the code that performs the instruction:

static int simple_instr_branchzero(struct Simpletron *s) {
 if (s->acc == 0) {
 s->pc = s->operand;
 } else {
 ++s->pc;
 }
 return WARN_NONE;
}

Make error messages and numbers neater

You may have noticed that the function above returns WARN_NONE. This is somewhat easier for a programmer to read and understand than something like return 0 and also has the advantage that we now have both a code and a message (which might be translated to other languages, for instance). So instead of this inside the large switch:

case ADD_I:
 if ((operand > 0 && acc > MEM_MAX - operand) ||
 (operand < 0 && acc < MEM_MIN - operand)) {
 warnx("integer overflow");
 return 1;
 } else {
 acc += operand;
 }
 count++;
 break;

We can write this:

static int simple_instr_add_i(struct Simpletron *s) {
 int result = s->acc + s->operand;
 if (isOutOfRange(result)) {
 return WARN_OVERFLOW;
 }
 s->acc = result;
 ++s->pc;
 return WARN_NONE;
}

This is enabled using this code:

enum warning { WARN_NONE, WARN_HALT, WARN_INPUT, WARN_OVERFLOW, WARN_DIVZERO, WARN_SIGNEDOVERFLOW, WARN_REMAINZERO, WARN_COUNT };
static const struct Error {
 enum warning value;
 const char *text;
} simpletron_errors[WARN_COUNT] = {
 { WARN_NONE, "ok" },
 { WARN_HALT, "halt" },
 { WARN_INPUT, "improper input" },
 { WARN_OVERFLOW, "integer overflow" },
 { WARN_DIVZERO, "division by zero" },
 { WARN_SIGNEDOVERFLOW, "signed integer overflow"},
 { WARN_REMAINZERO, "remainder by zero"},
};

Note that WARN_COUNT is not a real warning, but rather a marker to define the size of the array and also for us with error checking on access to that array.

Use helper functions to clarify code

The code above uses isOutOfRange which simplifies the code and makes it clear to the reader. The content is this:

static bool isOutOfRange(int n) {
 return n < MEM_MIN || n > MEM_MAX;
}

Use action words for functions

The functions load and dump are named in a way that suggests their function, but I think simpletron is not as good. Since they are all dealing with the same underlying machine, I'd suggest naming them as simpletron_load, simpletron_dump and simpletron_run.

Separate interface from implementation

I'd suggest splitting the program into three pieces: main.c which would contain main and functions only needed by it, a simpletron.h file that defines the interface to the virtual machine and simpletron.c which would contain the implementation. Here's how I would define simpletron.h:

#ifndef SIMPLETRON_H
#define SIMPLETRON_H
#include <stdio.h>
#include <stdbool.h>
struct Simpletron {
 /* Simpletron's memory is simulated with a one-dimensional array */
 int *memory;
 size_t memsize;
 /* Simpletron's registers are simulated with the following variables */
 int acc; /* accumulator register (value being processed) */
 size_t pc; /* program counter points to current instruction */
 int opcode; /* current opcode */
 int operand; /* current operand */
};
int simpletron_run(struct Simpletron *s, bool trace, bool verbose);
int simpletron_load(struct Simpletron *s, FILE *fp);
void simpletron_dump(struct Simpletron *s);
#endif // SIMPLETRON_H

Only the minimal information to use the interface is here. All of the other details are encapsulated in simpletron.c.

Prefer const to #define

Since C99, it's generally better to use const rather than #define for numerical constants. For instance, I'd put these inside simpletron.c:

static const int MEM_MAX = 9999;
static const int MEM_MIN = -9999;
static const int INSTRUCTIONSIZE = 4;

This way, we get the benefit of type checking and also limiting scope.

Consider adding features

I thought it would be nice to be able to trace the program and also, optionally, to dump the contents of the machine after each instruction. This heavily modified version of your original simpletron function does just that.

/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
int simpletron_run(struct Simpletron *s, bool trace, bool verbose) {
 /* memory location of next instruction */
 /* simulation begins with the instruction in the location 00 and continues sequentially */
 s->pc = 0;
 /* this loop implements the "instruction execution cycle" */
 while (s->pc < s->memsize) {
 /* opcode is the leftmost two digits of instruction register*/
 s->opcode = s->memory[s->pc] / 100;
 /* operand is the rightmost two digits of instruction register*/
 s->operand = s->memory[s->pc] % 100;
 /* simple linear scan for opcode */
 const struct Instruction *op = findop(s->opcode);
 if (op == NULL) {
 warnx("%+05d: invalid instruction", s->memory[s->pc]);
 return 1;
 }
 if (trace) {
 fprintf(stderr, "%05lu: %+05d\t", s->pc, s->memory[s->pc]);
 fprintf(stderr, op->mnemonic);
 fprintf(stderr, op->printstr, s->operand);
 fprintf(stderr, "\n");
 }
 int result = op->exec(s);
 if (verbose) {
 simpletron_dump(s);
 }
 if (result == WARN_HALT) {
 return 0;
 }
 if (result != WARN_NONE && result < WARN_COUNT) {
 warnx(simpletron_errors[result].text);
 return 1;
 }
 }
 warnx("execution reached end of memory without halting");
 return 1;
}

Using these features was a simple matter of adding the appropriate arguments for main and passing two boolean values. Much of this functionality is enabled by the use of the data-driven design, but there's still more.

Fully use data structures to simplify features

The posted example code purports to take an average of seven numbers, but it does no such thing. In fact, it computes a sum of a list of numbers terminated by a sentinel value of zero. A program that computes an average might look like this in source code form:

READ [13] ; read a number from the uset
LOAD [13] ; acc = number
ADD [15] ; add to running sum
STORE [15] ; store sum
LOAD [14] ; fetch counter
ADD_I 1 ; increment by one
STORE [14] ; save updated count
BRANCHNEG 0 ; if <0, we're not done yet
LOAD [15] ; fetch the running sum
DIVIDE_I 7 ; divide by seven
STORE [13] ; store the updated value
WRITE [13] ; write it to stdout
HALT
+0000 ; this is location 13 used as a scratchpad for input
-0007 ; this is the value -n (number of numbers to avg)
+0000 ; this is location 15 that holds the running sum

It was certainly not obvious from a raw list of numbers what the original code actually did until I added the trace function mentioned above. It's a relatively simple task to allow the code to accept either this nice source code version or the original raw number version. Here's an enhanced simpletron_load function that does just that:

int simpletron_load(struct Simpletron *s, FILE *fp) {
 unsigned linenum = 1;
 char inst[13];
 inst[12] = '0円'; // assure it's terminated
 size_t i = 0;
 const char* error = NULL;
 while (!error && (fscanf(fp, "%12s", inst) == 1)) {
 // is it a number
 if (inst[0] == '+' || inst[0] == '-') {
 int arg;
 if (sscanf(inst, "%5d", &arg) == 1) {
 s->memory[i++] = arg;
 } else {
 error = "reading number";
 }
 } else {
 const struct Instruction *in = findmnemonic(inst);
 if (in) {
 if (strlen(in->printstr)) {
 int arg = parsearg(in->printstr, fp);
 if (arg >= 0) {
 s->memory[i++] = in->opcode*100 + arg;
 } else {
 error = "reading instruction";
 }
 } else {
 s->memory[i++] = in->opcode*100;
 }
 } else {
 error = "reading instruction";
 }
 }
 // ignore the rest of the line
 for (int c = getc(fp); c != '\n' && c != EOF; c = getc(fp)) {
 }
 ++linenum;
 if (i >= s->memsize) {
 error = "memory full";
 }
 }
 if (error) {
 printf("ERROR:%s:%d\n", error, linenum);
 return 1;
 }
 return 0;
}

Make the machine do the counting

Rather than fiddling with tedious parsing of formatted output, I generally prefer to let the machine do the counting for me. To that end, the rewritten simpletron_dump function now looks like this:

/* write a core dump of memory and registers into stdout */
void simpletron_dump(struct Simpletron *s) {
 const unsigned linelen = 10;
 fprintf(stderr, "\nREGISTERS:\n"
 "accumulator %+05d\n"
 "instruction pointer +%04lu\n"
 "\nMEMORY:\n ",
 s->acc, s->pc);
 for (unsigned i = 0; i < linelen; ++i) {
 fprintf(stderr, "%7u", i);
 }
 unsigned dumpcount = 0;
 for (size_t i = 0; i < s->memsize; ++i, --dumpcount) {
 if (dumpcount == 0) {
 fprintf(stderr, "\n%2lu ", i );
 dumpcount = linelen;
 }
 fprintf(stderr, "%+05d ", s->memory[i]);
 }
 fprintf(stderr, "\n");
}

The code uses the const unsigned linelen to keep track of how many values to print out per line for both the header and for the memory dump. That also fixes another bug in the original which did not print memory correctly.

Understand real CPUs

I know this is all a learning exercise, but one thing that may be useful is to understand a bit more about real CPU architecture. For example, rather than throwing an error in the event of overflow, real CPUs typically have a carry flag which indicates this and an overflow flag to indicate signed overflow. Also, it is more typical to have an instruction pointer (sometimes called a program counter) rather than an instruction register that actually holds the current instruction. Of course internal to the machine, something eventually does fetch and parse the value of the instruction, but it's quite rare for that to be directly accessible from the outside. This was one reason I changed from ireg to pc as mentioned above. It keeps things neater and more clearly mimics how real machines work.

Results

Here's the revised version of simpletron.c:

#include "simpletron.h"
#include <err.h>
#include <errno.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>
static const int MEM_MAX = 9999;
static const int MEM_MIN = -9999;
static const int INSTRUCTIONSIZE = 4;
enum warning { WARN_NONE, WARN_HALT, WARN_INPUT, WARN_OVERFLOW, WARN_DIVZERO, WARN_SIGNEDOVERFLOW, WARN_REMAINZERO, WARN_COUNT };
static const struct Error {
 enum warning value;
 const char *text;
} simpletron_errors[WARN_COUNT] = {
 { WARN_NONE, "ok" },
 { WARN_HALT, "halt" },
 { WARN_INPUT, "improper input" },
 { WARN_OVERFLOW, "integer overflow" },
 { WARN_DIVZERO, "division by zero" },
 { WARN_SIGNEDOVERFLOW, "signed integer overflow"},
 { WARN_REMAINZERO, "remainder by zero"},
};
static bool isOutOfRange(int n) {
 return n < MEM_MIN || n > MEM_MAX;
}
/* get instruction from fp; return 0 if instruction is improper */
static int fetch_number(FILE *fp, int *instruction) {
 int num = 0;
 int c;
 int sign = 1;
 /* get initial blank */
 while (isblank(c = getc(fp)))
 ;
 /* get instruction/data sign */
 switch (c) {
 case '-':
 sign = -1;
 // fall through
 case '+':
 c = getc(fp);
 break;
 default: // error condition
 return 0;
 }
 /* get instruction/data number */
 for (int i = INSTRUCTIONSIZE; i; --i) {
 if (!isdigit(c)) { // error
 return 0;
 }
 num = num * 10 + c - '0';
 c = getc(fp);
 }
 /* get remaining of command line */
 while (c != '\n' && c != EOF) {
 c = getc(fp);
 }
 *instruction = sign * num;
 return 1;
}
static int simple_instr_read(struct Simpletron *s) {
 if (fetch_number(stdin, &s->memory[s->operand]) == 0) {
 return WARN_INPUT;
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_write(struct Simpletron *s) {
 printf("%+05d\n", s->memory[s->operand]);
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_load(struct Simpletron *s) {
 s->acc = s->memory[s->operand];
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_store(struct Simpletron *s) {
 s->memory[s->operand] = s->acc;
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_add(struct Simpletron *s) {
 int result = s->acc + s->memory[s->operand];
 if (isOutOfRange(result)) {
 return WARN_OVERFLOW;
 }
 s->acc = result;
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_subtract(struct Simpletron *s) {
 int result = s->acc - s->memory[s->operand];
 if (isOutOfRange(result)) {
 return WARN_OVERFLOW;
 }
 s->acc = result;
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_divide(struct Simpletron *s) {
 if (s->memory[s->operand] == 0) {
 return WARN_DIVZERO;
 } else if ((s->acc == MEM_MIN) && (s->memory[s->operand] == -1)) {
 return WARN_SIGNEDOVERFLOW;
 } else {
 s->acc /= s->memory[s->operand];
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_multiply(struct Simpletron *s) {
 s->acc *= s->memory[s->operand];
 if (isOutOfRange(s->acc)) {
 return WARN_OVERFLOW;
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_remainder(struct Simpletron *s) {
 if (s->memory[s->operand] == 0) {
 return WARN_REMAINZERO;
 } else if ((s->acc == MEM_MIN) && (s->memory[s->operand] == -1)) {
 return WARN_SIGNEDOVERFLOW;
 } else {
 s->acc %= s->memory[s->operand];
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_add_i(struct Simpletron *s) {
 int result = s->acc + s->operand;
 if (isOutOfRange(result)) {
 return WARN_OVERFLOW;
 }
 s->acc = result;
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_subtract_i(struct Simpletron *s) {
 int result = s->acc - s->operand;
 if (isOutOfRange(result)) {
 return WARN_OVERFLOW;
 }
 s->acc = result;
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_divide_i(struct Simpletron *s) {
 if (s->operand == 0) {
 return WARN_DIVZERO;
 } else if ((s->acc == MEM_MIN) && (s->operand == -1)) {
 return WARN_SIGNEDOVERFLOW;
 } else {
 s->acc /= s->operand;
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_multiply_i(struct Simpletron *s) {
 s->acc *= s->operand;
 if (isOutOfRange(s->acc)) {
 return WARN_OVERFLOW;
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_remainder_i(struct Simpletron *s) {
 if (s->operand == 0) {
 return WARN_REMAINZERO;
 } else if ((s->acc == MEM_MIN) && (s->operand == -1)){
 return WARN_SIGNEDOVERFLOW;
 } else {
 s->acc %= s->operand;
 }
 ++s->pc;
 return WARN_NONE;
}
static int simple_instr_branch(struct Simpletron *s) {
 s->pc = s->operand;
 return WARN_NONE;
}
static int simple_instr_branchneg(struct Simpletron *s) {
 if (s->acc < 0) {
 s->pc = s->operand;
 } else {
 ++s->pc;
 }
 return WARN_NONE;
}
static int simple_instr_branchzero(struct Simpletron *s) {
 if (s->acc == 0) {
 s->pc = s->operand;
 } else {
 ++s->pc;
 }
 return WARN_NONE;
}
static int simple_instr_halt(struct Simpletron *s) {
 s=s;
 return WARN_HALT;
}
static const struct Instruction {
 int opcode;
 const char *mnemonic;
 const char *printstr;
 int (*exec)(struct Simpletron* s);
} instructions[] = {
 { 10,"READ"," [%2u]", simple_instr_read },
 { 11,"WRITE"," [%2u]", simple_instr_write },
 { 20,"LOAD"," [%2u]", simple_instr_load },
 { 21,"STORE"," [%2u]", simple_instr_store },
 { 30,"ADD"," [%2u]", simple_instr_add },
 { 31,"SUBTRACT"," [%2u]", simple_instr_subtract },
 { 32,"DIVIDE"," [%2u]", simple_instr_divide },
 { 33,"MULTIPLY"," [%2u]", simple_instr_multiply },
 { 34,"REMAINDER"," [%2u]", simple_instr_remainder },
 { 40,"ADD_I"," %2u", simple_instr_add_i },
 { 41,"SUBTRACT_I"," %2u", simple_instr_subtract_i },
 { 42,"DIVIDE_I"," %2u", simple_instr_divide_i },
 { 43,"MULTIPLY_I"," %2u", simple_instr_multiply_i },
 { 44,"REMAINDER_I"," %2u", simple_instr_remainder_i },
 { 50,"BRANCH"," %2u", simple_instr_branch },
 { 51,"BRANCHNEG"," %2u", simple_instr_branchneg },
 { 52,"BRANCHZERO"," %2u", simple_instr_branchzero },
 { 53,"HALT","" , simple_instr_halt },
};
static const struct Instruction *findop(int opcode) {
 for (size_t i=0; i < sizeof(instructions)/sizeof(instructions[0]); ++i) {
 if (opcode == instructions[i].opcode) {
 return &instructions[i];
 }
 }
 return NULL;
}
static const struct Instruction *findmnemonic(const char *mnemonic) {
 for (size_t i=0; i < sizeof(instructions)/sizeof(instructions[0]); ++i) {
 if (strcmp(mnemonic, instructions[i].mnemonic) == 0) {
 return &instructions[i];
 }
 }
 return NULL;
}
/* run instructions from memory; return 1 if error occurs, return 0 otherwise */
int
simpletron_run(struct Simpletron *s, bool trace, bool verbose)
{
 /* memory location of next instruction */
 /* simulation begins with the instruction in the location 00 and continues sequentially */
 s->pc = 0;
 /* this loop implements the "instruction execution cycle" */
 while (s->pc < s->memsize) {
 /* opcode is the leftmost two digits of instruction register*/
 s->opcode = s->memory[s->pc] / 100;
 /* operand is the rightmost two digits of instruction register*/
 s->operand = s->memory[s->pc] % 100;
 /* simple linear scan for opcode */
 const struct Instruction *op = findop(s->opcode);
 if (op == NULL) {
 warnx("%+05d: invalid instruction", s->memory[s->pc]);
 return 1;
 }
 if (trace) {
 fprintf(stderr, "%05lu: %+05d\t", s->pc, s->memory[s->pc]);
 fprintf(stderr, op->mnemonic);
 fprintf(stderr, op->printstr, s->operand);
 fprintf(stderr, "\n");
 }
 int result = op->exec(s);
 if (verbose) {
 simpletron_dump(s);
 }
 if (result == WARN_HALT) {
 return 0;
 }
 if (result != WARN_NONE && result < WARN_COUNT) {
 warnx(simpletron_errors[result].text);
 return 1;
 }
 }
 warnx("execution reached end of memory without halting");
 return 1;
}
static int parsearg(const char *fmt, FILE *fp) {
 unsigned arg = 0;
 int result = fscanf(fp, fmt, &arg);
 return (result == 1) ? (int)arg : -1;
}
int simpletron_load(struct Simpletron *s, FILE *fp) {
 unsigned linenum = 1;
 char inst[13];
 inst[12] = '0円'; // assure it's terminated
 size_t i = 0;
 const char* error = NULL;
 while (!error && (fscanf(fp, "%12s", inst) == 1)) {
 // is it a number
 if (inst[0] == '+' || inst[0] == '-') {
 int arg;
 if (sscanf(inst, "%5d", &arg) == 1) {
 s->memory[i++] = arg;
 } else {
 error = "reading number";
 }
 } else {
 const struct Instruction *in = findmnemonic(inst);
 if (in) {
 if (strlen(in->printstr)) {
 int arg = parsearg(in->printstr, fp);
 if (arg >= 0) {
 s->memory[i++] = in->opcode*100 + arg;
 } else {
 error = "reading instruction";
 }
 } else {
 s->memory[i++] = in->opcode*100;
 }
 } else {
 error = "reading instruction";
 }
 }
 // ignore the rest of the line
 for (int c = getc(fp); c != '\n' && c != EOF; c = getc(fp)) {
 }
 ++linenum;
 if (i >= s->memsize) {
 error = "memory full";
 }
 }
 if (error) {
 printf("ERROR:%s:%d\n", error, linenum);
 return 1;
 }
 return 0;
}
/* write a core dump of memory and registers into stdout */
void simpletron_dump(struct Simpletron *s) {
 fprintf(stderr, "\nREGISTERS:\n"
 "accumulator %+05d\n"
 "instruction pointer +%04lu\n",
 s->acc, s->pc);
 fprintf(stderr, "\nMEMORY:\n ");
 const unsigned linelen = 10;
 for (unsigned i = 0; i < linelen; ++i) {
 fprintf(stderr, "%7u", i);
 }
 unsigned dumpcount = 0;
 for (size_t i = 0; i < s->memsize; ++i, --dumpcount) {
 if (dumpcount == 0) {
 fprintf(stderr, "\n%2lu ", i );
 dumpcount = linelen;
 }
 fprintf(stderr, "%+05d ", s->memory[i]);
 }
 fprintf(stderr, "\n");
}
```
answered Mar 26, 2020 at 21:23
\$\endgroup\$
6
  • \$\begingroup\$ I am in fact too much adapted with old idioms I learned from K&R and manpages. Do you recommend any book I can read to make me adapt to the modern programming practices and idioms? \$\endgroup\$ Commented Mar 26, 2020 at 22:25
  • \$\begingroup\$ I would recommend "Modern C" by Jens Gustedt. It is a very recent book published by Manning and looks like an excellent resource for learning the modern version of the language. \$\endgroup\$ Commented Mar 27, 2020 at 11:50
  • \$\begingroup\$ define variables when they are defined - you mean initialize when defined? \$\endgroup\$ Commented Mar 27, 2020 at 14:12
  • \$\begingroup\$ Yes, I meant define variables when they are declared. Corrected now, thanks! \$\endgroup\$ Commented Mar 27, 2020 at 14:14
  • \$\begingroup\$ For that conversion, I opened the file in two windows and cut and pasted the body of each case statement into each separate function. Then just do minimal modifications for converting to use the struct Simpletron and you're done. Tedious, perhaps, but not hard. Let me know if I can answer any questions or help. \$\endgroup\$ Commented Mar 27, 2020 at 20:49
7
\$\begingroup\$

Re-entrance

These:

static int *memory;
static int acc; /* accumulator register (value being processed) */
static int ireg; /* instruction register (current instruction) */
// ...
 static int count;
 static int opcode;
 static int operand;

force a user to start a new program if they want a new instance of the calculator. If you want to offer an API that allows the co-existence of multiple calculators, pass around a struct instead.

C99

These:

int c, exitval, coredump;
FILE *fp;

haven't needed declaration at the beginning of the function for 20-ish years. It's more legible for them to be declared and initialized closer to where they're actually being used in the function.

Assignment-in-condition

About these various statements -

while ((c = getopt(argc, argv, "cm:")) != -1) {
if ((memory = calloc(memsize, sizeof *memory)) == NULL)
if ((fp = fopen(*argv, "r")) == NULL)
while (isblank(c = getc(fp)))

Don't, please. Expand this out so that the variable is assigned in its own statement. The above is confusing and error-prone, and has no performance gains. The only thing it's good for is code golf, which you aren't currently playing.

Addition efficiency

 if ((memory[operand] > 0 && acc > MEM_MAX - memory[operand]) ||
 (memory[operand] < 0 && acc < MEM_MIN - memory[operand])) {
 warnx("integer overflow");
 return 1;
 }
 else
 acc += memory[operand];

can become something like

int sum = memory[operand] + acc;
if (sum > MEM_MAX || sum < MEM_MIN) {
 warnx("integer overflow");
 return 1;
}
acc = sum;

In other words: don't do the addition three times; do it once. The same applies to SUBTRACT.

Order of operations

((acc == MEM_MIN) && (memory[operand] == -1))

doesn't require inner parens, due to operator precedence.

Typo

REMINDER should be REMAINDER.

Loop sanity

 size_t i;
 i = 0;
 while(getinstruction(fp, &instruction) && i < memsize)
 memory[i++] = instruction;

is better represented by

for (size_t i = 0; i < memsize; i++) {
 if (!getinstruction(fp, &instruction))
 break;
 memory[i] = instruction;
}

Memory efficiency

Currently you're storing integers in 32 bits that, since they have values of less than 10,000, could fit in 16. Depending on your constraints - whether you're optimizing for execution speed or memory efficiency - you might want to change this. 16 bits might actually be slower on your architecture, but to be sure you'd want to profile. Also, if you ever plan on serializing the state of the machine into a file, you should use 16 bits (int16_t from stdint.h).

answered Mar 26, 2020 at 2:57
\$\endgroup\$
9
  • \$\begingroup\$ For the addition, I was using SEI's CERT INT32-C to ensure integer overflow. The assignment-in-condition is a hint that survived from when I read K&R book, they use assignment-in-condition all the time. Declaration at the beginning of function was also from K&R. I am reading Deitel's C:How to Program after have read K&R, is this a good path choice? I think Deitel's codes are much less elegant than K&R's... \$\endgroup\$ Commented Mar 26, 2020 at 9:12
  • \$\begingroup\$ Also, the assignments-in-condition (particularly the one for getopt(3)) are idioms I've learnt from OpenBSD's manpages. I think assignment-in-conditions are not out of fashion in operating system development cycles. \$\endgroup\$ Commented Mar 26, 2020 at 9:16
  • \$\begingroup\$ Re. K&R - the most recent (2nd) edition is from 1988. C99 should be used by everyone at this point. The classics are classic, but they need to be taken with a grain of salt and some understanding of the modern world. \$\endgroup\$ Commented Mar 26, 2020 at 14:38
  • 1
    \$\begingroup\$ I was using SEI's CERT INT32-C to ensure integer overflow - it's excellent for you to be thinking about those details, but I fear that the situation is more complicated. Modern optimizing compilers are able to move terms across the sides of an inequality, for instance. That aside, since you're currently using a 32-bit representation, the data range is significantly narrower than the register range, so you do not have to worry about these details. Even if you switched to 16-bit, you would still be safe, because 2*10^4 < 2^15. \$\endgroup\$ Commented Mar 26, 2020 at 14:48
  • 1
    \$\begingroup\$ Also, it does not surprise me that man pages and operating system code both show signs of predating C99, since the BSD kernel certainly happened first, and it's entirely likely that it hasn't switched styles. \$\endgroup\$ Commented Mar 26, 2020 at 14:53

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.