Abstracting device IO / glue logic in a CPU emulator

Question 1

In my Motorola 6809 CPU emulator (written in C++14) I'm working on an abstraction to represent the wiring between emulated devices, such as the ports on a 6522 VIA, or the interrupt lines connecting a peripheral to the CPU.

The abstraction uses functional composition. Each OutputPin encapsulates a developer-supplied function that returns the current state of that output. InputPin objects belonging to a different device have a copy of the OutputPin to which they are attached. Each InputPin may only be attached to a single OutputPin, but an OutputPin may be attached to multiple InputPins.

This abstraction is working, and it allows for expressive and powerful constructions like the following, which performs a "wired-and" of three IRQ lines (one of which is active-high) and attaches the result to the CPU's IRQ input. Each time the state of that line is tested, the composed functions generate the correct result based on the current state of the three inputs:

```cpp
cpu.IRQ << (!fdc.DIRQ & via.IRQ & acia.IRQ);
```

and this, which generates an output signal that is the parity of 8 other outputs:

```cpp
OutputPin parity = out[0] ^ out[1] ^ out[2] ^ out[3] ^ out[4] ^ out[5] ^ out[6] ^ out[7];
```

My concern, though, is performance. I had hoped that compilers might be able to elide many of the function calls and collapse them into simpler expressions but initial tests with clang 12.0 on macOS don't appear to show any sign of this.

Is there anything I could be doing to improve the run-time efficiency, or am I hoping for too much from the compiler's optimiser?

Here's the complete implementation as a header with inline functions:

```cpp
#include <functional>

// Default state for an unbound pin: reads as logic high.
static inline bool default_true() {
    return true;
}

class OutputPin {
public:
    using Function = std::function<bool()>;

protected:
    Function f = default_true;

public:
    OutputPin() { }
    OutputPin(const Function& f) : f(f) { }

    void bind(const Function& _f) {
        f = _f;
    }

    // Evaluate the bound function to get the pin's current state.
    operator bool() const {
        return f();
    }

    // Note: [&] captures `this`, so the returned pin refers back to *this
    // and is only valid while *this is alive (unlike the [=] captures below).
    OutputPin operator !() const {
        return OutputPin([&]() {
            return !f();
        });
    }
};

class InputPin {
protected:
    OutputPin input;

public:
    void attach(const OutputPin& _input) {
        input = _input;
    }

    operator bool() const {
        return input;
    }
};

// Wire an output to an input: in << out.
inline void operator<<(InputPin& in, const OutputPin& out)
{
    in.attach(out);
}

// The combinators below copy both operands into the new pin's function.
inline OutputPin operator&(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a && (bool)b;
    });
}

inline OutputPin operator|(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a || (bool)b;
    });
}

inline OutputPin operator^(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a ^ (bool)b;
    });
}
```

and here's a trivial test case for the above header (but which generates 60kB of assembler output!):

```cpp
#include "device.h"

int main()
{
    OutputPin a, b, c;
    InputPin i;

    i << (!a & b & c);

    return i;
}
```

Question 2

Overhead of `std::function`

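The main problem is that you are using std::function, which comes with some overhead. In particular, it will typically allocate storage on the heap (using new internally) when a lambda's captured state does not fit in its small internal buffer, as is the case for your lambdas that capture whole OutputPin objects by value. Since memory allocations can have side effects (they might even throw exceptions if memory could not be allocated), and the calls go through type erasure, the compiler is generally unable to optimize them away.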
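To illustrate why the captures matter, here is a minimal sketch. The helper name `make_and` is hypothetical; it only mirrors the shape of `operator&` in the question. A closure that owns copies of two `std::function` objects is necessarily larger than a single `std::function`, so it cannot be stored inside one and has to live in separately allocated storage.

```cpp
#include <cassert>
#include <functional>

using Function = std::function<bool()>;

// Hypothetical helper mirroring the shape of operator& in the question:
// the returned callable owns copies of both operands.
inline Function make_and(const Function& a, const Function& b) {
    auto closure = [a, b] { return a() && b(); };

    // The closure holds two std::function objects, so it is at least twice
    // the size of one and cannot fit in another one's inline buffer.
    static_assert(sizeof(closure) >= 2 * sizeof(Function),
                  "closure is too large for small-buffer storage");

    return closure;  // converting to Function stores the closure out of line
}

// Usage: the result behaves like a logical AND of the two inputs.
inline void demo_make_and() {
    Function high = [] { return true; };
    Function low  = [] { return false; };
    assert(make_and(high, high)());
    assert(!make_and(high, low)());
}
```

Every additional operator in an expression such as `!a & b & c` wraps the previous result again, so the nesting (and the number of indirect calls per read) grows with the expression.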

Consider just storing the state of a pin as a `bool`

The state of an output pin is just true or false. Instead of storing a function that calculates the state, consider just storing the current state in a bool. It seems unlikely to me that always ensuring the state of the pin is set correctly is less efficient than having a function that calculates it whenever you need the value. An InputPin should then not store a copy of an OutputPin, but either a reference to an OutputPin, or if you want to still be able to have an expression assigned to an InputPin, have it store the function that calculates the input value. Once you do this, everything simplifies enormously, and the compiler will have no trouble optimizing this code.

Here is an example:

```cpp
#include <functional>

using OutputPin = bool;

template<typename Function>
class InputPin {
    Function f;
public:
    InputPin(const Function &f): f(f) {}
    operator bool() const {
        return f();
    }
};

int main()
{
    OutputPin a(true), b(true), c(true);
    // Deducing Function from the lambda relies on C++17 class template
    // argument deduction.
    InputPin i([&]{return !a & b & c;});

    return i;
}
```
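The other option mentioned above, an InputPin that refers to an OutputPin instead of storing a function, could look roughly like the sketch below. This is only an illustration under a few assumptions: the `Peripheral` and `Cpu` structs are hypothetical stand-ins for real devices, this `InputPin` is a non-template variant of the one above, and an unattached input is assumed to read high, like `default_true()` in the question.

```cpp
#include <cassert>

using OutputPin = bool;  // as in the example above: a pin is its current state

class InputPin {
    const OutputPin* source = nullptr;  // nothing attached yet

public:
    // Can be called at any time after construction, from any code that can
    // see both pins. The OutputPin must outlive this InputPin.
    void attach(const OutputPin& out) { source = &out; }

    // Assumption: an unattached input reads high, like default_true().
    operator bool() const { return source ? *source : true; }
};

// Hypothetical devices, only here to show pins as class members.
struct Peripheral { OutputPin IRQ = true; };  // active-low, idle high
struct Cpu        { InputPin IRQ; };

inline void demo_late_wiring() {
    Peripheral via;
    Cpu cpu;
    assert(cpu.IRQ);          // unattached: reads high

    cpu.IRQ.attach(via.IRQ);  // the "wiring" lives outside both classes
    assert(cpu.IRQ);          // line is idle

    via.IRQ = false;          // the device asserts its interrupt
    assert(!cpu.IRQ);         // the CPU reads the new state through one pointer
}
```

This covers only a direct wire from one output to one input; glue logic such as the wired-and still needs some stored callable.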

Question 3

I see where you're coming from, but this doesn't allow for the use case in my first example, where the InputPin and OutputPin variables are actually member variables of the C++ classes representing the various chip devices. If I were able to assign the InputPin function after creation, outside of the containing class's implementation, that might work, though.

Question 4

Also, to elaborate further: the OutputPin variables are part of the public interface of a chip and are read-only, but the state of the pin itself is part of the chip's private state. Imagine also something like a 6522, where a write of a byte to ORB can change 8 output pins at once.

Question 5

Hm, you could still make OutputPin a class with an operator() to get its state I guess, and then make it a template like I did for InputPin in the above example. Then the declaration of the InputPin in main() becomes: InputPin i([&]{return !a() && b() && c();});. But I guess the main issue with my approach is that it doesn't work unless your devices are connected in a DAG.
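A rough sketch of that idea, assuming a hypothetical `Via` class that models only the ORB register: each OutputPin is a read-only view of one bit of the chip's private state, so a single register write changes what all eight pins report. Because C++14 has no class template argument deduction, the sketch uses a hypothetical `make_input_pin` helper instead of the constructor syntax shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Read-only view of one bit in a chip's private register.
class OutputPin {
    const std::uint8_t* reg;
    std::uint8_t mask;

public:
    OutputPin(const std::uint8_t& r, unsigned bit)
        : reg(&r), mask(static_cast<std::uint8_t>(1u << bit)) {}

    bool operator()() const { return (*reg & mask) != 0; }
};

// Hypothetical stand-in for a 6522; only the ORB register is modelled.
class Via {
    std::uint8_t orb = 0;

public:
    void write_orb(std::uint8_t value) { orb = value; }  // updates all 8 pins
    OutputPin pb(unsigned bit) const { return OutputPin(orb, bit); }
};

template <typename Function>
class InputPin {
    Function f;

public:
    explicit InputPin(Function fn) : f(std::move(fn)) {}
    explicit operator bool() const { return f(); }
};

// Deduce the lambda's type through a function template (C++14-friendly).
template <typename Function>
InputPin<Function> make_input_pin(Function f) {
    return InputPin<Function>(std::move(f));
}

inline void demo_orb_write() {
    Via via;
    const OutputPin pb0 = via.pb(0);
    const OutputPin pb7 = via.pb(7);
    // The pin views are copied into the lambda; they still point at via's ORB.
    auto in = make_input_pin([=] { return !pb0() && pb7(); });

    via.write_orb(0x80);  // PB7 high, PB0 low
    assert(static_cast<bool>(in));
    via.write_orb(0x81);  // PB0 now high as well
    assert(!static_cast<bool>(in));
}
```

The pin views must not outlive the chip they point into, and the input's type still depends on the lambda, so it cannot be re-wired to a different expression after construction.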

Question 6

The devices are not necessarily connected in a DAG - in fact in the system I'm currently implementing the VIA port A and keyboard matrix have links running in both directions. Also, specifying the function in the constructor is a non-starter - the function represents "wiring" which does not belong in the chip implementations, but in the higher level code that virtually connects the chips together.

G. Sliepen · Answer 1 · 2021-03-19 18:38:24Z
