In my Motorola 6809 CPU emulator (written in C++14) I'm working on an abstraction to represent the wiring between emulated devices, such as the ports on a 6522 VIA, or the interrupt lines connecting a peripheral to the CPU.
The abstraction uses functional composition. Each OutputPin encapsulates a developer-supplied function that returns the current state of that output. InputPin objects belonging to a different device have a copy of the OutputPin to which they are attached. Each InputPin may only be attached to a single OutputPin, but an OutputPin may be attached to multiple InputPins.
This abstraction works, and allows for expressive and powerful constructions like the following, which does a "wired-and" of three IRQ lines (one of which is active-high) and attaches the result to the CPU's IRQ input. Each time the state of that line is tested, the composed functions generate the correct result from the current state of the three inputs:
cpu.IRQ << (!fdc.DIRQ & via.IRQ & acia.IRQ);
Another example generates an output signal that is the parity of 8 other outputs:
OutputPin parity = out[0] ^ out[1] ^ out[2] ^ out[3] ^ out[4] ^ out[5] ^ out[6] ^ out[7];
My concern, though, is performance. I had hoped that compilers might be able to elide many of the function calls and collapse them into simpler expressions but initial tests with clang 12.0 on macOS don't appear to show any sign of this.
Is there anything I could be doing to improve the run-time efficiency, or am I hoping for too much from the compiler's optimiser?
Here's the complete implementation as a header with inline functions:
#include <functional>   // std::function

static inline bool default_true() {
    return true;
}

class OutputPin {
public:
    using Function = std::function<bool()>;

protected:
    Function f = default_true;

public:
    OutputPin() { }
    OutputPin(const Function& f) : f(f) { }

    void bind(const Function& _f) {
        f = _f;
    }

    operator bool() const {
        return f();
    }

    OutputPin operator !() const {
        return OutputPin([&]() {
            return !f();
        });
    }
};

class InputPin {
protected:
    OutputPin input;

public:
    void attach(const OutputPin& _input) {
        input = _input;
    }

    operator bool() const {
        return input;
    }
};

inline void operator<<(InputPin& in, const OutputPin& out)
{
    in.attach(out);
}

inline OutputPin operator&(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a && (bool)b;
    });
}

inline OutputPin operator|(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a || (bool)b;
    });
}

inline OutputPin operator^(const OutputPin& a, const OutputPin& b)
{
    return OutputPin([=]() {
        return (bool)a ^ (bool)b;
    });
}
And here's a trivial test case for the above header (which nevertheless generates 60kB of assembler output!):
#include "device.h"

int main()
{
    OutputPin a, b, c;
    InputPin i;
    i << (!a & b & c);
    return i;
}
1 Answer
Overhead of std::function
The main problem is that you are using std::function, and this comes with some overhead. In particular, it will typically allocate storage on the heap (using new internally), since your lambdas capture copies of other pins and that state is too large for the small internal buffer that std::function implementations usually provide. Since memory allocations can have side effects (they might even throw exceptions if memory could not be allocated), the compiler is usually unable to optimize them away.
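To illustrate the point about type erasure, here is a minimal sketch (not the asker's code) of the same kind of composition written without std::function. The helper names not_ and and_, the Lines struct and make_irq are hypothetical, introduced only for this example. Because each closure keeps its concrete type, nothing is heap-allocated and the optimiser at least has the opportunity to inline the whole expression; whether it actually does depends on the compiler and flags. It needs C++14 for return type deduction.

```cpp
// Hypothetical helpers (not part of the original header). Each returns a
// closure whose concrete type is visible to the compiler, so no
// std::function is involved.
template <typename F>
auto not_(F f) {
    return [f]() { return !f(); };
}

template <typename A, typename B>
auto and_(A a, B b) {
    return [a, b]() { return a() && b(); };
}

// Illustrative pin levels, standing in for private device state.
struct Lines {
    bool dirq = false;
    bool via_irq = true;
    bool acia_irq = true;
};

// Builds the "wired-and" from the question: !DIRQ & VIA & ACIA.
// The closures hold a reference to `l`, so `l` must outlive the result.
inline auto make_irq(const Lines& l) {
    auto dirq = [&l]() { return l.dirq; };
    auto via  = [&l]() { return l.via_irq; };
    auto acia = [&l]() { return l.acia_irq; };
    return and_(and_(not_(dirq), via), acia);
}
```

The trade-off is that the composed expression has a unique, unnameable type, so it cannot be stored in a non-template member such as the question's InputPin without reintroducing some form of type erasure.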
Consider just storing the state of a pin as a bool
The state of an output pin is just true or false. Instead of storing a function that calculates the state, consider just storing the current state in a bool. It seems unlikely to me that always ensuring the state of the pin is set correctly is less efficient than having a function that calculates it whenever you need the value. An InputPin should then not store a copy of an OutputPin, but either a reference to an OutputPin, or, if you still want to be able to have an expression assigned to an InputPin, have it store the function that calculates the input value.
Once you do this, everything simplifies enormously, and the compiler will have no trouble optimizing this code.
Here is an example:
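#include <functional>

using OutputPin = bool;

template<typename Function>
class InputPin {
    Function f;

public:
    InputPin(const Function &f): f(f) {}

    operator bool() const {
        return f();
    }
};

int main()
{
    OutputPin a(true), b(true), c(true);
    // Function is deduced from the lambda via class template argument
    // deduction, which requires C++17. The lambda captures a, b and c by
    // reference, so it always sees their current values.
    InputPin i([&]{return !a & b & c;});
    return i;
}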
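The first alternative described above, an InputPin that refers to an OutputPin instead of copying it, might look like the following sketch. It is plain C++14; the set and get member names are assumptions made for illustration, and an unattached input is assumed to read true, mirroring default_true() in the question. Unlike the template version, it allows the wiring to be done after construction, but it connects exactly one output to one input and does not by itself support expressions such as a wired-and.

```cpp
// OutputPin holds the current level; the owning device updates it
// whenever its internal state changes.
class OutputPin {
    bool state = true;   // assumed idle level, as in the question
public:
    void set(bool s) { state = s; }
    bool get() const { return state; }
};

// InputPin refers to (does not copy) a single OutputPin and can be
// attached after construction, e.g. by board-level wiring code.
// The OutputPin must outlive the InputPin that points at it.
class InputPin {
    const OutputPin* source = nullptr;
public:
    void attach(const OutputPin& out) { source = &out; }

    // Reading the input is a pointer dereference and a load.
    explicit operator bool() const {
        return source ? source->get() : true;
    }
};

inline void operator<<(InputPin& in, const OutputPin& out) {
    in.attach(out);
}
```

With this layout a device that changes several outputs at once simply calls set() on each affected pin when its register is written.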
I see where you're coming from, but this doesn't allow for the use case in my first example, where the InputPin and OutputPin variables are actually member variables of the C++ classes representing the various chip devices. If I was able to post-creation assign the InputPin function outside of the containing class's implementation that might work, though. – Alnitak, Mar 19, 2021 at 22:06
Also, to elaborate further: the OutputPin variables are part of the public interface of a chip and are read only, but the state of the pin itself is part of its private state. Imagine also something like a 6522 where a write of a byte to ORB can change 8 output pins at once. – Alnitak, Mar 19, 2021 at 22:12
Hm, you could still make OutputPin a class with an operator() to get its state I guess, and then make it a template like I did for InputPin in the above example. Then the declaration of the InputPin in main() becomes: InputPin i([&]{return !a() && b() && c();}); But I guess the main issue with my approach is that it doesn't work unless your devices are connected in a DAG. – G. Sliepen, Mar 20, 2021 at 9:47
The devices are not necessarily connected in a DAG. In fact, in the system I'm currently implementing, the VIA port A and keyboard matrix have links running in both directions. Also, specifying the function in the constructor is a non-starter: the function represents "wiring", which does not belong in the chip implementations but in the higher-level code that virtually connects the chips together. – Alnitak, Mar 20, 2021 at 19:56