On 18 August 2015 at 09:17, Daurnimator <quae@daurnimator.com> wrote:
> Thought I'd try something new and post to the mailing list looking for
> existing solutions instead of writing one myself :)
I have two areas of interest here.
First, I am finding it really helpful to be able to profile at trace granularity. Traces seem to me like the natural unit of code optimization: try to have the right collection of them, and make sure each one is internally sensible. Here is the patch from Mike that I am using for this:
Second, I am in the middle of hacking together a low-level interface to the x86 PMU (Performance Monitoring Unit), using dynasm to generate code that executes the RDPMC instruction. This makes it possible to track fine-grained CPU performance events over arbitrary bits of Lua code.
I am racing to finish this before our second child is born... any day now :).
Output looks like this:
selftest: pmu
328 counters found for CPU model GenuineIntel-6-3F
EVENT                               TOTAL   /packet   /breath
instructions                  133,998,289     5.000   642.000
cycles                        665,169,797    24.000  3188.000
ref-cycles                    665,169,792    24.000  3188.000
uops_issued.any               106,860,566     4.000   512.000
uops_retired.all              106,843,031     4.000   512.000
br_inst_retired.conditional    26,702,752     1.000   128.000
br_misp_retired.all_branches          411     0.000     0.000
packet                         26,700,000     1.000   128.000
breath                            208,593     0.008     1.000
selftest ok
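To make the /packet and /breath columns concrete, here is a rough sketch of how raw event totals can be normalized by auxiliary counters such as "packet" and "breath". The helper name per_unit and the table layout are made up for illustration and are not the draft API. It uses plain floating-point division, so its figures will not necessarily match the rounding in the report above.

```lua
-- Hypothetical helper (not the draft API): divide each raw event total
-- by each auxiliary counter to get per-unit figures.
-- 'totals' maps event name -> raw count; 'aux' maps unit name -> count.
local function per_unit(totals, aux)
   local result = {}
   for event, total in pairs(totals) do
      result[event] = {}
      for unit, n in pairs(aux) do
         -- Avoid dividing by zero when a unit was never counted.
         result[event][unit] = (n > 0) and (total / n) or 0
      end
   end
   return result
end

-- Totals copied from the sample report above.
local totals = { instructions = 133998289, cycles = 665169797 }
local aux    = { packet = 26700000, breath = 208593 }

local ratios = per_unit(totals, aux)
for _, event in ipairs({ "instructions", "cycles" }) do
   print(string.format("%-14s %12d %10.3f %10.3f", event, totals[event],
                       ratios[event].packet, ratios[event].breath))
end
```

Normalizing by a unit of work like this is what makes runs of different lengths comparable: the totals grow with run time, while the per-packet figures should stay roughly stable.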
The work-in-progress code is here:
Here is the draft API (not committed yet):
I would quite like to add this to LuaJIT directly. The obstacle is that I depend on Cosmin's extended dynasm, which can be used from Lua code. I am considering bringing that into our LuaJIT branch, perhaps as a submodule, to replace the built-in dynasm.
This is me working my way up to our real application from the humble starting point:
Feedback welcome. Apologies if I fall off the internet for an extended period before finishing it :).
Cheers,
-Luke