Re: OP_HALT is really useful

On 2018-05-27 21:22, Sven Olsen wrote:
> Interesting. After mucking around in the VM for the purposes of applying your patch, I've started daydreaming about writing some instrumentation hooks of my own.
> *snip*
> Do you have any words of wisdom for someone just starting down this path? (It sounds like, maybe, implementing some sort of sampling-based instrumentation using OP_HALT-like hooks turns out to be faster than hacking a new switch into the core VM?)

A while ago I hacked a stupid-simple profiler into Lua. It just constantly
and unconditionally barfs tons of info into fd 3 (or /dev/null, if you
don't assign that from the shell with 3>somewhere). It counts all
calls, instructions (one counter per instruction), allocations
(increasing counters for _all_ functions on the call stack), and loads
(load, loadstring, require, …, incl. dumping the loaded code), …, and the
counters get dumped whenever the thing is GCed/freed. All of that
causes it to run roughly 1.5-1.7x as long. (Additionally dumping a full
stack trace every Nth call makes it 1.7x (every 107th), 2.8x (every
11th), or 13x (trace _ALL_ the calls!) slower in total for a very
call-heavy program (~70M calls, otherwise just summing values) with
mostly <=5 stack depth.)
So if you have some idea of what info you need, you can probably afford
to have that unconditionally enabled in a profiling build.
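A minimal sketch of the "fd 3 or /dev/null" output described above, in case it helps. The helper name prof_open_sink is made up for illustration, and this is not the code from the actual hack: it just checks with fcntl whether the descriptor was opened by the shell and falls back to /dev/null otherwise.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen() under strict -std modes */
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper (not part of Lua): return a stream for profiler
 * output. Uses descriptor `fd` if the shell opened it (e.g. 3>prof.log),
 * otherwise falls back to /dev/null so writes are silently discarded. */
static FILE *prof_open_sink(int fd) {
    FILE *out = NULL;
    /* fcntl(F_GETFD) returns -1 (EBADF) if fd is not an open descriptor */
    if (fcntl(fd, F_GETFD) != -1)
        out = fdopen(fd, "w");       /* may still fail, e.g. fd is read-only */
    if (out == NULL)
        out = fopen("/dev/null", "w");
    return out;
}
```

Calling prof_open_sink(3) once at state creation and keeping the FILE* around means the rest of the instrumentation can fprintf unconditionally, with no "is profiling enabled?" checks on the hot path.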
I don't have the time to clean up the changes & turn them into a patch,
but here's a bunch of notes that may be useful:
 * lobject.h/ClosureHeader: nice place for counters (ncalls,nbytes,…)
 (initialize in lfunc.c/luaF_new[CL]closure)
 * ldo.c/luaD_precall: all calls thru here -> ncalls++ / dump stack
 * lapi.c/lua_pushcclosure: Kill the `if (n == 0)` branch to disable light
 C functions / force C closures, so that you have the counter fields
 * lobject.h/Proto: add instruction counters?
 (NULL-init in lfunc.c/luaF_newproto, alloc in lparser.c/close_func
 using luaM_newvector(L, fs->pc, size_t) and in lundump.c/LoadCode
 using luaM_newvector(S->L, n, size_t), then zero-init all counts)
 (change size_t to whatever counter size you're using)
 * lvm.c/vmfetch, lvm.c/donextjump: increase instr.-counter:
 cl->p->prof_icounts[(ci->u.l.savedpc)-(cl->p->code)]++;
 (prof_icounts is whatever you're calling the instr. counter field)
 * other places to keep profiling data: lstate.h/global_State (per Lua
 state) and lstate.h/lua_State (per thread within a Lua state)
 (init in lstate.c: lua_newstate, preinit_thread (no allocs) or
 f_luaopen, lua_newthread (allocs ok))
 * lmem.c/luaM_realloc_: all allocations go through here
    * do whatever you do BEFORE the realloc call, as it might be moving
      the stuff that you wanted to touch
    * if tracking allocations, blame (nsize-realosize) bytes if that's >0
    * if block == NULL, osize may be != 0, but then it's a type hint
      (LUA_TFOO) and not a size; may want to count those to see who's
      slowing down the GC by creating lots of objects (tables, strings, …)
    * may also want to walk a few stack levels & track indirect counts;
      just blaming your low-level constructors (Object.new, map, …)
      doesn't tell you what parts of the code are actually causing this
 * when you want to touch the stack, guard:
      if (!G(L)->version || !L->ci) return; /* still building state */
      CallInfo *ci = L->ci;
      if (ci->previous == ci->next) return; /* setting up first func */
   (this *seems* to take care of every wonky stack state?)
 * stack traversal: just walk the ci->previous chain until NULL
 * dumping accumulated info from the lua*_free* functions works well
 if you properly close the state at the end (so no os.exit(foo), but
 os.exit(foo,true) is ok – or patch os.exit)
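To make the counter and blame logic from the notes above a bit more concrete, here is a rough standalone sketch. It deliberately does NOT use the real Lua headers: MockProto, MockClosure and MockCallInfo are cut-down stand-ins for Proto, the closure header and CallInfo, and the prof_* function names are invented for illustration. Only prof_icounts, ncalls, nbytes and realosize follow the names used in the notes.

```c
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Instruction;   /* stand-in for Lua's Instruction type */

/* Mock of the relevant parts of Proto plus the extra counter array. */
typedef struct MockProto {
    Instruction *code;      /* bytecode array */
    int sizecode;           /* number of instructions in code */
    size_t *prof_icounts;   /* one counter per instruction */
} MockProto;

/* Mock closure carrying the counters suggested for ClosureHeader. */
typedef struct MockClosure {
    MockProto *p;
    size_t ncalls;
    size_t nbytes;
} MockClosure;

/* Mock CallInfo: just the link and fields needed here. */
typedef struct MockCallInfo {
    struct MockCallInfo *previous;  /* caller's frame, NULL at the bottom */
    MockClosure *func;
    const Instruction *savedpc;     /* next instruction to execute */
} MockCallInfo;

/* Allocate zero-initialized counters once the code size is known.
 * Returns 1 on success, 0 on allocation failure. */
static int prof_alloc_icounts(MockProto *p) {
    p->prof_icounts = calloc((size_t)p->sizecode, sizeof *p->prof_icounts);
    return p->prof_icounts != NULL;
}

/* What the fetch macro would do: bump the counter at pc's offset. */
static void prof_count_instr(MockCallInfo *ci) {
    MockProto *p = ci->func->p;
    p->prof_icounts[ci->savedpc - p->code]++;
}

/* Allocation blame: charge growth to every function on the call stack.
 * Meant to run BEFORE the actual realloc, as the notes recommend. */
static void prof_blame_alloc(MockCallInfo *ci, void *block,
                             size_t osize, size_t nsize) {
    /* with block == NULL, osize is only a type hint, not a real size */
    size_t realosize = block ? osize : 0;
    if (nsize <= realosize) return;     /* shrink or free: nothing to blame */
    for (; ci != NULL; ci = ci->previous)   /* walk the previous chain */
        if (ci->func != NULL)
            ci->func->nbytes += nsize - realosize;
}
```

In the real VM the frame's function lives in a stack slot and has to be checked and converted before its counters can be touched, so treat the `ci->func` access here as a simplification.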
That's done against 5.3.4, but only used a couple of times so far, so
the above may be incomplete / missing critical things / contain bugs.
(For stack traces, you may want to make lgc.c/freeobj (cases LUA_TLCL,
LUA_TCCL) and lfunc.c/luaF_freeproto report the closure kind (C/Lua) /
closure->cfunc (gco2ccl(o)->f) / closure->proto (gco2lcl(o)->p) /
proto->source (f->source) mapping so your stack traces can simply be a
list of closure pointers, no need to constantly translate those when you
can do that later. Then just keep a counter or timer in the state &
increment/check in ldo.c/luaD_precall whether you should dump a trace…
should be good enough, and fast.)
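The "counter in the state, check on every call" idea could look roughly like the following. Again a sketch under assumptions: ProfState and Frame are invented stand-ins (for the extra fields in the Lua state and for the CallInfo chain), and the output format is arbitrary. The point is only that the per-call cost is an increment and a modulo, and that a trace is just a line of raw closure pointers to be resolved later.

```c
#include <stdio.h>
#include <stddef.h>

/* Stand-in for the CallInfo chain: one frame per active call. */
typedef struct Frame {
    struct Frame *previous;   /* caller, NULL at the bottom of the stack */
    void *closure;            /* pointer identifying the running closure */
} Frame;

/* Stand-in for the fields one would add to the Lua state. */
typedef struct ProfState {
    unsigned long calls;        /* total calls seen so far */
    unsigned long trace_every;  /* dump a trace every Nth call; 0 = never */
} ProfState;

/* Called once per function call; returns 1 when a trace is due. */
static int prof_should_trace(ProfState *ps) {
    ps->calls++;
    return ps->trace_every != 0 && ps->calls % ps->trace_every == 0;
}

/* Write one trace line of raw closure pointers, innermost first.
 * Returns the number of frames written. */
static int prof_dump_trace(FILE *out, const Frame *f) {
    int depth = 0;
    fputs("T", out);
    for (; f != NULL; f = f->previous, depth++)
        fprintf(out, " %p", f->closure);
    fputc('\n', out);
    return depth;
}
```

A prime interval such as the 107 or 11 used for the timings above should help avoid the samples lining up with regular call patterns in the profiled program.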
Have fun!
-- nobody
