Re: How can I optimize my metatables?
- Subject: Re: How can I optimize my metatables?
- From: KHMan <keinhong@...>
- Date: Sat, 2 Dec 2017 10:47:13 +0800
On 12/2/2017 8:44 AM, Meepen wrote:
I'm not sure what method x86 kernel mode uses to update the
virtual-to-physical lookup in the MMU, but I'd assume it uses
internal data structure swaps of some sort instead of functions
translating addresses, because of speed.
Virtual-to-physical lookups are cached by the TLB [1]. It's just a
kind of on-demand lookup with cached results. The page tables
themselves are managed by the OS kernel when it sets up
processes, etc. During operation: (1) No penalty for untouched
memory pages; nothing untouched needs to be cached in the TLB.
(2) TLB hit if the memory location is in the cache: fast path.
(3) On a TLB miss, read the page tables and update the cache:
slow path.
[1] https://en.wikipedia.org/wiki/Translation_lookaside_buffer
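In Lua terms, the three cases above behave much like a memoizing
table: untouched keys cost nothing, a hit is a plain table read, and
a miss runs the slow lookup once and stores the result. A minimal
sketch of that analogy follows. walk_page_table and the frame numbers
are made up for illustration, and unlike a real TLB this cache has no
size limit and never evicts anything.

```lua
-- Hypothetical stand-in for the slow page-table walk.
local walks = 0
local function walk_page_table(page)
  walks = walks + 1
  return page + 1000  -- made-up "physical frame" number
end

-- The cache: __index only fires when the key is absent (a miss).
local tlb = setmetatable({}, {
  __index = function(cache, page)
    local frame = walk_page_table(page)  -- slow path
    rawset(cache, page, frame)           -- fill the cache
    return frame
  end,
})

local first = tlb[7]   -- miss: walks the table, then caches
local second = tlb[7]  -- hit: plain table read, no function call
```

Pages that are never indexed never reach walk_page_table at all,
which corresponds to case (1).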
For TLB specs on a somewhat recent processor architecture (AMD
Jaguar), see [2]. It tries to reduce miss latencies via speculation
and side caches. But if we code a single-threaded software app,
there is no direct equivalent of hardware speedups that run in
parallel.
[2] https://www.realworldtech.com/jaguar/6/
TLB miss rates are quite low; chip architects have put much more
effort (and transistor budget) into branch prediction and
DCache/ICache management. Plus, modern OS kernels use large pages,
which reduces contention and minimizes the penalty of kernel mode
switches. Also, a modern OS on a modern CPU does not flush
everything on user mode process switches these days. So most kinds
of optimizations (all the low-hanging fruit) have been implemented.
I checked an Intel optimization manual; there is only one entry
for the TLB: TLB priming. Make a memory read for an upcoming page
in advance so that the TLB is updated early. This gives the CPU an
opportunity to hide a TLB miss. But one may well see gains only
when processing data that has predictable read/write
characteristics.
On Dec 1, 2017 6:49 PM, "Soni "They/Them" L." wrote:
On 2017-12-01 01:45 AM, Meepen wrote:
Could you update the metatables on coroutine switch?
It'd be faster if you used the metatables at all, but would
slow things down minimally otherwise.
Context switches instead of namespacing?
Hmm... Maybe. It wouldn't be easy, because with context
switches, any mistake leaks the wrong context. With
namespacing, that problem is pretty much non-existent.
I mean, what do modern CPUs and kernels use?
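A rough sketch of what the context-switch approach could look like,
assuming every switch goes through a wrapped resume. The names
metatables, switch_to and resume are hypothetical, and coroutine.wrap
is not covered here, which is exactly the kind of hole that would
leak the wrong context.

```lua
-- Per-coroutine string metatables, keyed weakly so dead coroutines
-- can be collected.
local metatables = setmetatable({}, { __mode = "k" })

-- The shared string metatable; by default its __index is `string`.
local string_mt = getmetatable("")

-- Point string lookups at the table registered for `co`, or at the
-- string library when nothing is registered.
local function switch_to(co)
  local entry = co and metatables[co]
  local mt = entry and entry.string
  string_mt.__index = (mt and mt.__index) or string
end

local raw_resume = coroutine.resume

-- Restore the caller's context, then pass all results through.
local function restore(prev, ...)
  switch_to(prev)
  return ...
end

-- Swap the context in before resuming and back out when the
-- coroutine yields, returns or errors.
local function resume(co, ...)
  local prev = coroutine.running()
  switch_to(co)
  return restore(prev, raw_resume(co, ...))
end

-- Usage: this coroutine sees a string method nobody else sees.
local co = coroutine.create(function() return ("x"):shout() end)
metatables[co] = { string = { __index = {
  shout = function(s) return string.upper(s) .. "!" end,
} } }
local ok, result = resume(co)
```

With this, string indexing inside a coroutine is a plain table
lookup and the cost moves to resume. Note that the per-coroutine
table replaces the string library entirely, so shout has to call
string.upper directly rather than s:upper().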
On Nov 30, 2017 6:17 PM, "Soni "They/Them" L."
<fakedme@gmail.com> wrote:
Hi!
I have some code that looks like this:
debug.setmetatable("", {__index=function(o,k)
  local mt = metatables[coroutine.running()].string
  if mt then
    local __index = rawget(mt, "__index")
    if type(__index) == "function" then
      return __index(o,k)
    else
      return __index[k]
    end
  end
  error()
end})
It takes the metatables for the basic Lua types (string, number,
nil, etc) and replaces them with a proxy metatable. This proxy
metatable forwards to a different table based on the currently
running coroutine. This gives me virtualization of those
metatables.
It's really slow (3-4x slower[1] than the default string
metatable) and I'd like to make it faster. Is that possible?
[1] - I haven't actually benchmarked it, but default string
metatable gives about 2 table accesses per operation; this thing
does at least 8 when using globals, and that doesn't take into
account interpreter overhead and all the function calls!
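One cheap thing to try, going by the footnote's count of global
accesses: hoist the globals the proxy touches into locals, so each
call does upvalue reads instead of global table lookups. This is an
unbenchmarked sketch, not a measured result. string_index is a
hypothetical name, and the nil checks are an addition so that a
coroutine without an entry raises a clear error rather than an
indexing error.

```lua
-- Cache the globals once; inside the function they are upvalues.
local running, rawget, type, error =
  coroutine.running, rawget, type, error

-- coroutine -> { string = metatable }, weak keys (assumed layout).
local metatables = setmetatable({}, { __mode = "k" })

local function string_index(o, k)
  local entry = metatables[running()]
  local mt = entry and entry.string
  if mt then
    local index = rawget(mt, "__index")
    if type(index) == "function" then
      return index(o, k)
    elseif index ~= nil then
      return index[k]
    end
  end
  error("no string metatable for the running coroutine", 2)
end

-- Installed the same way as the original proxy:
-- debug.setmetatable("", { __index = string_index })

-- Usage: a coroutine whose strings use the stock string library.
local co = coroutine.create(function()
  return string_index("abc", "len")
end)
metatables[co] = { string = { __index = string } }
local ok, method = coroutine.resume(co)
```

The function call through __index remains, so this only trims the
lookups around it.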
[snip]
--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia