Re: Speed of # operator (Was: ipairs in Lua 5.3.0-alpha)
- Subject: Re: Speed of # operator (Was: ipairs in Lua 5.3.0-alpha)
- From: Anton Titov <anton@...>
- Date: 15 Aug 2014 18:19:58 +0300
On 15.08.2014 14:39, Enrico Colombini wrote:
> On 15/08/2014 12.23, Anton Titov wrote:
>> Anyway, the bottom line is that in the long run, realloc running time
>> is proportional to min(oldsize, newsize), hence log(n) reallocations
>> add up to O(n*log(n)).
> Interesting considerations, thanks (sometimes one tends not to think
> too much about the time spent in low-level system libraries).
>
> A note: if needed, realloc time could be made O(1) by using a
> constant-time allocator such as TLSF (http://tlsf.baisoku.org/). It was
> suggested to me on this list and I used it with Lua with good results.
> Of course that O(1) could still be longer than the O(n*log(n)) of the
> system libraries, but for a large enough 'n' it could make a difference.
> (The main usage of TLSF is in real-time systems, where an upper bound
> on allocation call time is desirable.)
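For the record, wiring TLSF into Lua is just a lua_Alloc shim around a
pool. A minimal sketch, assuming the tlsf_create_with_pool /
tlsf_malloc / tlsf_realloc / tlsf_free entry points of Matthew Conte's
implementation (names may differ between TLSF versions), with an
illustrative 1 MiB arena:

  #include <stddef.h>
  #include <lua.h>
  #include <lauxlib.h>
  #include <lualib.h>
  #include "tlsf.h"   /* assumed header name */

  /* lua_Alloc adapter: Lua frees with nsize == 0, allocates with
     ptr == NULL, and reallocates otherwise; this TLSF's realloc
     treats a NULL ptr as malloc, so one call covers both cases. */
  static void *tlsf_lua_alloc(void *ud, void *ptr,
                              size_t osize, size_t nsize) {
      tlsf_t pool = (tlsf_t)ud;
      (void)osize;                 /* TLSF tracks block sizes itself */
      if (nsize == 0) {
          tlsf_free(pool, ptr);
          return NULL;
      }
      return tlsf_realloc(pool, ptr, nsize);
  }

  int main(void) {
      /* Static arena; size and alignment are illustrative. */
      static _Alignas(max_align_t) char arena[1 << 20];
      tlsf_t pool = tlsf_create_with_pool(arena, sizeof arena);
      lua_State *L = lua_newstate(tlsf_lua_alloc, pool);
      luaL_openlibs(L);
      /* ... all of this state's allocations now come from the arena ... */
      lua_close(L);
      return 0;
  }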
There is a memcpy in tlsf_realloc; I just looked at the code. There is
also a non-memcpy path, taken when the memory after the current block is
free. I easily created a test case in which reallocs from 256 to 512,
1024, 2048 and 4096 bytes all returned different pointers; of course, I
had to malloc enough memory between the reallocs that the block would
not fit in the previously freed fragment.
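Something like this, with the system realloc (the same shape works
against tlsf_realloc); whether the pointer actually moves depends on the
allocator's state, which is what the blocker allocations are for:

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(void) {
      size_t sizes[] = {512, 1024, 2048, 4096};
      void *p = malloc(256);
      printf("256 bytes at %p\n", p);
      for (int i = 0; i < 4; i++) {
          /* Occupy memory so the previously freed fragment (and the
             space right after the current block) cannot satisfy the
             next realloc. */
          void *blocker = malloc(sizes[i]);
          uintptr_t old = (uintptr_t)p;    /* remember old address */
          p = realloc(p, sizes[i]);
          printf("%zu bytes at %p (%s)\n", sizes[i], p,
                 (uintptr_t)p == old ? "grown in place"
                                     : "moved, i.e. memcpy");
          (void)blocker;   /* intentionally kept allocated */
      }
      free(p);
      return 0;
  }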
In any case, in a long-running application with some memory
fragmentation, you should assume realloc is not constant-time.
And to get back on topic to Lua: even if you assume realloc to be
constant-time, when there is no room for the next element in the array
part, Lua does not simply do t->sizearray *= 2. Instead it traverses all
current values in the array part to see how many of them are nil (i.e.
whether the array is sparse) and makes a (quite efficient) decision
about how to balance the array and hash parts. That way, if you have a
table with t[1], t[2], t[3], t[4], t[6], t[8], t[1000] and t[1001] set,
chances are you will end up with keys 1 to 8 in the array part and 1000
and 1001 in the hash part.
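A standalone sketch of that decision rule, simplified from computesizes
in ltable.c (the real code bins the keys by power-of-two ranges in a
single pass instead of rescanning), run on exactly that key set:

  #include <stdio.h>

  /* Simplified form of the rule in Lua's computesizes (ltable.c): take
     the largest power-of-two size such that MORE than half of the
     slots 1..size would be non-nil. */
  static unsigned optimal_array_size(const unsigned *keys, unsigned nkeys) {
      unsigned optimal = 0;
      for (unsigned size = 1; size <= 0x40000000u; size *= 2) {
          unsigned present = 0;
          for (unsigned i = 0; i < nkeys; i++)
              if (keys[i] <= size)
                  present++;
          if (present > size / 2)
              optimal = size;
      }
      return optimal;
  }

  int main(void) {
      unsigned keys[] = {1, 2, 3, 4, 6, 8, 1000, 1001};
      unsigned n = sizeof keys / sizeof keys[0];
      unsigned size = optimal_array_size(keys, n);
      printf("array part size: %u\n", size);   /* prints 8 */
      for (unsigned i = 0; i < n; i++)
          if (keys[i] > size)
              printf("t[%u] goes to the hash part\n", keys[i]);
      return 0;
  }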
By the way, I looked at the code of computesizes now, and it chooses the
maximum possible array size that is a power of two and has more than
half of its slots non-nil. I think the array part is more than twice as
space-efficient as the hash part, so that coefficient could be played
with. I would go with finding the power-of-two array size that minimizes

  sizeof(TValue) * array_size
    + sizeof(Node) * (total_elements - elements_in_array_part) * HASH_LOAD

with a hash load constant like 1.5. Given that sizeof(Node) is more than
twice sizeof(TValue), and that you want more hash buckets than you have
elements (of course, rounding up to a power of two might do that), it
may turn out that 3 or even 4 is a better coefficient. On the other
hand, optimizing sparse tables may not be worth it.
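As a sketch of that proposed cost function, with illustrative numbers
(16/32-byte TValue/Node, as on a typical 64-bit build, and a 1.5 hash
load factor; all three are assumptions to be tuned, which is the point):

  #include <stdio.h>

  #define SIZEOF_TVALUE 16u   /* assumption: typical 64-bit build */
  #define SIZEOF_NODE   32u   /* assumption: typical 64-bit build */
  #define HASH_LOAD     1.5   /* assumed hash load constant */

  /* For each candidate power-of-two array size, charge one TValue per
     array slot plus one Node (inflated by the load factor) per key
     left for the hash part; return the cheapest size (0 = all hash). */
  static unsigned cheapest_array_size(const unsigned *keys, unsigned nkeys) {
      unsigned best_size = 0;
      double best_cost = nkeys * SIZEOF_NODE * HASH_LOAD;
      for (unsigned size = 1; size <= 0x40000000u; size *= 2) {
          unsigned in_array = 0;
          for (unsigned i = 0; i < nkeys; i++)
              if (keys[i] <= size)
                  in_array++;
          double cost = (double)size * SIZEOF_TVALUE
                      + (double)(nkeys - in_array) * SIZEOF_NODE * HASH_LOAD;
          if (cost < best_cost) {
              best_cost = cost;
              best_size = size;
          }
      }
      return best_size;
  }

  int main(void) {
      unsigned keys[] = {1, 2, 3, 4, 6, 8, 1000, 1001};
      unsigned n = sizeof keys / sizeof keys[0];
      /* For this key set the minimum is also at size 8 (cost 224 vs
         384 for all-hash), so here the two rules happen to agree. */
      printf("cost-minimizing array size: %u\n",
             cheapest_array_size(keys, n));
      return 0;
  }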
Anyway, I guess that officially makes filling up the array part
O(n*log(n)).
Anton