Return to Answer

Commonmark migration

Source Link

edited Jun 10, 2020 at 13:24

Community Bot

edited Jun 10, 2020 at 13:24

Community Bot

Old array: avg time: 6.518096446990967

New array: avg time: 6.587247729301453

Old array: avg time: 6.518096446990967

New array: avg time: 6.587247729301453

Old array: avg time: 6.518096446990967

New array: avg time: 6.587247729301453

Source Link

answered Jan 8, 2018 at 23:27

aghast

answered Jan 8, 2018 at 23:27

aghast

12.6k
25
46

The implementation of your Array class still bugs me. So I cloned your repo and renamed Array -> OldArray, then created class NewArray:

class NewArray(object):
 """Mixin to denote objects that are sent from OANDA in an array.
 Also used to correctly serialize objects.
 """
 def __contains__(self, item):
 """Return True if item in this array, False otherwise.
 Note: this traverses all or part of the array, instantiating the
 objects. Using `x in array` may, therefore, have a serious impact on
 performance.
 """
 for value in self:
 if value == item:
 return True
 def __init__(self, *items):
 """Initialize a new array.
 The *items passed in are assumed to be JSON data. If an item is
 accessed, it is passed to `create_attribute` with the appropriate
 class type.
 Initially, objects are stored in self._items. When accessed, the
 objects are reified and stored in self.items. This is transparently
 handled by self.__getitem__(self, key).
 """
 print(f"NewArray<{self._contains}> with len {len(items)}")
 self._items = items
 self.items = []
 def __init_subclass__(cls, **kwargs):
 """Record the type *contained in* the subclass-array.
 A subclass like:
 class array_holding_foo(Array, contains=Foo):
 pass
 will have all its inner objects instantiated using class Foo.
 """
 cls._contains = kwargs.pop('contains')
 def __len__(self):
 return len(self._items)
 def __iter__(self):
 """Iterate over items in array. Use integer indexing so that
 __getitem__ can handle reifying all the objects.
 """
 for index in range(len(self)):
 yield self[index]
 def __getitem__(self, key):
 print(f"getitem[{key}] called on NewArray<{self._contains}>")
 if isinstance(key, slice):
 length = len(self.items)
 start = (0 if key.start is None
 else key.start if key.start >= 0
 else key.start + length)
 stop = (length if key.stop is None
 else key.stop if key.stop >= 0
 else key.stop + length)
 step = (1 if key.step is None
 else key.step)
 # Note: this reifies the items before putting them in the
 # new object.
 return self.__class__(*[self[index]
 for index in range(start, stop, step)])
 length = len(self._items)
 if key < 0:
 key += length
 if not (0 <= key < length):
 raise IndexError('Array index out of range')
 if key >= len(self.items):
 self.items += [None] * (key - len(self.items) + 1)
 if self.items[key] is None:
 json = self._items[key]
 self.items[key] = create_attribute(self._contains, json)
 return self.items[key]

I know that this code does less work when an array is created: it creates and sets two attributes in __init__ but doesn't loop at all. I had an older version that set the .items list to [None] * len(items), which made for less work in the __getitem__ method, but it still "looped" N times, so I tried squeezing that out!

But your benchmark using this code averaged a few hundredths of a second slower than the benchmark using your old Array implementation.

I think that means that the limit of performance has been reached. I ran the client_benchmark script 10 times, sorted the times reported, dropped the max and min, and averaged the 8 remaining (for both OldArray and NewArray versions).

Old array: avg time: 6.518096446990967

New array: avg time: 6.587247729301453

My take-away is that your code- as written when you posted this- is close enough to "doing nothing" in performance that tweaking it just produces noise on this benchmark.

Sooooo.... you need another benchmark! Possibly several benchmarks.

You should save this as the "creating arrays" benchmark, and add it to your perftest directory (which you don't have... yet).

Then maybe create some other benchmarks, reflective of actual use cases, which we can use to hammer out the performance of objects when the arrays are actually accessed, instead of just creating them.

Edit:

Also, if slicing is actually used it probably deserves better treatment. There should be a way of copying the json and actual versions in the initializer.

lang-py