I have written a Python function called comp
that checks whether two given arrays have the same elements, with the same multiplicities. The multiplicity of a member is the number of times it appears in the array. I would like to post my code here for review, so that I can optimize it and receive any advice on how to improve it.
Here's the code for the function:
def comp(array1, array2):
c = 0
ta = []
if len(array1) == 0 or len(array2) == 0:
return None
for i in range(len(array2)):
for j in range(len(array2)):
if array2[i] != array1[j] * array1[j]:
c += 1
ta.append(c)
c = 0
if len(array1) in ta:
return False
else:
return True
I would appreciate any suggestions or advice on how to improve the code's efficiency, readability, or any other best practices that I might be missing.
3 Answers 3
for i in range(len(array2)):
for j in range(len(array2)):
if array2[i] != array1[j] * array1[j]:
c += 1
The multiplicity of a member is the number of times it appears in the array.
Either the problem is ill-specified, or the code is wrong.
When the lengths of array{1,2} differ, we either ignore many entries or we raise IndexError.
It is very surprising for a bool
predicate
to return None
.
Consider using mypy
to help you lint this code.
The identifier c
apparently means "count".
Take the opportunity to spell that out more fully.
What were you thinking when you chose the identifier ta
??
What should I think when I read it?
No idea what's behind that decision.
Rename it before merging down to main
.
You chose a list
datastructure for something that
we only do an in
test on.
Ordinarily I would encourage switching it to a set
,
for O(1) constant lookup speed.
But given the crazy quadratic complexity of that loop,
it's lost in the noise.
Instead you should focus on choosing a sensible algorithm
with complexity O(N log N) or better.
Given the lack of unit tests and poor level of specification / documentation, I would not be willing to delegate or accept maintenance tasks on this code base.
There's no docstring for the function (and it's so generically named, it could easily be mistaken for something else).
There are no unit-tests; they should be trivial to add.
Variable names such as c
and ta
might mean something to you now, but it's unlikely they will mean much to the next maintainer (perhaps 6-months-older you?).
It's surprising that a predicate function can return None
. If one of the inputs is empty, I'd expect the result to be true if, and only if, the other input is also empty.
It's not clear why we're multiplying array1[j]
by itself when comparing. That's going to break the comparison, and should fail our unit tests.
The O(n2) algorithm that searches for matches scales poorly. We can make it much more efficient (at the expense of increased memory usage) by constructing a multiset (Counter
) from each input, then comparing the two multisets for equality.
Simpler implementation
#!/usr/bin/python3
import collections
def bag_equal(array1, array2):
"""
True if the inputs contain the same elements (including duplicates)
in any order.
>>> bag_equal([], [])
True
>>> bag_equal([], [''])
False
>>> bag_equal([0, 0, 2, 3, 3], (0, 2, 3, 0, 3))
True
>>> bag_equal([0, 0, 1], [0, 1, 1])
False
>>> bag_equal(["me", "you"], ["me", "you", "you"])
False
"""
# Test lengths first, to avoid creating counters unnecessarily
return (len(array1) == len(array2) and
collections.Counter(array1) == collections.Counter(array2))
if __name__ == '__main__':
import doctest
doctest.testmod()
Things to check upfront
- same length, else multiplicities cannot all be the same
tools to use
- collections.Counter
most simple:
return Counter(items1) = Counter(items2)
one additional Counter instead of two:delta = Counter(items1) delta.subtract(items2) return not any(delta.values())
sorted
- useful if the items cannot be hashed (meaningCounter
won't work)
return sorted(items1) == sorted(items2)
-
\$\begingroup\$ Inviting everybody to expand or otherwise improve. \$\endgroup\$greybeard– greybeard2023年05月10日 07:54:52 +00:00Commented May 10, 2023 at 7:54
-
1\$\begingroup\$
Counter.subtract()
returnsNone
, so you can't chain the.values()
.delta = Counter(items1)\ndelta.subtract(items2)\nreturn not any(delta.values())
should work. \$\endgroup\$RootTwo– RootTwo2023年05月10日 17:17:24 +00:00Commented May 10, 2023 at 17:17
comp([1,2], [1,2])
should return True, but it returns False. \$\endgroup\$array1
? This code makes no sense to me, and no comments explain why it's doing it that way. It's highly non-obvious how this algorithm works, although I guess the idea of looking forlen(array1)
inta[]
is that it's counting non-matches, so that would find a case where nothing matched? Seems weird. \$\endgroup\$