4
\$\begingroup\$

I have written a Python function called comp that checks whether two given arrays have the same elements, with the same multiplicities. The multiplicity of a member is the number of times it appears in the array. I would like to post my code here for review, so that I can optimize it and receive any advice on how to improve it.

Here's the code for the function:

def comp(array1, array2):
 c = 0
 ta = []
 if len(array1) == 0 or len(array2) == 0:
 return None
 for i in range(len(array2)):
 for j in range(len(array2)):
 if array2[i] != array1[j] * array1[j]:
 c += 1
 ta.append(c)
 c = 0
 if len(array1) in ta:
 return False
 else:
 return True

I would appreciate any suggestions or advice on how to improve the code's efficiency, readability, or any other best practices that I might be missing.

Toby Speight
87.1k14 gold badges104 silver badges322 bronze badges
asked May 9, 2023 at 23:01
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Could you provide some sample input? \$\endgroup\$ Commented May 9, 2023 at 23:51
  • 1
    \$\begingroup\$ This code doesn't appear to work correctly. Based on the problem description, comp([1,2], [1,2]) should return True, but it returns False. \$\endgroup\$ Commented May 10, 2023 at 6:16
  • 2
    \$\begingroup\$ What resource do you want to "optimize"? Memory, CPU, something else? There are tags for these, but I don't know which to apply. (BTW, I removed the "functional programming" tag, as this is clearly using an imperative style). \$\endgroup\$ Commented May 10, 2023 at 7:10
  • \$\begingroup\$ Why is this code squaring elements of array1? This code makes no sense to me, and no comments explain why it's doing it that way. It's highly non-obvious how this algorithm works, although I guess the idea of looking for len(array1) in ta[] is that it's counting non-matches, so that would find a case where nothing matched? Seems weird. \$\endgroup\$ Commented May 10, 2023 at 17:27

3 Answers 3

2
\$\begingroup\$
 for i in range(len(array2)):
 for j in range(len(array2)):
 if array2[i] != array1[j] * array1[j]:
 c += 1

The multiplicity of a member is the number of times it appears in the array.

Either the problem is ill-specified, or the code is wrong.

When the lengths of array{1,2} differ, we either ignore many entries or we raise IndexError.

It is very surprising for a bool predicate to return None. Consider using mypy to help you lint this code.


The identifier c apparently means "count". Take the opportunity to spell that out more fully.

What were you thinking when you chose the identifier ta?? What should I think when I read it? No idea what's behind that decision. Rename it before merging down to main.

You chose a list datastructure for something that we only do an in test on. Ordinarily I would encourage switching it to a set, for O(1) constant lookup speed. But given the crazy quadratic complexity of that loop, it's lost in the noise. Instead you should focus on choosing a sensible algorithm with complexity O(N log N) or better.


Given the lack of unit tests and poor level of specification / documentation, I would not be willing to delegate or accept maintenance tasks on this code base.

answered May 10, 2023 at 16:54
\$\endgroup\$
0
6
\$\begingroup\$

There's no docstring for the function (and it's so generically named, it could easily be mistaken for something else).

There are no unit-tests; they should be trivial to add.

Variable names such as c and ta might mean something to you now, but it's unlikely they will mean much to the next maintainer (perhaps 6-months-older you?).

It's surprising that a predicate function can return None. If one of the inputs is empty, I'd expect the result to be true if, and only if, the other input is also empty.

It's not clear why we're multiplying array1[j] by itself when comparing. That's going to break the comparison, and should fail our unit tests.

The O(n2) algorithm that searches for matches scales poorly. We can make it much more efficient (at the expense of increased memory usage) by constructing a multiset (Counter) from each input, then comparing the two multisets for equality.


Simpler implementation

#!/usr/bin/python3
import collections
def bag_equal(array1, array2):
 """
 True if the inputs contain the same elements (including duplicates)
 in any order.
 >>> bag_equal([], [])
 True
 >>> bag_equal([], [''])
 False
 >>> bag_equal([0, 0, 2, 3, 3], (0, 2, 3, 0, 3))
 True
 >>> bag_equal([0, 0, 1], [0, 1, 1])
 False
 >>> bag_equal(["me", "you"], ["me", "you", "you"])
 False
 """
 # Test lengths first, to avoid creating counters unnecessarily
 return (len(array1) == len(array2) and
 collections.Counter(array1) == collections.Counter(array2))
if __name__ == '__main__':
 import doctest
 doctest.testmod()
answered May 10, 2023 at 7:18
\$\endgroup\$
2
\$\begingroup\$

Things to check upfront

  • same length, else multiplicities cannot all be the same

tools to use

  • collections.Counter
    most simple:
    return Counter(items1) = Counter(items2)
    one additional Counter instead of two:
     delta = Counter(items1)
     delta.subtract(items2)
     return not any(delta.values())
    
  • sorted - useful if the items cannot be hashed (meaning Counter won't work)
    return sorted(items1) == sorted(items2)
\$\endgroup\$
2
  • \$\begingroup\$ Inviting everybody to expand or otherwise improve. \$\endgroup\$ Commented May 10, 2023 at 7:54
  • 1
    \$\begingroup\$ Counter.subtract() returns None, so you can't chain the .values(). delta = Counter(items1)\ndelta.subtract(items2)\nreturn not any(delta.values()) should work. \$\endgroup\$ Commented May 10, 2023 at 17:17

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.