I solved LeetCode 137 in C++. TLDR of the problem: array of numbers, nums
is given; all numbers appear 3 times, except one which appear only once. Find the number that appears once.
When trying to convert my C++ solution to python, I came across few funny things, i.e. the bit operations seem to behave a bit differently in python, on negative integers.
So, I switched to using formatting as binary string - but, to my surprise, INT_MIN (-2^32 - 1) resulted in a string of length 33, containing the sign as well. Where could I read more about this and understand why it happens?
How can I improve the code below, to make it more pythonic?
def singleNumber(self, nums: List[int]) -> int:
INT_BASE = 33 # because of INT32_MIN
# give python what it likes
counts_nz = [0 for _ in range(INT_BASE)]
vals_bit = ["0" for _ in range(INT_BASE)]
for num in nums:
# 33 because of INT32_MIN takes 33 bits to represent.
for idx, bin_val in enumerate(f"{num:033b}"):
if bin_val != "0": # can be "1" or "-"
counts_nz[idx] += 1
vals_bit[idx] = bin_val
# make the bits binary string -- set to value for M3 + 1 and 0 otherwise
bin_res = "".join(
[
vals_bit[idx] if count % 3 == 1 else "0"
for idx, count in enumerate(counts_nz)
]
)
return int(bin_res, 2)
1 Answer 1
INT_MIN
Yeah, python doesn't really expose a "machine integer" concept. Page 5 of the 1974 MACLISP manual explained
it is impossible to get "overflow" in bignum arithmetic ...
Now if C code assigns int16 n = 32767
and then n++
increments,
we may
get the most negative int rather than 32768.
Which is fine by programmers but makes mathematicians a bit cross.
In python the sign bit is entirely separate.
The bit counting trick is nice, I like it.
Here is an implementation of that which treats the sign bit seperately.
Also, pep-8
asks for a snake_case function name.
And List[int]
works but in modern interpreters
we prefer a lower list[int]
annotation.
import random
import unittest
from hypothesis import given
import hypothesis.strategies as st
def single_number(nums: list[int]) -> int:
res = 0
for bit in range(32):
counts = sum(num < 0 for num in nums)
mask = 1 << bit
for num in nums:
if abs(num) & mask:
counts += 1
if counts % 3 == 1:
res |= mask
BIAS = 2**31
if res >= BIAS: # caller will need to see a negative result
res -= (2 * BIAS - 1)
return res
class TestLeet137(unittest.TestCase):
def test_leet137(self):
self.assertEqual(3, single_number([2, 2, 3, 2]))
self.assertEqual(0, single_number([2, 1, 2, 1, 2, 1, 0]))
self.assertEqual(-99, single_number([0, 1, 0, 1, 0, 1, -99]))
def test_negative_binary(self):
num = -6
self.assertEqual("-00000000000000000000000000000110", f"{num:033b}")
# These constants are specified by https://leetcode.com/problems/single-number-ii
LO = -(2**31)
HI = 2**31 - 1
@given(st.lists(st.integers(min_value=LO, max_value=HI), min_size=1))
def test_leet(randoms: list[int]):
target, *distractors = randoms
arg = [target] + distractors * 3
assert single_number(arg) == target
arg.sort()
assert single_number(arg) == target
random.shuffle(arg)
assert single_number(arg) == target
The loop effectively produces a 32-bit machine word, so we clean up at the end in order to properly return a negative result.
Also in the 2nd test I was highlighting the not-very-binary {"-", "0", "1"} bits you were getting from that string conversion.
This might be a good opportunity to import ctypes or fixedint.
Given that we need to apply a bias anyway, it seems more convenient to operate on strictly non-negative numbers:
def single_number(nums: list[int]) -> int:
BIAS = 2**31
nums = [num + BIAS for num in nums]
assert all(num >= 0 for num in nums)
res = 0
for bit in range(32):
counts = 0
mask = 1 << bit
for num in nums:
if abs(num) & mask:
counts += 1
if counts % 3 == 1:
res |= mask
return res - BIAS
No loop change after the counts
assignment -- we're just
removing negatives up front and then re-adjusting at end.
Well, ok, I suppose that pasted abs()
is superfluous
and should be deleted.
[0] * INT_BASE
and['0'] * INT_BASE
\$\endgroup\$