I have implemented Binary_Search in python (I think), and my own method for range
objects. I want to know if there is any way to improve this code, here it is:
def findAmountString(string, char):
amount = 0
for n in range(len(string)):
if string[n] == char: amount += 1
return amount
def findrange(rangeToFind):
constList = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
rangeNum = 1
rangeToFind = list(str(rangeToFind))
del rangeToFind[0:6]
del rangeToFind[-1]
start = ""
stop = ""
step = ""
for n in range(len(rangeToFind)):
if rangeToFind[n] in constList:
if rangeNum == 1:
start += rangeToFind[n]
elif rangeNum == 2:
stop += rangeToFind[n]
else:
step += rangeToFind[n]
if rangeToFind[n] == ",":
rangeNum += 1
if step == "":
step = "1"
return [int(start), int(stop), int(step)]
def search(newData, value):
if isinstance(newData, range):
#This was my own method for 'range'
rangeData = findrange(newData)
if rangeData[0] > value or rangeData[1]-1 < value or value % rangeData[2] != 0:
return False
return True
else:
#This is called the 'binary_search' method (I think)
newData = set(newData)
while len(newData) != 1:
if newData[int(len(newData)/2)] > value:
newData = newData[:int(len(newData)/2)]
else:
newData = newData[int(len(newData)/2):]
return newData[0] == value
Is there any way to improve this code?
Side Note: I purposely didn't put Binary_Search in the Title because I didn't know if the code I coded was using Binary_Search.
2 Answers 2
- Follow PEP8 guidelines for naming functions and variables
- Avoid multiple statements on one line:
if string[n] == char: amount += 1
- Documentation. Your code is rather hard to read, as more or less nothing is documented. What is the prupose of
search
andfindRange
? What do these magic numbers mean:del range_to_find[0:6]
,del range_to_find[-1]
? You should provide type annotations, docstrings and maybe the occasional comment to explain some functionality in more detail.
findAmountString
-> find_amount_string
Three steps to improve this function:
- Never iterate over indices in Python:
def find_amount_string(string, char):
amount = 0
for c in string:
if c == char:
amount += 1
return amount
- List comprehension / Generator expression / Functional approach
def find_amount_string(string, char):
return sum(1 for c in string if c == char)
- Using built-ins
def find_amount_string(string, char):
return string.count(char)
As you can see, we're now only wrapping built-in count
, so I'd say we don't need that function anymore. Also, this function isn't used at all in the code provided, so it might not have been needed in the first place.
findRange
-> find_range
As far as I can tell this function works, but it's really hard to understand why at first glance. It's also not clear why you would want to manually reconstruct start, stop, step
from a given range
object instead of just getting the range.start, range.stop, range.step
attributes from the object. It's also not a reinventing-the-wheel situation, as you're just deconstructing an exsting object and putting it back together with less functionality in a very round-about way. I would say nothing about this function is a good idea.
search
This function looks like its only purpose is to replace Python's in
operator. So if this is not a programming exercise, you should probably replace all calls like search(new_data, value)
by value in new_data
. But let's treat it as a programming exercise, as that's probably the purpose of your implementation:
Functionality for range
objects
Instead of hardcoding indices you should use tuple unpacking:
start, stop, step = find_range(new_data)
if start > value or stop - 1 < value or value % step != 0:
return False
return True
However, your implementation will provide incorrect results if range.start % range.step != 0
. Consider this simple example:
print(list(range(3, 10, 2))) # [3, 5, 7, 9]
print(search(range(3, 10, 2), 5)) # False
You need to replace value % step != 0
by either value % step != start % step
or (value - start) % step != 0
.
This can also be simplified to
start, stop, step = find_range(new_data)
return start <= value < stop and (value - start) % step == 0
As I said, find_range
provides no additional benefit here, so this will also work:
return new_data.start <= value < new_data.stop and (value - new_data.start) % new_data.step == 0
This is probably similiar to Python's own implementation of value in range
.
Functionality for everything else
This function does not work for anything but range
objects, your else
clause will fail since 'set' objects are not subscriptable
.
The approach used is binary search, but I haven't checked its correctness for all cases. Please be aware that even a correct binary search implementation will only work for sorted data.
Here is some further reading on recursive and iterative binary search implementations in Python if you're interested. As stochastic13 correctly points out, the current approach unnecessarily harms runtime.
Nice attempt. @riskypenguin's answer is great, I am just adding this to cover some extra points like the optimization (time-complexity) part.
Optimization
I assume you are familiar with the big-O notation. Mention in the comments if not, will try to add a small explanatory addendum.
range
objects: It is commendable that you usestr
to get the name of the object so that you don't have to convert it to a list which might be O(n). However, you can do it much simply.- The python
range
object has a very efficient implementation. You can directly use thein
operation to check membership and it will work in almost O(1) time. Therefore, for the first part of your code, simply returningvalue in newData
will work - Alternatively, you can also get the
start, stop, step
arguments directly from arange
object by usingnewData.start, newData.stop, newData.step
if you want to continue using your code.
- The python
lists
(I assume you will always get sorted lists in this code, otherwise you cannot use binary search)- The main point of binary search is to not have to go over all the arguments (O(n)) to find whether the element is present or not. But, your code keeps slicing the list and assigning it to the name
newData
. This creates a separate list every time you slice and assign (see the time complexity of the list slicing operation here). Hence you get a much worse time complexity than the expected O(log n) for binary search. - In order to solve this, you should keep track of the first and last elements of your current
newData
only, since they are the minimum and the maximum elements in the list. You can additionally keep track of the number of elements in thenewData
(halved every time) to keep using yourwhile
condition to finish searching. (see bottom)
- The main point of binary search is to not have to go over all the arguments (O(n)) to find whether the element is present or not. But, your code keeps slicing the list and assigning it to the name
Shortening
- Assuming that you wanted to use the long-version of the
range
searching code, you can significantly shorten it. You can split the string to quickly get thestart, step, stop
. That'll save you all the complexrangeNum
logic. - Moreover, instead of using
del
you can just index the string to get the substring which is a bit cleaner (doesn't matter much in efficiency)
def findrange(rangeToFind):
rangeToFind = str(rangeToFind)
rangeToFind = rangeToFind[6:len(rangeToFind) - 1]
step = 1
cutRange = rangeToFind.split(", ")
if len(cutRange) == 2:
start, stop = cutRange
else:
start, stop, step = cutRange
return [int(start), int(stop), int(step)]
External Library
Also, I assume at least some reason of coding this part is to learn/code-without-using the built-in libraries. But just in case it comes in handy later, you can look at the bisect
library in python. There are several others for specific binary searching too, but for vanilla lists, bisect
will do.
Suggested modification
This might not be the cleanest version. Keeping it here as an example of what I meant by a slicing-free implementation
def searchlist(data, value):
max_val = len(data) - 1
min_val = 0
found = False
while max_val - min_val > 1:
if data[(max_val + min_val) // 2] < value:
min_val = (max_val + min_val) // 2 + 1
else:
max_val = (max_val + min_val) // 2
if data[max_val] == value or data[min_val] == value:
return True
return False
-
\$\begingroup\$ Overall agreed. The first line of
findrange
needs to be changed torangeToFind = str(rangeToFind)
, assplit
will fail for lists. Alsostart = ""
andstop = ""
initializations are redundant. \$\endgroup\$riskypenguin– riskypenguin2021年06月03日 12:59:10 +00:00Commented Jun 3, 2021 at 12:59 -
\$\begingroup\$ @riskypenguin Yes, my bad. Altering both \$\endgroup\$stochastic13– stochastic132021年06月03日 12:59:56 +00:00Commented Jun 3, 2021 at 12:59
-
1\$\begingroup\$ Since you're showing alternate versions that don't use
step = "1"
/int(step)
, perhaps comment on how crazy it is to use strings when you could be using integers? (With0
instead of the empty string.) Even though we don't know what the real purpose of the original code was, or why it's written that way, I'd be surprised if concatenating decimal digits(?) was optimal, if that's what it's even doing. \$\endgroup\$Peter Cordes– Peter Cordes2021年06月04日 02:58:47 +00:00Commented Jun 4, 2021 at 2:58
Explore related questions
See similar questions with these tags.
findAmountString
function you have posted? \$\endgroup\$bisect
module? \$\endgroup\$