Search Method In Python / Python 3.9

Question 1

I have implemented Binary_Search in python (I think), and my own method for range objects. I want to know if there is any way to improve this code, here it is:

def findAmountString(string, char):
 amount = 0
 for n in range(len(string)):
 if string[n] == char: amount += 1
 return amount
def findrange(rangeToFind):
 constList = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
 rangeNum = 1
 rangeToFind = list(str(rangeToFind))
 del rangeToFind[0:6]
 del rangeToFind[-1]
 start = ""
 stop = ""
 step = ""
 for n in range(len(rangeToFind)):
 if rangeToFind[n] in constList:
 if rangeNum == 1:
 start += rangeToFind[n]
 elif rangeNum == 2:
 stop += rangeToFind[n]
 else:
 step += rangeToFind[n]
 if rangeToFind[n] == ",":
 rangeNum += 1
 if step == "":
 step = "1"
 return [int(start), int(stop), int(step)]
def search(newData, value):
 if isinstance(newData, range):
 #This was my own method for 'range'
 rangeData = findrange(newData)
 if rangeData[0] > value or rangeData[1]-1 < value or value % rangeData[2] != 0:
 return False
 return True
 else:
 #This is called the 'binary_search' method (I think)
 newData = set(newData)
 while len(newData) != 1:
 if newData[int(len(newData)/2)] > value:
 newData = newData[:int(len(newData)/2)]
 else:
 newData = newData[int(len(newData)/2):]
 return newData[0] == value

Is there any way to improve this code?

Side Note: I purposely didn't put Binary_Search in the Title because I didn't know if the code I coded was using Binary_Search.

Question 2

Is there any use of the findAmountString function you have posted?

Question 3

Have you read about the bisect module?

Question 4

Follow PEP8 guidelines for naming functions and variables
Avoid multiple statements on one line: if string[n] == char: amount += 1
Documentation. Your code is rather hard to read, as more or less nothing is documented. What is the prupose of search and findRange? What do these magic numbers mean: del range_to_find[0:6], del range_to_find[-1]? You should provide type annotations, docstrings and maybe the occasional comment to explain some functionality in more detail.

findAmountString -> find_amount_string

Three steps to improve this function:

Never iterate over indices in Python:

def find_amount_string(string, char):
 amount = 0
 
 for c in string:
 if c == char:
 amount += 1
 
 return amount

List comprehension / Generator expression / Functional approach

def find_amount_string(string, char):
 return sum(1 for c in string if c == char)

Using built-ins

def find_amount_string(string, char):
 return string.count(char)

As you can see, we're now only wrapping built-in count, so I'd say we don't need that function anymore. Also, this function isn't used at all in the code provided, so it might not have been needed in the first place.

findRange -> find_range

As far as I can tell this function works, but it's really hard to understand why at first glance. It's also not clear why you would want to manually reconstruct start, stop, step from a given range object instead of just getting the range.start, range.stop, range.step attributes from the object. It's also not a reinventing-the-wheel situation, as you're just deconstructing an exsting object and putting it back together with less functionality in a very round-about way. I would say nothing about this function is a good idea.

search

This function looks like its only purpose is to replace Python's in operator. So if this is not a programming exercise, you should probably replace all calls like search(new_data, value) by value in new_data. But let's treat it as a programming exercise, as that's probably the purpose of your implementation:

Functionality for range objects

Instead of hardcoding indices you should use tuple unpacking:

start, stop, step = find_range(new_data)
if start > value or stop - 1 < value or value % step != 0:
 return False
return True

However, your implementation will provide incorrect results if range.start % range.step != 0. Consider this simple example:

print(list(range(3, 10, 2))) # [3, 5, 7, 9]
print(search(range(3, 10, 2), 5)) # False

You need to replace value % step != 0 by either value % step != start % step or (value - start) % step != 0.

This can also be simplified to

start, stop, step = find_range(new_data)
return start <= value < stop and (value - start) % step == 0

As I said, find_range provides no additional benefit here, so this will also work:

return new_data.start <= value < new_data.stop and (value - new_data.start) % new_data.step == 0

This is probably similiar to Python's own implementation of value in range.

Functionality for everything else

This function does not work for anything but range objects, your else clause will fail since 'set' objects are not subscriptable.

The approach used is binary search, but I haven't checked its correctness for all cases. Please be aware that even a correct binary search implementation will only work for sorted data.

Here is some further reading on recursive and iterative binary search implementations in Python if you're interested. As stochastic13 correctly points out, the current approach unnecessarily harms runtime.

Question 5

Nice attempt. @riskypenguin's answer is great, I am just adding this to cover some extra points like the optimization (time-complexity) part.

Optimization

I assume you are familiar with the big-O notation. Mention in the comments if not, will try to add a small explanatory addendum.

range objects: It is commendable that you use str to get the name of the object so that you don't have to convert it to a list which might be O(n). However, you can do it much simply.
- The python range object has a very efficient implementation. You can directly use the in operation to check membership and it will work in almost O(1) time. Therefore, for the first part of your code, simply returning value in newData will work
- Alternatively, you can also get the start, stop, step arguments directly from a range object by using newData.start, newData.stop, newData.step if you want to continue using your code.
lists (I assume you will always get sorted lists in this code, otherwise you cannot use binary search)
- The main point of binary search is to not have to go over all the arguments (O(n)) to find whether the element is present or not. But, your code keeps slicing the list and assigning it to the name newData. This creates a separate list every time you slice and assign (see the time complexity of the list slicing operation here). Hence you get a much worse time complexity than the expected O(log n) for binary search.
- In order to solve this, you should keep track of the first and last elements of your current newData only, since they are the minimum and the maximum elements in the list. You can additionally keep track of the number of elements in the newData (halved every time) to keep using your while condition to finish searching. (see bottom)

Shortening

Assuming that you wanted to use the long-version of the range searching code, you can significantly shorten it. You can split the string to quickly get the start, step, stop. That'll save you all the complex rangeNum logic.
Moreover, instead of using del you can just index the string to get the substring which is a bit cleaner (doesn't matter much in efficiency)

def findrange(rangeToFind):
 rangeToFind = str(rangeToFind)
 rangeToFind = rangeToFind[6:len(rangeToFind) - 1]
 step = 1
 cutRange = rangeToFind.split(", ")
 if len(cutRange) == 2:
 start, stop = cutRange
 else:
 start, stop, step = cutRange
 return [int(start), int(stop), int(step)]

External Library

Also, I assume at least some reason of coding this part is to learn/code-without-using the built-in libraries. But just in case it comes in handy later, you can look at the bisect library in python. There are several others for specific binary searching too, but for vanilla lists, bisect will do.

Suggested modification

This might not be the cleanest version. Keeping it here as an example of what I meant by a slicing-free implementation

def searchlist(data, value):
 max_val = len(data) - 1
 min_val = 0
 found = False
 while max_val - min_val > 1:
 if data[(max_val + min_val) // 2] < value:
 min_val = (max_val + min_val) // 2 + 1
 else:
 max_val = (max_val + min_val) // 2
 if data[max_val] == value or data[min_val] == value:
 return True
 return False

Question 6

Overall agreed. The first line of findrange needs to be changed to rangeToFind = str(rangeToFind), as split will fail for lists. Also start = "" and stop = "" initializations are redundant.

Question 7

@riskypenguin Yes, my bad. Altering both

Question 8

Since you're showing alternate versions that don't use step = "1" / int(step), perhaps comment on how crazy it is to use strings when you could be using integers? (With 0 instead of the empty string.) Even though we don't know what the real purpose of the original code was, or why it's written that way, I'd be surprised if concatenating decimal digits(?) was optimal, if that's what it's even doing.

riskypenguin riskypenguin 3,4831 gold badge10 silver badges28 bronze badges · Accepted Answer · 2021-06-03 12:46:06Z

Follow PEP8 guidelines for naming functions and variables
Avoid multiple statements on one line: if string[n] == char: amount += 1
Documentation. Your code is rather hard to read, as more or less nothing is documented. What is the prupose of search and findRange? What do these magic numbers mean: del range_to_find[0:6], del range_to_find[-1]? You should provide type annotations, docstrings and maybe the occasional comment to explain some functionality in more detail.

findAmountString -> find_amount_string

Three steps to improve this function:

Never iterate over indices in Python:

def find_amount_string(string, char):
 amount = 0
 
 for c in string:
 if c == char:
 amount += 1
 
 return amount

List comprehension / Generator expression / Functional approach

def find_amount_string(string, char):
 return sum(1 for c in string if c == char)

Using built-ins

def find_amount_string(string, char):
 return string.count(char)

As you can see, we're now only wrapping built-in count, so I'd say we don't need that function anymore. Also, this function isn't used at all in the code provided, so it might not have been needed in the first place.

findRange -> find_range

As far as I can tell this function works, but it's really hard to understand why at first glance. It's also not clear why you would want to manually reconstruct start, stop, step from a given range object instead of just getting the range.start, range.stop, range.step attributes from the object. It's also not a reinventing-the-wheel situation, as you're just deconstructing an exsting object and putting it back together with less functionality in a very round-about way. I would say nothing about this function is a good idea.

search

This function looks like its only purpose is to replace Python's in operator. So if this is not a programming exercise, you should probably replace all calls like search(new_data, value) by value in new_data. But let's treat it as a programming exercise, as that's probably the purpose of your implementation:

Functionality for range objects

Instead of hardcoding indices you should use tuple unpacking:

start, stop, step = find_range(new_data)
if start > value or stop - 1 < value or value % step != 0:
 return False
return True

However, your implementation will provide incorrect results if range.start % range.step != 0. Consider this simple example:

print(list(range(3, 10, 2))) # [3, 5, 7, 9]
print(search(range(3, 10, 2), 5)) # False

You need to replace value % step != 0 by either value % step != start % step or (value - start) % step != 0.

This can also be simplified to

start, stop, step = find_range(new_data)
return start <= value < stop and (value - start) % step == 0

As I said, find_range provides no additional benefit here, so this will also work:

return new_data.start <= value < new_data.stop and (value - new_data.start) % new_data.step == 0

This is probably similiar to Python's own implementation of value in range.

Functionality for everything else

This function does not work for anything but range objects, your else clause will fail since 'set' objects are not subscriptable.

The approach used is binary search, but I haven't checked its correctness for all cases. Please be aware that even a correct binary search implementation will only work for sorted data.

Here is some further reading on recursive and iterative binary search implementations in Python if you're interested. As stochastic13 correctly points out, the current approach unnecessarily harms runtime.

Stack Exchange Network

Search Method In Python / Python 3.9

2 Answers 2

Optimization

Shortening

External Library

Suggested modification

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Search Method In Python / Python 3.9

2 Answers 2

Optimization

Shortening

External Library

Suggested modification

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions