0

I am trying to get a binary search to work in Python. I have a massive, sorted list of passwords. The plan is to get a password input from the user and see if it is in the list. I've decided to implement a binary search because of the size of the list.

Here's my code:

Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
 data = myfile.readlines()
 low = 0
 high = (int(len(data))+1)
 while (low < high) and not Found:
 mid = int((low+high)/2)
 if data[mid] == Password:
 Found = True
 break
 elif Password < str(data[mid]):
 high = mid - 1
 elif Password > str(data[mid]):
 low = mid + 1

I am guessing it is because of the string comparison? Any ideas? The binary search never returns true, even if I explicitly search something that I know is in the list.

I used this code to sort the password list.

import io
with io.open('result.txt', encoding='latin-1') as myfile:
 data = myfile.readlines()
def partition(data, start, end):
 pivot = data[end] # Partition around the last value
 bottom = start-1 # Start outside the area to be partitioned
 top = end # Ditto
 done = 0
 while not done: # Until all elements are partitioned...
 while not done: # Until we find an out of place element...
 bottom = bottom+1 # ... move the bottom up.
 if bottom == top: # If we hit the top...
 done = 1 # ... we are done.
 break
 if data[bottom] > pivot: # Is the bottom out of place?
 data[top] = data[bottom] # Then put it at the top...
 break # ... and start searching from the top.
 while not done: # Until we find an out of place element...
 top = top-1 # ... move the top down.
 if top == bottom: # If we hit the bottom...
 done = 1 # ... we are done.
 break
 if data[top] < pivot: # Is the top out of place?
 data[bottom] = data[top] # Then put it at the bottom...
 break # ...and start searching from the bottom.
 data[top] = pivot # Put the pivot in its place.
 return top # Return the split point
def quicksort(data, start, end):
 if start < end: # If there are two or more elements...
 split = partition(data, start, end) # ... partition the sublist...
 quicksort(data, start, split-1)
 quicksort(data, split+1, end)
quicksort(data, 0, (int(len(data))-1))
with io.open('final.txt', 'w', encoding='latin-1') as f:
 for s in data:
 f.write(s)

The sorted list looks something like this: whitespace, then symbols, then numbers, then capital letters (alphabetically sorted), then common letters (alphabetically sorted).

asked Feb 5, 2016 at 13:02
3
  • What is the problem ? Commented Feb 5, 2016 at 13:04
  • The binary search never returns true, even if I explicitly search something that I know is in the list. After any search, printing high or low always returns 992352. Commented Feb 5, 2016 at 13:06
  • beyond your algorithmic problem, two caveats : 1) the execution time is 99% reading the file: so the linear search is the best approach here. 2) If you store passwords in memory, set is better than list : with passwords=set(data), Password in passwords solve your problem in 0(1), when your approach is O( ln(n)) . Commented Feb 5, 2016 at 13:58

6 Answers 6

3

Do not write your own binary search, it's a bit tricky to get them right. Use bisect module instead.

from bisect import bisect_left
def binary_search(lst, el):
 # returns lower bound of key `el` in list `lst`
 index = bisect_left(lst, el)
 # check that: (1) the lower bound is not at the end of the list and
 # (2) the element at the index matches `el`
 return index < len(lst) and lst[index] == el

Usage:

test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False
answered Feb 5, 2016 at 13:19
Sign up to request clarification or add additional context in comments.

1 Comment

I am well aware that there's ways to do it without re-inventing the wheel. However, I'm doing it as an exercise in algorithmical thinking and was hoping you could help me debug my code.
0

You probably have a new line character at the end of each password after calling readlines, use rstrip() to remove it

 Found = False
 Password = user_input("Enter a password: ")
 with io.open('final.txt', encoding='latin-1') as myfile:
 data = myfile.readlines()
 low = 0
 high = len(data)-1 #no need to cast to int, should be len()-1
 while (low <= high) and not Found: #less than or equal to
 mid = int((low+high)/2)
 if data[mid].rstrip() == Password: #Remove newline character before compare
 Found = True
 break
 elif Password < str(data[mid]):
 high = mid - 1
 elif Password > str(data[mid]):
 low = mid + 1
answered Feb 5, 2016 at 13:20

Comments

0

If you only want to search the password in your list then In your code

data = myfile.readlines()

you have already taken all the passwords into the memory. so if you just want to check if a given password is present in your list or not, you can directly check by using

if Password in data:
 print "yes it is present in the list"
else:
 print "Not present in the list"

hope it may help.

answered Feb 5, 2016 at 13:20

Comments

0

This is example of binary search

def binarySearch(alist, item):
 first = 0
 last = len(alist)-1
 found = False
 while first<=last and not found:
 midpoint = (first + last)//2
 if alist[midpoint] == item:
 found = True
 else:
 if item < alist[midpoint]:
 last = midpoint-1
 else:
 first = midpoint+1
 return found
mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))
mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))

I got then

False
False
True
True

Yes and I am sure that you need new line character at the end of each password after calling readlines,as Eamon pointed out.

answered Feb 5, 2016 at 13:24

Comments

0

There are two problems .

  1. Your binary search algorithm is wrong .

The repeat condition should be

while (low <= high)

or your can't find the first and the last element .

  1. readlines() will read \n but user_input() does not .

Which causes `Password` == `Password\n' be false forever.

answered Feb 5, 2016 at 13:26

Comments

0

You're skipping parts of your list because of the way you're setting low and high. Because of this, low == high occurs after updating and before checking, causing you to jump out of the loop prematurely.

There are two easy solutions:

Either..

  • set high = mid or low = mid instead of mid -/+ 1, triggering an extra iteration,

or..

  • Check if high == low and data[low] == Password after the loop terminates, as you might still find Password there.
answered Feb 5, 2016 at 13:34

1 Comment

Ah, as @KIDJourney mentioned, changing your loop condition also fixes it.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.