I wrote this function to check if a given string is in alphabetic order. How can I improve?
def alphacheck(x):
'''This function checks if a given string is in Alphabetical order'''
# Creating a variable to allow me to efficiently index a string of any length
i = len(x) - 1
# Using a recursive formula and checking for the base case when the string is of less than one character
if len(x) <= 1:
return True
# This the recursive portion of the formula
else:
# This code checks if the last two characters are in alphabetical order
# By utilizing the property that the letters of the alphabet increase in "value" accroding to python
if x[i - 1] <= x[i]:
# I remove the last letter of the string to reduce the probelm size
x = x[:i]
# I then repeat this on the smaller string until it reaches the base case or
# Finds out the string is not in alphabetical order and returns False
return alphacheck(x)
else:
return False
5 Answers 5
I am not 100% sure of this, but your code will probably return unexpected results if you use it with accented characters. accènt will probably return false because è is later in the extended ASCII table than n, but humans will view it as alphabetical because they view è the same as e.
There are libraries that can help you with this. https://github.com/kmike/text-unidecode has a more permissive license. https://pypi.python.org/pypi/Unidecode gives slightly better results and performance, but has a GPL license which is harder to use.
-
\$\begingroup\$ Collation depends on locale ( language + location) and even then conventions can vary greatly by context and be undefined. There are few true standards that are universally used and mixed language text will have issues. \$\endgroup\$uchuugaka– uchuugaka2016年11月15日 17:20:25 +00:00Commented Nov 15, 2016 at 17:20
Sorting is idempotent, sorting an already sorted list leaves it unchanged.
So checking if a list is ordered equals checking if it equals itself when sorted:
def is_sorted_alphabetically(string):
return string == ''.join(sorted(string))
If you do not want to have a \$O(NlogN)\$ runtime of sorting or you want to implement this yourself without using the sorted
built-in, I suggest you avoid recursion and use the all
and pairwise
helper functions:
def is_sorted(iterable):
return all(x1 <= x2 for x1, x2 in pairwise(iterable))
This is general and works for any iterable such as string but also lists of numbers or lists of anything of comparable (the type is Iterable[Comparable] -> bool
)
This would work assuming the input is only lowercase or only uppercase.
alphacheck('abc')
would return True, while
alphacheck('aBc')
would return False (because B < a < c
)
You can solve this by converting the input string to lower or upper, like this:
x = x.lower()
It's good that you're trying to make the code readable, but IMO you're using too much comments and not expressive enough naming. It's often too easy to write comments that are just plain wrong, e.g. like here
# Using a recursive formula and checking for the base case when the string is of less than one character
if len(x) <= 1:
you're checking if x's length is less than or equal to 1, but in the comment you're saying that the check is for length < 1. It's a pretty unimportant mistake here, but it's just one of the many ways that excessive commenting can be bad. I would like less comments, and more expressive names -- something like this:
def is_alphabetical_ordered(input_string):
'''This function checks if a given string is in Alphabetical order'''
input_string = input_string.lower()
last_index = len(input_string) - 1
if len(input_string) <= 1:
return True
elif input_string[last_index - 1] <= input_string[last_index]:
input_string = input_string[:last_index]
return is_alphabetical_ordered(input_string)
else:
return False
More on readability
For a really concise and insightful explanation of the purpose of comments, read rolfl's answer to this codereview question.
I also highly recommend reading Robert Martin's Clean Code, as it goes more deeply into readability (among other things) — beyond comments and naming.
-
2\$\begingroup\$ +1 for mentioning the too many comments. Using this kind of comments, while well-intentioned, is a cop-out from writing readable code instead. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2016年09月19日 12:32:29 +00:00Commented Sep 19, 2016 at 12:32
In general, with Python, after first understanding the problem, it is often useful to attempt a pythonic implementation. There is a certain pattern to pythonic solutions, and while I don't know of any strict definition, a pythonic solution will often be concise and make full use of Python's built-in facilities. Also, it will often perform better and be more legible than a naïve implementation.
def is_alphabetical(string):
'''Checks if a given string is in alphabetical order
is_alphabetical() is case-insensitive, and will treat all strings
as lowercase.
Args:
string: a string of characters.
Returns:
True if the string is alphabetical.
'''
string = string.lower()
return not any((m >= n for m, n in zip(string, string[1:])))
The function name
Your function is called a boolean function (a function that returns whether True
or False
). Hence alphacheck
isn't a good name, I suggest re-naming it as is_abecedarian
, which sounds like a yes/no question (which suits a boolean function).
Variables names
Your variables names aren't very descriptive such as x
and i
. It's better to re-name x
to string
, and i
to index
.
Documentation
You can re-write your function docstring a better way as:
"""Returns whether a string appears in an alphabetical order. """
it's conventional to use triple double quotes ("""
) instead of triple single quotes ('''
) as observed by me (not sure if it's the best way).
Here's the code after all of this:
def is_abecedarian(string):
"""Returns whether a string appears in an alphabetical order. """
index = len(string) - 1
string = string.lower()
if len(string) <= 1:
return True
else:
if string[index-1] <= string[index]:
return is_abecedarian(string[:index])
return False
Notes:
I've made
string
lowercase according to the first answer (a good one), I didn't catch this, and eliminated comments according to it.It's more concise and readable to remove the nested
else
in the second branch, because ifstring[index - 1] <= string[index]
isn'tTrue
then the interpreter will execute (automatically) the next statement(s), and that we need, and here there's no use forelse
.return is_abecedarian(string[:index])
is better, as you don't need to reassignstring
the new value or define a new variable, just pass the new value (string[:index]
) to the function without using a variable, as variables are evaluated first to find values assigned to them, so it's the same to pass the value directly.
Edit:
None
isn't a boolean value, as I've mentioned. It belongs to NoneType class, not to bool class. Sorry for this misconception.
-
\$\begingroup\$ This fails with the input
is_abecedarian('a' * 10000)
, also this has \$O(n^2)\$ memory usage. Two things that are very simple to fix. \$\endgroup\$2016年09月19日 09:30:54 +00:00Commented Sep 19, 2016 at 9:30 -
\$\begingroup\$ Thank you! According to what I know, it fails because it exceeds Python maximum recursion depth, I just focused on making the code more concise and readable, but leaving the code as it is. I've wrote it, before without using recursion. I'm hobbyist and beginner, and I haven't come across big O notation yet, so I don't understand your second note . Feel free to edit the answer! \$\endgroup\$Mahmood Muhammad Nageeb– Mahmood Muhammad Nageeb2016年09月19日 20:37:50 +00:00Commented Sep 19, 2016 at 20:37
-
\$\begingroup\$ The error is due to the recursion, since you state you intentionally keep it don't worry, :) Also it's \$O(n^2)\$ as if you see the line
return is_abecedarian(string[:index])
it'll make a new list n-1 the size of the last. So if you have a string 5 long it'll allocate 5 + 4 + 3 + 2 + 1 memory addresses for this algorithm, andsum(range(x+1))
isx(x+1)/2
which is roughlyx^2
so \$O(n^2)\$. So just ignore my comment, :) \$\endgroup\$2016年09月20日 00:10:38 +00:00Commented Sep 20, 2016 at 0:10