Problem
- Take a string and return all input characters in alphabetical order
- Assume that no numbers or punctuation will be used
- For the same character, uppercase characters should be returned before lowercase characters
- Do not use the
sorted
helper method- Do not use any libraries that are not included with Python 3.6.1
For example, alphabetize(HelLo)
should return eHLlo
Approach
- Create a 26-element list. This list will represent the counts (uppercase and lowercase) of the different alphabetical characters seen via a dictionary
- Iterate through the characters in the input string
- Get the index for each character by subtracting the
int
values of the lowercase character from theint
value ofa
- Get the existing count for that index & case from the counts dictionary
- Increment the count by
1
- Get the index for each character by subtracting the
- Initialize the return string
- Iterate through each element in the list
- For each dictionary / element, get the character, and get the uppercase and lowercase counts
- Add the uppercase characters to the return string
- Add the lowercase characters to the return string
- Return the alphabetized string
Implementation Discussion
I know the 'Pythonic' way to accomplish this is something like
''.join(sorted(myString))
.
However, as somebody trying to learn Python, I wanted to use this exercise to learn more about strings and characters in Python (for example) vs. Googling a one-liner (not that there's anything wrong with that).
Things I'm concerned about:
- Should this be a class? Or is it ok as a standalone method? (As somebody that is more comfortable in Java, things that aren't in classes make me irrationally uneasy.)
- Speaking of which, there's a decent amount of stuff going on in this
method, should I break it up?
- If I decide to break it up, what's the right way of 'showing' that some functions are only going to be used by another function?
- I think Python doesn't really have a concept of
private
vs.public
so do I_(some_method_name)
here?
- What other 'Pythonic' elements am I missing (again, aside from using
the
sorted
method)?
Implementation
def alphabetize(value):
uppercase_key = 'uppercase'
lowercase_key = 'lowercase'
character_counts = []
for i in range(26):
character_counts.append({uppercase_key: 0, lowercase_key: 0}.copy())
for character in value:
index = ord(character.lower()) - ord('a')
counts = character_counts[index]
key = uppercase_key if character.isupper() else lowercase_key
count = counts[key]
counts[key] = count + 1
alphabetized = ''
for index, counts in enumerate(character_counts):
character = chr(ord('a') + index)
uppercase_counts = counts[uppercase_key]
lowercase_counts = counts[lowercase_key]
for i in range(uppercase_counts):
alphabetized += character.upper()
for i in range(lowercase_counts):
alphabetized += character
return alphabetized
3 Answers 3
What other 'Pythonic' elements am I missing (again, aside from using the sorted method)?
If you want to more heavily use the standard library, you can use a collections.Counter
along with the alphabet constants string.ascii_uppercase
and string.ascii_lowercase
.
Things are also simpler if you store the counts in a dict (map) instead of the implicit map-by-ord()
-value that you're currently using. The ordering of the counts doesn't really matter, just the order in which you output them.
from collections import Counter
from itertools import chain
import string
def alphabetize(s):
counts = Counter(s)
return ''.join(
char * counts[char]
for char in chain(*zip(string.ascii_uppercase, string.ascii_lowercase))
)
The counts = Counter(s)
line does essentially this:
counts = collections.defaultdict(int)
for char in s:
counts[char] += 1
where the defaultdict
(a handy class) is a dict that auto-initializes values based on the constructor you pass in, instead of throwing an exception for missing keys.
The chain(*zip(...))
is zipping together the uppercase and lowercase alphabets, then flattening them via chain()
into a single list, ['A', 'a', 'B', ...]
.
Should this be a class?
Nope. This makes sense as a function in Python since it has no external state.
As somebody that is more comfortable in Java, things that aren't in classes make me irrationally uneasy.
Java ties files to classes, but each Python file is already a module, which is a good level of encapsulation. As a very rough rule, avoid classes in Python except when you want to hold shared state, in which case class away. Or, at least, don't reach first for classes.
-
\$\begingroup\$ first of all this was very helpful, so thanks! In case anybody else was unfamiliar with
zip
andchain
, I'm going to walk through what I think is happening (which is fairly simple, but was informative for me). \$\endgroup\$Jae Bradley– Jae Bradley2017年08月23日 06:46:08 +00:00Commented Aug 23, 2017 at 6:46 -
\$\begingroup\$
zip(string.ascii_uppercase, string.ascii_lowercase)
returns aniterable
oftuple
s. Basically,('A', 'a'), ('B', 'b')...('Z', 'z')
. \$\endgroup\$Jae Bradley– Jae Bradley2017年08月23日 06:46:21 +00:00Commented Aug 23, 2017 at 6:46 -
\$\begingroup\$
*
unpacks arguments. This means that*zip(string.ascii_uppercase, string.ascii_lowercase)
results in('A', 'a')
as the first argument in thechain
method('B', 'b')
the second, etc. \$\endgroup\$Jae Bradley– Jae Bradley2017年08月23日 06:46:54 +00:00Commented Aug 23, 2017 at 6:46 -
\$\begingroup\$
chain
basically combines all input iterables into a single iterable in the order the input iterables were provided. So in this case, since the method call looks likechain( ('A', 'a'), ('B', 'b'),...('Z', 'z') )
the output iterable is('A', 'a', 'B', 'b',...'Z', 'z')
as @BenC describes in his answer. \$\endgroup\$Jae Bradley– Jae Bradley2017年08月23日 06:50:49 +00:00Commented Aug 23, 2017 at 6:50 -
3\$\begingroup\$ I would have used
chain.from_iterable(...)
(which is basically theflatten
you are looking for) instead ofchain(*...)
. It should be more memory efficient for the same end-result. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2017年08月23日 07:44:21 +00:00Commented Aug 23, 2017 at 7:44
The
copy()
call here{uppercase_key: 0, lowercase_key: 0}.copy()
is redundant.copy()
would have made sense if you had assigned this dict to a variable somewhere and were appending that variable.Don't use
+=
to create a big string, it can result in quadratic time.The recommended way is to create a list of such string and then join them usingstr.join
. Check: Performance tips: String Concatenation.count = counts[key]; counts[key] = count + 1
is basicallycounts[key] += 1
.
EDIT; Disclaimer: I actually understood the sentence "return all input characters in alphabetical order" as a set question, hence my answer, which will return only one character of each type present in the input string.
I left it as is, as it could answer some use case.
My first take would be the following but that will only work if only all characters are in the strings, else results are strange:
import string
def alphabetize(inputstr):
inputset = set(inputstr)
refset_caps = set (string.ascii_uppercase)
refset_mins = set (string.ascii_lowercase)
leftcaps = refset_caps -(refset_caps - inputset)
leftmins = refset_mins -(refset_mins - inputset)
finalstr = list(zip(leftcaps , leftmins))
#finalstr = str(l).strip("[']").replace("', '","")
#Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
return finalstr
Explanation 1: At first I thought of making only one set, but this is forgetting that set are not ordered per se. On the other hand, they have the nice property to return characters in unicode order, hence while
l = [a for tuple in zip(string.ascii_uppercase, string.ascii_lowercase) for a in tuple]
returns the list in the right order,
s = {a for tuple in zip(string.ascii_uppercase, string.ascii_lowercase) for a in tuple}
returns capitals in alphabetical order then minuscules in alphabetical order.
So I just make substraction operations on each set separately (I remove to each reference set the difference between reference set and set from the input, which leaves basically only the set from the input, but ordered this time, for each class of characters (capitals/minuscules) and join them in the zip, and switch to a list which keeps the input order.
My second alternative is a variation on the same principle, using only one set for reference: import string
def alphabetize(inputstr):
inputset = set(inputstr)
prestr = [chrt for sublist in zip(string.ascii_uppercase , string.ascii_lowercase) for chrt in sublist]
refset = set(prestr)
leftchars = refset -(refset - inputset)
finalstr = [character for character in inputset if character in leftchars ]
#I could also have written directly finalstr = [character for character in inputset if character not in (refset - inputset)]
#finalstr = str(prestr2 ).strip("[']").replace("', '","")
#Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
return finalstr
My third alternative works better, for every case:
import string
def alphabetize(inputstr):
inputset = set(inputstr)
prestr = [chrt for sublist in zip(string.ascii_uppercase , string.ascii_lowercase) for chrt in sublist]
refset = set(prestr)
for character in (refset - inputset):
prestr.remove(character )
"""but I guess it adds loads of complexity to the operation, as removing depends on length of the list -while here it is and always be limited- and is slower by conception than set operation"""
finalstr = prestr.copy() #or
#finalstr = str(prestr2 ).strip("[']").replace("', '","")
#Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
return finalstr
In any case, I prefer this to a Counter, and intrinsicly it also avoids to use most imports (you could even avoid importing string module by just typing your list off). I think this is quite pythonic by using the sets for such operations, the encapsulated list comprehension, too.
I don't know if it is an improvement in performance, but I think lighter is better, so having less import is better. Also, using Counter is a convoluted way to do something that already exists in Python using sets as an alternative to sorting.
Else, I can't provide more justification than personal taste, but on similar performance I would prefer using set, which is more fitting my mindset to resolve such problem.
Hope this help.
I would like to timeit, but I can't, I am writing from phone.
-
\$\begingroup\$ I think I understood the question differently, hence my take with sets. I left it as is as an answer to question variation, as other people thinking like me could come here by formulating the question with similar wording... \$\endgroup\$Ando Jurai– Ando Jurai2017年08月24日 10:49:22 +00:00Commented Aug 24, 2017 at 10:49
Explore related questions
See similar questions with these tags.