Return characters in a string in alphabetical order

Question 1

Problem

Take a string and return all input characters in alphabetical order

Assume that no numbers or punctuation will be used

For the same character, uppercase characters should be returned before lowercase characters

Do not use the sorted helper method

Do not use any libraries that are not included with Python 3.6.1

For example, alphabetize(HelLo) should return eHLlo

Approach

Create a 26-element list. This list will represent the counts (uppercase and lowercase) of the different alphabetical characters seen via a dictionary
Iterate through the characters in the input string
1. Get the index for each character by subtracting the int values of the lowercase character from the int value of a
2. Get the existing count for that index & case from the counts dictionary
3. Increment the count by 1
Initialize the return string
Iterate through each element in the list
1. For each dictionary / element, get the character, and get the uppercase and lowercase counts
2. Add the uppercase characters to the return string
3. Add the lowercase characters to the return string
Return the alphabetized string

Implementation Discussion

I know the 'Pythonic' way to accomplish this is something like ''.join(sorted(myString)).

However, as somebody trying to learn Python, I wanted to use this exercise to learn more about strings and characters in Python (for example) vs. Googling a one-liner (not that there's anything wrong with that).

Things I'm concerned about:

Should this be a class? Or is it ok as a standalone method? (As somebody that is more comfortable in Java, things that aren't in classes make me irrationally uneasy.)
Speaking of which, there's a decent amount of stuff going on in this method, should I break it up?
- If I decide to break it up, what's the right way of 'showing' that some functions are only going to be used by another function?
- I think Python doesn't really have a concept of private vs. public so do I _(some_method_name) here?
What other 'Pythonic' elements am I missing (again, aside from using the sorted method)?

Implementation

def alphabetize(value):
 uppercase_key = 'uppercase'
 lowercase_key = 'lowercase'
 character_counts = []
 for i in range(26):
 character_counts.append({uppercase_key: 0, lowercase_key: 0}.copy())
 for character in value:
 index = ord(character.lower()) - ord('a')
 counts = character_counts[index]
 key = uppercase_key if character.isupper() else lowercase_key
 count = counts[key]
 counts[key] = count + 1
 alphabetized = ''
 for index, counts in enumerate(character_counts):
 character = chr(ord('a') + index)
 uppercase_counts = counts[uppercase_key]
 lowercase_counts = counts[lowercase_key]
 for i in range(uppercase_counts):
 alphabetized += character.upper()
 for i in range(lowercase_counts):
 alphabetized += character
 return alphabetized

Question 2

What other 'Pythonic' elements am I missing (again, aside from using the sorted method)?

If you want to more heavily use the standard library, you can use a collections.Counter along with the alphabet constants string.ascii_uppercase and string.ascii_lowercase.

Things are also simpler if you store the counts in a dict (map) instead of the implicit map-by-ord()-value that you're currently using. The ordering of the counts doesn't really matter, just the order in which you output them.

from collections import Counter
from itertools import chain
import string
def alphabetize(s):
 counts = Counter(s)
 return ''.join(
 char * counts[char]
 for char in chain(*zip(string.ascii_uppercase, string.ascii_lowercase))
 )

The counts = Counter(s) line does essentially this:

counts = collections.defaultdict(int)
for char in s:
 counts[char] += 1

where the defaultdict (a handy class) is a dict that auto-initializes values based on the constructor you pass in, instead of throwing an exception for missing keys.

The chain(*zip(...)) is zipping together the uppercase and lowercase alphabets, then flattening them via chain() into a single list, ['A', 'a', 'B', ...].

Should this be a class?

Nope. This makes sense as a function in Python since it has no external state.

As somebody that is more comfortable in Java, things that aren't in classes make me irrationally uneasy.

Java ties files to classes, but each Python file is already a module, which is a good level of encapsulation. As a very rough rule, avoid classes in Python except when you want to hold shared state, in which case class away. Or, at least, don't reach first for classes.

Question 3

first of all this was very helpful, so thanks! In case anybody else was unfamiliar with zip and chain, I'm going to walk through what I think is happening (which is fairly simple, but was informative for me).

Question 4

zip(string.ascii_uppercase, string.ascii_lowercase) returns an iterable of tuples. Basically, ('A', 'a'), ('B', 'b')...('Z', 'z').

Question 5

* unpacks arguments. This means that *zip(string.ascii_uppercase, string.ascii_lowercase) results in ('A', 'a') as the first argument in the chain method ('B', 'b') the second, etc.

Question 6

chain basically combines all input iterables into a single iterable in the order the input iterables were provided. So in this case, since the method call looks like chain( ('A', 'a'), ('B', 'b'),...('Z', 'z') ) the output iterable is ('A', 'a', 'B', 'b',...'Z', 'z') as @BenC describes in his answer.

Question 7

I would have used chain.from_iterable(...) (which is basically the flatten you are looking for) instead of chain(*...). It should be more memory efficient for the same end-result.

Question 8

The copy() call here {uppercase_key: 0, lowercase_key: 0}.copy() is redundant. copy() would have made sense if you had assigned this dict to a variable somewhere and were appending that variable.
Don't use += to create a big string, it can result in quadratic time.The recommended way is to create a list of such string and then join them using str.join. Check: Performance tips: String Concatenation.
count = counts[key]; counts[key] = count + 1 is basically counts[key] += 1.

Question 9

EDIT; Disclaimer: I actually understood the sentence "return all input characters in alphabetical order" as a set question, hence my answer, which will return only one character of each type present in the input string.
I left it as is, as it could answer some use case.

My first take would be the following but that will only work if only all characters are in the strings, else results are strange:

import string
def alphabetize(inputstr):
 inputset = set(inputstr)
 refset_caps = set (string.ascii_uppercase)
 refset_mins = set (string.ascii_lowercase)
 leftcaps = refset_caps -(refset_caps - inputset)
 leftmins = refset_mins -(refset_mins - inputset)
 finalstr = list(zip(leftcaps , leftmins))
 #finalstr = str(l).strip("[']").replace("', '","")
 #Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
 return finalstr

Explanation 1: At first I thought of making only one set, but this is forgetting that set are not ordered per se. On the other hand, they have the nice property to return characters in unicode order, hence while

l = [a for tuple in zip(string.ascii_uppercase, string.ascii_lowercase) for a in tuple]

returns the list in the right order,

s = {a for tuple in zip(string.ascii_uppercase, string.ascii_lowercase) for a in tuple}

returns capitals in alphabetical order then minuscules in alphabetical order.

So I just make substraction operations on each set separately (I remove to each reference set the difference between reference set and set from the input, which leaves basically only the set from the input, but ordered this time, for each class of characters (capitals/minuscules) and join them in the zip, and switch to a list which keeps the input order.

My second alternative is a variation on the same principle, using only one set for reference: import string

def alphabetize(inputstr):
 inputset = set(inputstr)
 prestr = [chrt for sublist in zip(string.ascii_uppercase , string.ascii_lowercase) for chrt in sublist]
 refset = set(prestr)
 leftchars = refset -(refset - inputset)
 finalstr = [character for character in inputset if character in leftchars ]
 #I could also have written directly finalstr = [character for character in inputset if character not in (refset - inputset)]
 #finalstr = str(prestr2 ).strip("[']").replace("', '","")
 #Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
 return finalstr

My third alternative works better, for every case:

import string
def alphabetize(inputstr):
 inputset = set(inputstr)
 prestr = [chrt for sublist in zip(string.ascii_uppercase , string.ascii_lowercase) for chrt in sublist]
 refset = set(prestr)
 for character in (refset - inputset):
 prestr.remove(character ) 
 """but I guess it adds loads of complexity to the operation, as removing depends on length of the list -while here it is and always be limited- and is slower by conception than set operation"""
 finalstr = prestr.copy() #or
 #finalstr = str(prestr2 ).strip("[']").replace("', '","")
 #Line above should be used in case you need to return a string and not a list, you can adapt the replace input to keep commas, too.
 return finalstr

In any case, I prefer this to a Counter, and intrinsicly it also avoids to use most imports (you could even avoid importing string module by just typing your list off). I think this is quite pythonic by using the sets for such operations, the encapsulated list comprehension, too.

I don't know if it is an improvement in performance, but I think lighter is better, so having less import is better. Also, using Counter is a convoluted way to do something that already exists in Python using sets as an alternative to sorting.

Else, I can't provide more justification than personal taste, but on similar performance I would prefer using set, which is more fitting my mindset to resolve such problem.

Hope this help.

I would like to timeit, but I can't, I am writing from phone.

Question 10

I think I understood the question differently, hence my take with sets. I left it as is as an answer to question variation, as other people thinking like me could come here by formulating the question with similar wording...

BenC BenC 2,77811 silver badges22 bronze badges · Accepted Answer · 2017-08-23 03:55:04Z

What other 'Pythonic' elements am I missing (again, aside from using the sorted method)?

If you want to more heavily use the standard library, you can use a collections.Counter along with the alphabet constants string.ascii_uppercase and string.ascii_lowercase.

Things are also simpler if you store the counts in a dict (map) instead of the implicit map-by-ord()-value that you're currently using. The ordering of the counts doesn't really matter, just the order in which you output them.

from collections import Counter
from itertools import chain
import string
def alphabetize(s):
 counts = Counter(s)
 return ''.join(
 char * counts[char]
 for char in chain(*zip(string.ascii_uppercase, string.ascii_lowercase))
 )

The counts = Counter(s) line does essentially this:

counts = collections.defaultdict(int)
for char in s:
 counts[char] += 1

where the defaultdict (a handy class) is a dict that auto-initializes values based on the constructor you pass in, instead of throwing an exception for missing keys.

The chain(*zip(...)) is zipping together the uppercase and lowercase alphabets, then flattening them via chain() into a single list, ['A', 'a', 'B', ...].

Should this be a class?

Nope. This makes sense as a function in Python since it has no external state.

As somebody that is more comfortable in Java, things that aren't in classes make me irrationally uneasy.

Java ties files to classes, but each Python file is already a module, which is a good level of encapsulation. As a very rough rule, avoid classes in Python except when you want to hold shared state, in which case class away. Or, at least, don't reach first for classes.

first of all this was very helpful, so thanks! In case anybody else was unfamiliar with zip and chain, I'm going to walk through what I think is happening (which is fairly simple, but was informative for me).
zip(string.ascii_uppercase, string.ascii_lowercase) returns an iterable of tuples. Basically, ('A', 'a'), ('B', 'b')...('Z', 'z').
* unpacks arguments. This means that *zip(string.ascii_uppercase, string.ascii_lowercase) results in ('A', 'a') as the first argument in the chain method ('B', 'b') the second, etc.
chain basically combines all input iterables into a single iterable in the order the input iterables were provided. So in this case, since the method call looks like chain( ('A', 'a'), ('B', 'b'),...('Z', 'z') ) the output iterable is ('A', 'a', 'B', 'b',...'Z', 'z') as @BenC describes in his answer.
I would have used chain.from_iterable(...) (which is basically the flatten you are looking for) instead of chain(*...). It should be more memory efficient for the same end-result.

Stack Exchange Network

Return characters in a string in alphabetical order

Problem

Approach

Implementation Discussion

Implementation

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Return characters in a string in alphabetical order

Problem

Approach

Implementation Discussion

Implementation

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions