Refactor Oxford Comma function (and other functions)

Question 1

I wrote a program that managed projects and created a set of SQL files for each project. In the logic, I have a few functions that, each time I look at them, I wonder if they could be written in a better way. They are pretty straight forward and compact, however I feel like there are better, more efficient ways of computing their results.

I would appreciate any comments on coding style or improvements (either in readability or performance) that you may have.

Oxford Comma

This function takes a list as input then passes an Oxford comma-separated string off to another function:

# >>> obj._format_missing_message(['a','b','c'])
# 'a, b, and c'
def _format_missing_message(self, missing): 
 length = len(missing)
 # Show the message if needed
 if length > 0:
 and_clause = '{} and '.format('' if length <= 2 else ',')
 message = and_clause.join([', '.join(missing[:-1]), missing[-1]])
 self._tab_pane.currentWidget().update_status(
 "Configs not found for: {}.".format(message))

SQL 'LIKE' String Creator

This next function takes a list of string inputs then finds the SQL LIKE syntax that recognizes all of the list:

import difflib
# >>> create_sql_regex(['123','124','125'])
# '12%'
def create_sql_regex(ids):
 ''' Creates a SQL regex from a list of strings for use in LIKE statement '''
 longest_match = ''
 # Raise an error if there is only one item in the list as there needs to be at least two
 # items to create the regex.
 length = len(ids)
 if length <= 1:
 raise NotEnoughComparisonData(
 'Not enough data to compare. Passed list length: {}'.format(length))
 # Create the SequenceMatcher and loop through each element in the list, comparing it and
 # the previous item.
 matcher = difflib.SequenceMatcher()
 for item, next_item in zip(ids[:-1], ids[1:]):
 matcher.set_seqs(item, next_item)
 long_match = matcher.find_longest_match(0, len(item), 0, len(next_item))
 # If the match is shorter than the previous longest match or if we have not found
 # a match yet, store the match
 if long_match.size < len(longest_match) or longest_match == '':
 longest_match = item[long_match.a:long_match.a + long_match.size]
 # If not match was found, raise an error
 if longest_match == '':
 raise NoMatchFound('A match could not be found in this list: {0}'.format(ids))
 return '{}%'.format(longest_match)

Count Duplicates

This function is the simplest of the three. It takes a list as input and returns a hash that holds the count of each unique item found in the list:

# >>> count_duplicates(['a','a','b'])
# {'a':2, 'b':1}
def count_duplicates(values):
 counts = {}
 for val in values:
 if val in counts:
 counts[val] += 1
 else:
 counts[val] = 1
 return counts

Question 2

Oxford comma

Are these really your expected outcomes?

# for ['a']
Configs not found for: and a.
# for ['a', 'b']
Configs not found for: a and b.
# for ['a', 'b', 'c']
Configs not found for: a, b, and c.

In particular, the case of a single item looks like a bug, and in the case of 3 or more items the 2 spaces between a, b looks a bit strange.

In any case, it would make sense to split the method, for example:

def _format_missing_message(self, missing):
 if missing:
 message = _oxford_comma(missing)
 self._tab_pane.currentWidget().update_status(
 "Configs not found for: {}.".format(message))
@staticmethod
def _oxford_comma(items):
 length = len(items)
 and_clause = '{} and '.format('' if length <= 2 else ',')
 return and_clause.join([', '.join(items[:-1]), items[-1]])

This way it's easier to unit test the Oxford comma logic.

Finally, I think the method would be slightly easier to read like this:

def _oxford_comma(items):
 length = len(items)
 if length == 1:
 return items[0]
 if length == 2:
 return '{} and {}'.format(*items)
 return '{}, and {}'.format(', '.join(items[:-1]), items[-1])

Count Duplicates

My only objection is the method name, count_duplicates. You are counting elements, not necessarily duplicates, and returning a map of item => count pairs. So for example get_counts would be more natural.

That's all I can pick on ;-)

Question 3

Yeah the 1 item oxford comma is a bug. As for the double-space, it was a mistype :P Thanks for the input!

Question 4

See here for a more concise way to count the letter in a string.

Question 5

@DarinDouglass nice to know, thanks for the reminder! (I keep forgetting that handy tool.)

janos janos 113k15 gold badges154 silver badges396 bronze badges · Accepted Answer · 2014-05-20 19:50:45Z

Oxford comma

Are these really your expected outcomes?

# for ['a']
Configs not found for: and a.
# for ['a', 'b']
Configs not found for: a and b.
# for ['a', 'b', 'c']
Configs not found for: a, b, and c.

In particular, the case of a single item looks like a bug, and in the case of 3 or more items the 2 spaces between a, b looks a bit strange.

In any case, it would make sense to split the method, for example:

def _format_missing_message(self, missing):
 if missing:
 message = _oxford_comma(missing)
 self._tab_pane.currentWidget().update_status(
 "Configs not found for: {}.".format(message))
@staticmethod
def _oxford_comma(items):
 length = len(items)
 and_clause = '{} and '.format('' if length <= 2 else ',')
 return and_clause.join([', '.join(items[:-1]), items[-1]])

This way it's easier to unit test the Oxford comma logic.

Finally, I think the method would be slightly easier to read like this:

def _oxford_comma(items):
 length = len(items)
 if length == 1:
 return items[0]
 if length == 2:
 return '{} and {}'.format(*items)
 return '{}, and {}'.format(', '.join(items[:-1]), items[-1])

Count Duplicates

My only objection is the method name, count_duplicates. You are counting elements, not necessarily duplicates, and returning a map of item => count pairs. So for example get_counts would be more natural.

That's all I can pick on ;-)

Yeah the 1 item oxford comma is a bug. As for the double-space, it was a mistype :P Thanks for the input!
See here for a more concise way to count the letter in a string.
@DarinDouglass nice to know, thanks for the reminder! (I keep forgetting that handy tool.)

Stack Exchange Network

Refactor Oxford Comma function (and other functions)

Oxford Comma

SQL 'LIKE' String Creator

Count Duplicates

1 Answer 1

Oxford comma

Count Duplicates

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Refactor Oxford Comma function (and other functions)

Oxford Comma

SQL 'LIKE' String Creator

Count Duplicates

1 Answer 1

Oxford comma

Count Duplicates

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions