Finding a common prefix/suffix in a list/tuple of strings

Question 1

This question was sparked by one on Stack Overflow in which the OP was looking for a way to find a common prefix among file names (a list of strings). Although one answer suggested using something from the os library, I began to wonder how one might implement a common_prefix function.

I decided to try my hand at it, and along with creating a common_prefix function, I also created a common_suffix function. After verifying that the functions worked, I decided to go the extra mile: I documented my functions and made them into a package of sorts, as I'm sure they will come in handy later.

But before sealing up the package for good, I decided to try to make my code as "Pythonic" as possible, which led me here.

I made sure to document my code heavily, so I feel confident that I shouldn't have to explain how the functions work or how to use them:

from itertools import zip_longest


def all_same(items: (tuple, list, str)) -> bool:
    '''
    A helper function to test if
    all items in the given iterable
    are identical.

    Arguments:
    items -> the given iterable to be used

    eg.
    >>> all_same([1, 1, 1])
    True
    >>> all_same([1, 1, 2])
    False
    >>> all_same((1, 1, 1))
    True
    >>> all_same((1, 1, 2))
    False
    >>> all_same("111")
    True
    >>> all_same("112")
    False
    '''
    return all(item == items[0] for item in items)


def common_prefix(strings: (list, tuple), _min: int=0, _max: int=100) -> str:
    '''
    Given a list or tuple of strings, find the common prefix
    among them. If a common prefix is not found, an empty string
    will be returned.

    Arguments:
    strings -> the string list or tuple to
    be used.
    _min, _max -> If a common prefix is found,
    its length will be tested against the range _min
    and _max. If its length is not in the range, an
    empty string will be returned, otherwise the prefix
    is returned.

    eg.
    >>> common_prefix(['hello', 'hemp', 'he'])
    'he'
    >>> common_prefix(('foobar', 'foobaz', 'foobam'))
    'fooba'
    >>> common_prefix(['foobar', 'foobaz', 'doobam'])
    ''
    '''
    prefix = ""
    for tup in zip_longest(*strings):
        if all_same(tup):
            prefix += tup[0]
        else:
            if _min <= len(prefix) <= _max:
                return prefix
            else:
                return ''


def common_suffix(strings: (list, tuple), _min: int=0, _max: int=100) -> str:
    '''
    Given a list or tuple of strings, find the common suffix
    among them. If a common suffix is not found, an empty string
    will be returned.

    Arguments:
    strings -> the string list or tuple to
    be used.
    _min, _max -> If a common suffix is found,
    its length will be tested against the range _min
    and _max. If its length is not in the range, an
    empty string will be returned, otherwise the suffix
    is returned.

    eg.
    >>> common_suffix(['rhyme', 'time', 'mime'])
    'me'
    >>> common_suffix(('boo', 'foo', 'goo'))
    'oo'
    >>> common_suffix(['boo', 'foo', 'goz'])
    ''
    '''
    suffix = ""
    strings = [string[::-1] for string in strings]
    for tup in zip_longest(*strings):
        if all_same(tup):
            suffix += tup[0]
        else:
            if _min <= len(suffix) <= _max:
                return suffix[::-1]
            else:
                return ''

Question 2

common_suffix can be written as return common_prefix([s[::-1] for s in strings])[::-1], because the two operations are symmetric to one another, and writing it this way prevents duplication.

Also, I think you should not handle max or min inside the common_prefix function, because it gives the function a double responsibility: finding prefixes plus checking a length interval.
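A minimal sketch of these two suggestions, assuming the inputs are plain strings. The common_prefix here is a simplified stand-in rather than the code from the question, and length_in_range is a hypothetical helper name introduced only for illustration.

```python
def common_prefix(strings):
    """Return the longest prefix shared by every string (simplified stand-in)."""
    prefix = ""
    # zip stops at the shortest string, which bounds the prefix length.
    for chars in zip(*strings):
        if any(char != chars[0] for char in chars):
            break
        prefix += chars[0]
    return prefix


def common_suffix(strings):
    """Reverse each string, take the common prefix, then reverse the result."""
    return common_prefix([s[::-1] for s in strings])[::-1]


def length_in_range(text, min_len=0, max_len=100):
    """Hypothetical helper: return text if min_len <= len(text) <= max_len, else ''."""
    return text if min_len <= len(text) <= max_len else ""


print(common_suffix(["rhyme", "time", "mime"]))  # me
print(repr(length_in_range(common_prefix(["hello", "hemp", "he"]), 3, 10)))  # ''
```

With the length check in its own function, callers that do not care about the length can use common_prefix and common_suffix directly.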

Why are you limiting yourself to strings? Python makes it very easy to write general functions that accept any iterable.

Why do you build the whole result and then return it? You should yield the result item by item.

Why do you write so much yourself? Using the itertools module is much more efficient and simple:

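import itertools

def common_prefix(its):
    # all_equal: a helper like all_same in the question; each yielded value is a tuple of equal items
    yield from itertools.takewhile(all_equal, zip(*its))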

PS: common_suffix will now need to use reversed(list(...)) instead of [::-1], because a generator cannot be sliced.
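A more complete, runnable sketch of the generator approach, under a few assumptions: all_equal is not defined in the answer, so it is filled in here as a hypothetical helper; each step yields the shared item rather than the whole tuple; and common_suffix copies its inputs with list() so that they can be reversed.

```python
import itertools


def all_equal(items):
    """True if every item in a non-empty tuple equals the first (assumed helper)."""
    return all(item == items[0] for item in items)


def common_prefix(iterables):
    """Lazily yield the items shared at the start of every iterable."""
    columns = zip(*iterables)
    for column in itertools.takewhile(all_equal, columns):
        # Yield the shared item itself rather than the tuple of equal items.
        yield column[0]


def common_suffix(iterables):
    """Common suffix as a list: reverse the inputs, then reverse the result."""
    reversed_inputs = [reversed(list(it)) for it in iterables]
    return list(reversed(list(common_prefix(reversed_inputs))))


print("".join(common_prefix(["foobar", "foobaz", "foobam"])))  # fooba
print(list(common_prefix([[1, 2, 3], (1, 2, 4)])))             # [1, 2]
print(common_suffix([[0, 7, 8], [5, 7, 8]]))                    # [7, 8]
```

Because the prefix is produced lazily, a caller working with strings has to join the items, as in the first usage line.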

Question 3

By the way: while I disagree with many aspects of this code, I find the documentation outstanding, and I could review it quickly and easily because of it.

Question 4

Thanks! As a side note, I was considering using yield, but I would have had no way of testing my prefix/suffix length, which was important to my project.

Question 5

@Pythonic After you call the function, do len(list(common_prefix(strings))) in range(min, max). It may cost you some efficiency, though. If you only want short parts of really long prefixes, you can use take to preserve efficiency (take is islice).
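The check described in this comment could be sketched as follows, assuming common_prefix is a generator along the lines suggested in the answer. The name prefix_length_in_range is hypothetical, and the common_prefix shown is only a stand-in so the example runs on its own.

```python
from itertools import islice, takewhile


def common_prefix(iterables):
    """Generator stand-in for the version suggested in the answer."""
    def all_equal(column):
        return all(item == column[0] for item in column)

    for column in takewhile(all_equal, zip(*iterables)):
        yield column[0]


def prefix_length_in_range(iterables, min_len, max_len):
    """Hypothetical helper: True if min_len <= prefix length < max_len.

    islice stops after max_len items, so a very long common prefix
    is never consumed in full.
    """
    consumed = sum(1 for _ in islice(common_prefix(iterables), max_len))
    return consumed in range(min_len, max_len)


print(prefix_length_in_range(["hello", "hemp", "he"], 1, 5))        # True
print(prefix_length_in_range(["a" * 10_000, "a" * 10_000], 1, 5))   # False
```

Note that range(min_len, max_len) excludes max_len, whereas the question's _min <= len(prefix) <= _max test includes the upper bound.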

Question 6

Alright, I'll see how that works out.

Question 7

@Pythonic Did you implement another version using islice? Was in range fast enough for you?

Question 8

If you want to use a type annotation for all_same(items: (tuple, list, str)), I suggest declaring items to be a typing.Sequence.

I don't understand why you want to do zip_longest(), when the length of the common prefix is certainly limited by the shortest input. A simple zip() should do.
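A small illustration of both points, offered as a sketch: the annotation uses typing.Sequence as suggested, and the two print calls show how zip() and zip_longest() differ on the first docstring example from the question.

```python
from itertools import zip_longest
from typing import Sequence


def all_same(items: Sequence) -> bool:
    """True if every item equals the first one (also True for an empty sequence)."""
    return all(item == items[0] for item in items)


words = ["hello", "hemp", "he"]

# zip stops at the shortest string, so only positions inside 'he' are compared.
print(list(zip(*words)))
# [('h', 'h', 'h'), ('e', 'e', 'e')]

# zip_longest pads the shorter strings with None, adding columns that can never match.
print(list(zip_longest(*words))[2])
# ('l', 'm', None)
```

In the question's code the padded column is what triggers the return when one string is a prefix of the others, so a version based on zip() would also need a return after the loop.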

Question 9

#!/usr/bin/env python3
# common prefix and common suffix of a list of strings
# https://stackoverflow.com/a/6719272/10440128
# https://codereview.stackexchange.com/a/145762/205605
import itertools


def all_equal(it):
    x0 = it[0]
    return all(x0 == x for x in it)


def common_prefix(strings):
    char_tuples = zip(*strings)
    prefix_tuples = itertools.takewhile(all_equal, char_tuples)
    return "".join(x[0] for x in prefix_tuples)


def common_suffix(strings):
    return common_prefix(map(reversed, strings))[::-1]


strings = ["aa1zz", "aaa2zzz", "aaaa3zzzz"]
assert common_prefix(strings) == "aa"
assert common_suffix(strings) == "zz"
print("ok")

Question 10

You did not review the existing solution. You provided an alternative answer without explaining how it is better than the existing one. Please edit your answer to comply with How to Answer.

Question 11

Bla bla bla. I have converted the existing solution into actual code.

Question 12

@milahu Please add an explanation of how your answer has improved the code. If you point to one thing (a what) and explain the improvement (a how) you should be clearly on the correct side of our rules.

Attribution for the posts above: Question 2 is the accepted answer, posted by Caridorc on Oct 31, 2016 at 19:01 (UTC) and edited Nov 3, 2016 at 13:53. The comments under it are by Caridorc (Question 3, Oct 31, 2016 at 19:20), Chris (Question 4, Oct 31, 2016 at 19:28), Caridorc (Question 5, Oct 31, 2016 at 19:31), Chris (Question 6, Oct 31, 2016 at 19:32), and Caridorc (Question 7, Nov 2, 2016 at 22:58).
