25
\$\begingroup\$

I was implementing something similar to Python's join function, where

join([a1, a2, ..., aN], separator :: String)

returns

str(a1) + separator + str(a2) + separator + ... + str(aN)

e.g.,

join([1, 2, 3], '+') == '1+2+3'

I was wondering, what is a good pattern to do this? The issue is that the separator should only be added after an element that is not the last one.

def join(l, sep):
    out_str = ''
    for i, el in enumerate(l):
        out_str += '{}{}'.format(el, sep)
    return out_str[:-len(sep)]

I'm quite happy with this, but is there a canonical approach?

Peilonrayz
asked May 8, 2017 at 9:06
\$\endgroup\$
4
  • \$\begingroup\$ In Perl's join the separator is the first argument to avoid ambiguity. \$\endgroup\$ Commented May 8, 2017 at 11:34
  • \$\begingroup\$ In general, appending N arbitrary strings iteratively would be O(N^2) in most languages and implementations, because it requires a malloc/realloc() call in each loop iteration, but CPython special-cases this, so it's only N*O(1) = O(N). In native Python, string.join or sep.join are faster because they're one Python call, not N. See "Is the time-complexity of iterative string append actually O(n^2), or O(n)?" \$\endgroup\$ Commented Jan 9, 2020 at 12:07
  • \$\begingroup\$ I like the word delimiter more than separator \$\endgroup\$ Commented Jan 10, 2020 at 12:32
  • \$\begingroup\$ @bhathiya-perera "delimiter" is broader than "separator", and is also technical jargon. \$\endgroup\$ Commented Jan 10, 2020 at 22:26

4 Answers

29
\$\begingroup\$

Strings in Python are immutable, so 'string a' + 'string b' has to create a third string to hold the result. If you build a string by appending one item at a time, each append may copy everything accumulated so far, which gives \$O(n^2)\$ time overall, as opposed to the \$O(n)\$ you would get by appending to a list.

And so, the best way to join an iterable by a separator is to use str.join.

>>> ','.join('abcdef')
'a,b,c,d,e,f'
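
Note that str.join only accepts strings, so for the list of integers in the question the items need converting first. A minimal sketch of that, where join_any is a hypothetical helper name and not part of Python:

```python
def join_any(items, sep):
    """Join arbitrary items with sep, converting each one to str first.

    join_any is a hypothetical helper name used for illustration only.
    """
    # str.join raises TypeError on non-string items, hence map(str, ...).
    return str(sep).join(map(str, items))


print(join_any([1, 2, 3], '+'))  # 1+2+3
```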

If you want to do this manually, then I'd accept the \$O(n^2)\$ performance, and write something easy to understand. One way to do this is to take the first item, and add a separator and an item every time after, such as:

def join(iterator, separator):
    it = map(str, iterator)
    separator = str(separator)
    string = next(it, '')
    for s in it:
        string += separator + s
    return string
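
If the quadratic cost matters and str.join is off the table, one alternative is to write the pieces into a buffer instead of concatenating. This is only a sketch, assuming io.StringIO from the standard library is acceptable; join_buffered is a hypothetical name, and I have not benchmarked it against the version above.

```python
import io


def join_buffered(iterable, separator):
    # join_buffered is a hypothetical name for this sketch.
    it = map(str, iterable)
    separator = str(separator)
    buffer = io.StringIO()
    # Write the first item, or an empty string if the iterable is empty.
    buffer.write(next(it, ''))
    for s in it:
        # Every later item is preceded by the separator.
        buffer.write(separator)
        buffer.write(s)
    return buffer.getvalue()
```

The structure is the same as before: handle the first item on its own, then prepend the separator to each remaining item.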
answered May 8, 2017 at 9:40
\$\endgroup\$
2
  • \$\begingroup\$ Very nice, thanks! Yeah I know about str.join, I was just implementing something slightly different and wondered how to do it nicely. I like your approach with using next at the beginning! Do you know where I can find the source of str.join though? Google didn't help.. \$\endgroup\$ Commented May 8, 2017 at 11:28
  • 1
    \$\begingroup\$ @fabian789 The source for str.join is probably this. It looks about right, and is written in C. \$\endgroup\$ Commented May 8, 2017 at 11:44
5
\$\begingroup\$

Let's take that step by step:

def join(l, sep):
    out_str = ''
    for i, el in enumerate(l):

Here, why do you need the enumerate? You could write for el in l:

        out_str += '{}{}'.format(el, sep)

.format is not especially efficient; there are other methods. You can have a look at this question for some research and benchmarks on performance.

    return out_str[:-len(sep)]

This makes little sense for l = [] if len(sep) > 1. ''[:-1] is valid, and returns '', because Python is nice, but it is not a very good way of getting around that edge case.

In general, adding something just to remove it at the end is not great.
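
One concrete case where the add-then-remove approach goes wrong is an empty separator: -len(sep) is then 0, and slicing with [:0] discards the whole string. A small sketch reproducing the question's logic, where join_strip is a name made up here:

```python
def join_strip(l, sep):
    # Same approach as the question: append sep every time, strip it at the end.
    out_str = ''
    for el in l:
        out_str += '{}{}'.format(el, sep)
    return out_str[:-len(sep)]


# With an empty separator the slice becomes out_str[:-0], i.e. out_str[:0].
print(repr(join_strip(['a', 'b'], '')))    # ''
print(repr(join_strip(['a', 'b'], '--')))  # 'a--b'
```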

Creating an iterator, taking the first value, then adding the rest, as suggested in other answers, is much better.

I would also recommend writing some unit tests, so that you can then play around with the implementation, and stay confident that what you write still works.

Typically, you could write:

# Empty list
join([], '') == ''
# Only one element, -> no separator in output
join(['a'], '-') == 'a'
# Empty separator
join(['a', 'b'], '') == 'ab'
# "Normal" case
join(['a', 'b'], '--') == 'a--b'
# ints
join([1, 2], 0) == '102'
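
To make these checks executable, they could be written as assert statements. The sketch below assumes a join built on the first-item approach from the other answers; it is there only so the tests run, and any implementation with the same signature can be swapped in.

```python
def join(iterable, sep):
    # Assumed implementation (first-item approach), used only so the tests run.
    it = map(str, iterable)
    sep = str(sep)
    result = next(it, '')
    for s in it:
        result += sep + s
    return result


def test_join():
    assert join([], '') == ''                # empty list
    assert join(['a'], '-') == 'a'           # one element, no separator
    assert join(['a', 'b'], '') == 'ab'      # empty separator
    assert join(['a', 'b'], '--') == 'a--b'  # "normal" case
    assert join([1, 2], 0) == '102'          # ints


test_join()
```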
answered May 8, 2017 at 16:10
\$\endgroup\$
2
\$\begingroup\$

There are a number of ways you can go about doing this, but using an iterator can be a nice approach:

l = [1, 2, 3, 4]
def join_l(l, sep):
    li = iter(l)
    string = str(next(li))
    for i in li:
        string += str(sep) + str(i)
    return string
print(join_l(l, "-"))

Using the first next() call allows you to do something different with the first item of your iterable before you loop over the rest using the for loop.
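
One caveat: next(li) raises StopIteration when the iterable is empty. A possible variation, sketched here under the hypothetical name join_safe, passes a default to next() so that an empty input yields an empty string:

```python
_MISSING = object()  # sentinel marking "there was no first item"


def join_safe(l, sep):
    # join_safe is a hypothetical name for this variation.
    li = iter(l)
    first = next(li, _MISSING)
    if first is _MISSING:
        # Empty iterable: nothing to join.
        return ''
    string = str(first)
    for i in li:
        string += str(sep) + str(i)
    return string


print(join_safe([1, 2, 3, 4], "-"))  # 1-2-3-4
print(repr(join_safe([], "-")))      # ''
```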

Jamal
answered May 8, 2017 at 9:27
\$\endgroup\$
1
  • 1
    \$\begingroup\$ try it with [] \$\endgroup\$ Commented May 8, 2017 at 15:58
-2
\$\begingroup\$

As join is already a Python built-in function, it is recommended not to create an identically named function. I think it would be a good idea to rename your function to avoid possible conflicts.

answered Jul 8, 2020 at 8:13
\$\endgroup\$
2
  • 4
    \$\begingroup\$ This is just false, help(join) results in NameError: name 'join' is not defined. Now help(str.join) exists, but that's not going to cause any conflicts. \$\endgroup\$ Commented Jul 8, 2020 at 8:58
  • 3
    \$\begingroup\$ "As join is already a Python built in function, it is recommended not to create a function identically named" - this also doesn't apply in the context of what OP asked. That's why they added the "reinventing-the-wheel" tag in the first place :) \$\endgroup\$ Commented Jul 8, 2020 at 9:31
