I was implementing something similar to Python's join function, where
join([a1, a2, ..., aN], separator :: String)
returns
str(a1) + separator + str(a2) + separator + ... + str(aN)
e.g.,
join([1, 2, 3], '+') == '1+2+3'
I was wondering what a good pattern for this is, because of the issue of only adding the separator if it is not the last element:
def join(l, sep):
    out_str = ''
    for i, el in enumerate(l):
        out_str += '{}{}'.format(el, sep)
    return out_str[:-len(sep)]
I'm quite happy with this, but is there a canonical approach?
Strings in Python are immutable, so 'string a' + 'string b' has to make a third string to combine them. Say you want to clone a string by adding each character to a new string one at a time: that takes \$O(n^2)\$ time, as opposed to the \$O(n)\$ you would get by appending to a list.
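To make the complexity point concrete, here is a minimal sketch (the name join_via_list is hypothetical) that appends each piece to a list and joins once at the end, so each character of the result is copied a constant number of times:

```python
def join_via_list(items, separator):
    """Join str(item) for each item with str(separator), building the result once."""
    sep = str(separator)
    parts = []  # list.append is amortized O(1); no intermediate strings are built
    for i, item in enumerate(items):
        if i:  # every item except the first is preceded by the separator
            parts.append(sep)
        parts.append(str(item))
    return ''.join(parts)  # one final pass over all the pieces


print(join_via_list([1, 2, 3], '+'))  # 1+2+3
```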
And so, the best way to join an iterable by a separator is to use str.join.
>>> ','.join('abcdef')
'a,b,c,d,e,f'
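One detail worth noting: str.join only accepts strings, so reproducing the question's join([1, 2, 3], '+') == '1+2+3' needs an explicit conversion first. A minimal sketch:

```python
# str.join raises TypeError if any element is not a str,
# so convert each element first (here with map(str, ...)).
items = [1, 2, 3]
separator = '+'

try:
    separator.join(items)
except TypeError as exc:
    print('TypeError:', exc)

result = separator.join(map(str, items))
print(result)  # 1+2+3
```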
If you want to do this manually, I'd accept the \$O(n^2)\$ performance and write something easy to understand. One way to do this is to take the first item, then add a separator and an item for every item after it, for example:
def join(iterable, separator):
    it = map(str, iterable)
    separator = str(separator)
    # next() with a default returns '' for an empty iterable
    string = next(it, '')
    for s in it:
        string += separator + s
    return string
Very nice, thanks! Yeah, I know about str.join; I was just implementing something slightly different and wondered how to do it nicely. I like your approach of using next at the beginning! Do you know where I can find the source of str.join, though? Google didn't help. (fabian789, May 8, 2017)
@fabian789 The source for str.join is probably this. It looks about right, and is written in C. (May 8, 2017)
Let's take that step by step:
def join(l, sep):
    out_str = ''
    for i, el in enumerate(l):
Here, why do you need enumerate? You could write for el in l:
        out_str += '{}{}'.format(el, sep)
.format is not super efficient; there are other methods. You can have a look at this question for some research and benchmarks on performance.
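If you want to measure this yourself, a small timeit harness along these lines compares a few ways of building each piece. The function names are illustrative, and I'm not quoting any numbers: timings depend on the machine and the Python version.

```python
import timeit

items = list(range(1000))
sep = ', '


def via_format():
    # The question's approach: one str.format call per element
    out = ''
    for el in items:
        out += '{}{}'.format(el, sep)
    return out


def via_concat():
    # str() plus concatenation, without parsing a format string
    out = ''
    for el in items:
        out += str(el) + sep
    return out


def via_fstring():
    # f-strings (Python 3.6+) format without a method call
    out = ''
    for el in items:
        out += f'{el}{sep}'
    return out


if __name__ == '__main__':
    for fn in (via_format, via_concat, via_fstring):
        print(fn.__name__, timeit.timeit(fn, number=100))
```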
    return out_str[:-len(sep)]
This makes little sense for l = [], whatever the length of sep: ''[:-1] is valid and returns '' because Python is nice, but it is not a very good way of getting around that edge case.
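A quick check of the question's function (minus the unused enumerate) on a few inputs makes these edge cases concrete. It also shows a case the slice gets wrong: with an empty separator, -len(sep) is 0, and out_str[:-0] is out_str[:0], which is always ''.

```python
def join(l, sep):
    # The question's approach: append el + sep each time, then slice off the last sep
    out_str = ''
    for el in l:
        out_str += '{}{}'.format(el, sep)
    return out_str[:-len(sep)]


print(repr(join([], '-')))          # '' (''[:-1] is a valid, empty slice)
print(repr(join(['a', 'b'], '-')))  # 'a-b'
print(repr(join(['a', 'b'], '')))   # '' rather than 'ab', because [:-0] is [:0]
```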
In general, adding something just to remove it at the end is not great.
Creating an iterator, looking at the first value, then adding the rest, as has been suggested in other answers, is much better.
I would also recommend writing some unit tests, so that you can then play around with the implementation, and stay confident that what you write still works.
Typically, you could write:
# Empty list
assert join([], '') == ''
# Only one element -> no separator in output
assert join(['a'], '-') == 'a'
# Empty separator
assert join(['a', 'b'], '') == 'ab'
# "Normal" case
assert join(['a', 'b'], '--') == 'a--b'
# ints
assert join([1, 2], 0) == '102'
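The same checks can be turned into real tests with the standard unittest module. This is a sketch: the join under test here is the next()-based version from the first answer, and you would swap in your own implementation.

```python
import unittest


def join(iterable, separator):
    # Implementation under test: first item, then separator + item for the rest
    it = map(str, iterable)
    sep = str(separator)
    result = next(it, '')
    for s in it:
        result += sep + s
    return result


class JoinTests(unittest.TestCase):
    def test_empty_list(self):
        self.assertEqual(join([], ''), '')

    def test_single_element_has_no_separator(self):
        self.assertEqual(join(['a'], '-'), 'a')

    def test_empty_separator(self):
        self.assertEqual(join(['a', 'b'], ''), 'ab')

    def test_multi_char_separator(self):
        self.assertEqual(join(['a', 'b'], '--'), 'a--b')

    def test_ints(self):
        self.assertEqual(join([1, 2], 0), '102')


if __name__ == '__main__':
    # Run the suite without touching sys.argv or calling sys.exit
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(JoinTests)
    unittest.TextTestRunner(verbosity=2).run(suite)
```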
There are a number of ways you can go about doing this, but using an iterator can be a nice approach:
l = [1, 2, 3, 4]

def join_l(l, sep):
    li = iter(l)
    string = str(next(li))
    for i in li:
        string += str(sep) + str(i)
    return string

print(join_l(l, "-"))
Using the first next() call allows you to do something different with the first item of your iterable before you loop over the rest using the for loop.
Try it with []. (njzk2, May 8, 2017)
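As the comment points out, next(li) raises StopIteration on an empty list. A minimal fix, assuming an empty input should give an empty string, is to pass a default to next():

```python
def join_l(l, sep):
    li = iter(l)
    # next() with a default returns '' instead of raising StopIteration
    # when the iterable is empty
    string = str(next(li, ''))
    for i in li:
        string += str(sep) + str(i)
    return string


print(repr(join_l([], '-')))      # ''
print(join_l([1, 2, 3, 4], '-'))  # 1-2-3-4
```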
As join is already a Python built-in function, it is recommended not to create a function with the same name. I think it would be a good idea to rename your function to avoid possible conflicts.
This is just false: help(join) results in NameError: name 'join' is not defined. Now, help(str.join) exists, but that's not going to cause any conflicts. (Jul 8, 2020)
"As join is already a Python built in function, it is recommended not to create a function identically named": this also doesn't apply in the context of what OP asked. That's why they added the "reinventing-the-wheel" tag in the first place :) (Grajdeanu Alex, Jul 8, 2020)
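You can check the comment's point directly. This short sketch shows that join is not among the built-ins (it exists as a method on str, and on bytes/bytearray), so defining a module-level join() does not shadow str.join:

```python
import builtins

# There is no built-in function named join; it only exists as a method
print(hasattr(builtins, 'join'))  # False
print(hasattr(str, 'join'))       # True


def join(items, sep):
    # A user-defined join lives in this module's namespace only
    return str(sep).join(map(str, items))


print(join([1, 2, 3], '+'))  # 1+2+3
print('+'.join(['a', 'b']))  # a+b (str.join is unaffected)
```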
... the separator is the first argument to avoid ambiguity.
... malloc/realloc() call in each loop, but CPython special-cases this, so it's only N*O(1) = O(N). In native Python, string.join or sep.join are faster because they're one Python call, not N. See "Is the time-complexity of iterative string append actually O(n^2), or O(n)?"
... delimiter more than separator.