2
\$\begingroup\$

So I get a user input of a lot of numbers in a single line like this

100 100 200 300 300 300 400 500 500 700.

I want to store the distinct ones in a list. A simple approach is to do this:

nums = input().split(' ')
myList = list(set([(int(i) for i in nums]))

My question is, is this a faster/better way:

nums = input().split(' ')
setList = set()
myList = []
for i in nums:
 if int(i) not in setList:
 myList.append(int(i))
 setList.add(int(i))
Deduplicator
19.6k1 gold badge32 silver badges65 bronze badges
asked Jul 6, 2017 at 15:01
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

No, it is not. The reason is that list comprehensions are in general as fast as or faster than the explicit loop, because they might be performed in C, while the explicit loop will always execute in Python. Also, the set constructor already takes care of the i not in setList part and you need to traverse the whole list anyways.

Here is a shorter one-liner:

numbers = list(set(map(int, input().split())))

I used map instead of a list comprehension, but you could also use a generator expression:

numbers = list(set(int(i) for i in input().split()))

Note that str.split by default splits on whitespace, so there is no need to specify split(' '), unless you want a different result for strings like 100\n200(newline) or 100 200 (double space).


As a timing comparisons, here are the four proposals as functions taking an input string and returning the desired list:

def op1(s):
 nums = s.split(' ')
 return list(set([int(i) for i in nums]))
def op2(s):
 nums = s.split(' ')
 setList = set()
 myList = []
 for i in nums:
 if int(i) not in setList:
 myList.append(int(i))
 setList.add(int(i))
 return myList
def graipher1(s):
 return list(set(map(int, s.split())))
def graipher2(s):
 return list(set(int(i) for i in s.split()))

And this is how I generated a bunch of random inputs of increasing size:

import random
inputs = [" ".join(map(str, [random.randrange(1000) for _ in range(i)])) for i in range(1, 2000, 50)]

With this, I ran the four functions to get this timing graph as a function of the number of numbers in the input:

enter image description here

Note that your first function performs almost as good as a straight set comprehension. Your alternative proposal is always the worst of all possibilities. The version with map is the fastest of the approaches I tried.

(Note that my timings were done using Python 2.7, so your mileage may vary, but I am confident that your second proposal will still loose.)

Deduplicator
19.6k1 gold badge32 silver badges65 bronze badges
answered Jul 6, 2017 at 15:44
\$\endgroup\$
3
  • \$\begingroup\$ Hi, you are correct. The approach 1 was significantly faster and I even added a sort() method because set() distorts the order and even after that, approach 1 (which is graphier1 in your answer) was the fastest. Thank you very much for the explanation. \$\endgroup\$ Commented Jul 6, 2017 at 15:49
  • \$\begingroup\$ @ParthBhoiwala Updated the graph for higher number of numbers \$\endgroup\$ Commented Jul 6, 2017 at 15:52
  • 1
    \$\begingroup\$ After researching online, you were correct, list comprehensions are performed in C which makes it significantly faster than for-loops. This is very interesting! \$\endgroup\$ Commented Jul 6, 2017 at 15:59

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.