Remove repeated numbers from a list

Question 1

So, I just managed to finish a problem that was proposed to me, and I find it very difficult since I started learning looping with for and using lists recently, I thought about sharing the problem and my solution with you guys so you could give me some advice on how I should make my code better or different ways to solve the problem, hopefully you guys can help me, thank you for your attention and sorry if I made some mistakes, I am new to programming.

Question

Write a function that receives a list with natural numbers, verify if such list has repeated numbers, if so remove them. The function should return a correspondent list with no repeated elements. The list should be returned in ascending order.

def remove_repeated_numbers(z):
 x = 1
 for i in z:
 y = x
 while y <= (len(z) - 1):
 if i == z[y]:
 del z[y]
 y = (y - 1) #Once z[y] is deleted, should check same spot again
 y = (y + 1)
 x = (x + 1)
 list_finished = sorted(z[:])
 return list_finished
def main():
 run = True
 z = []
 while run:
 u = int(input("Entry value: "))
 if u != 0:
 z.append(u)
 else:
 run = False
 print(remove_repeated_numbers(z))
main()

Question 2

You may be interested in the SO question What is the cleanest way to do a sort plus uniq on a Python list?

Question 3

Do you know that your ENTIRE script can be replaced with this ONE-liner: sorted(set(l))? This is extremely basic, set is a built-in data type that is ordered without duplicates with fast membership checking that is exactly what is designed to do this sort of thing, and then you just need to use sorted function to return a sorted (shallow) copy of the list. Type casting is by far the most efficient and readable way to do this.

Question 4

@XeнεiΞэnвες I tried your sorted(set(l)). Didn't work. Got NameError: name 'l' is not defined.

Question 5

@BolucPapuccuoglu NameError: name 'z' is not defined.

Question 6

@BolucPapuccuoglu Yeah, but XeнεiΞэnвες said to replace ENTIRE script with that (shouting already theirs).

Question 7

It's a good start. Depending on what class this is for, the person (teacher?) who proposed this question to you might care about time complexity of your solution. Your solution is going to have a number of problems there, since:

del z[y] is linear-time ("O(n)")
That occurs within a loop over the list, within another loop over the list, making the solution worst-case cubic if I'm not mistaken.

The trivial and common solution is to scrap the whole routine and replace it with calls to two built-ins:

from typing import Iterable, List
def remove_repeated_numbers(numbers: Iterable[int]) -> List[int]:
 return sorted(set(numbers))

I will leave it as an exercise to you to read the documentation and determine the complexity of this implementation.

This will construct a set from the numbers, which guarantees uniqueness but not order; and then constructs a sorted list from that. Interpreted literally, the question doesn't call for the existence of a main but I don't know how picky the problem is.

Other things to note:

Don't use single-letter variables
Add type hints for greater clarity

Question 8

Add some unit tests

Are we sure that the function works? Will it still work when we make changes (perhaps to improve its performance)?

Python comes with some unit-testing facilities to help us ensure that we don't break the function. One that's very easy to use is doctest, which lets us put the tests right by the code itself.

We could start by testing an empty list:

def remove_repeated_numbers(z):
 """
 Return a sorted copy of the input list, with duplicates removed.
 >>> remove_repeated_numbers([])
 []
 """
 ⋮
 ⋮

We write each test as a line beginning with >>> and an expression to evaluate, with the expected result on the following line.

We can run the tests with some easy boilerplate (which we don't need to understand right now):

if __name__ == '__main__':
 import doctest
 doctest.testmod()

To be sure that our tests work properly, let's add a test that we know will fail:

>>> remove_repeated_numbers([])
None

That gives us a useful diagnostic:

**********************************************************************
File "268315.py", line 7, in __main__.remove_repeated_numbers
Failed example:
 remove_repeated_numbers([])
Expected:
 None
Got:
 []
**********************************************************************

Let's remove that, and add some more tests:

 >>> remove_repeated_numbers([0])
 [0]
 >>> remove_repeated_numbers([0, 0])
 [0]
 >>> remove_repeated_numbers([0, 1])
 [0, 1]
 >>> remove_repeated_numbers([1, 0])
 [0, 1]
 >>> remove_repeated_numbers([0, 0, 1, 1, 1])
 [0, 1]
 >>> remove_repeated_numbers([0, 1, 1, 1, 0])
 [0, 1]
 """

That's good, and we now have more confidence in what we're doing.

We can now change the implementation to use a set, as suggested in a different answer, but I'll propose a smaller change, and that is to sort the input first:

for i in sorted(z):

That initially seems like it's making more work (after all, the input could be much longer than the output), but it will reduce the later operations, because now all the duplicates are adjacent, so we don't need to examine the whole list to find them.

Now, instead of the expensive del z[y] (which needs to shuffle all the subsequent elements in the list), we can copy to a new list to return to the user. Note that this changes the behaviour - I believe for the better. The code presented modifies the actual array passed to the function (though the modified array is only deduplicated, not sorted). It's generally less surprising to return a copy so that the original data is unchanged.

Just to demonstrate, try this code with the function as posted:

a = [ 2, 1, 1, 2]
print(remove_repeated_numbers(a))
print(a)

With this code, the function returns [1, 2] and a is changed to [2, 1].

So let's create our result array:

result = []
⋮
return result

Now our loop is very simple: for each element we see, if it's the same as the one we most recently added to result, then ignore it, else add it to result:

for i in sorted(z):
 if i != result[-1]:
 result.append(i)

That's not quite right, because result starts off empty. We need to always include the first element, so we must change the test to also succeed when result is empty - i.e. not result:

for i in sorted(z):
 if not result or i != result[-1]:
 result.append(i)

I think that's easier to understand than code that deletes items from the input list.

Full modified code

For reference, here's the full program from this approach:

def remove_repeated_elements(input_list):
 """
 Return a sorted copy of the input list, with duplicates removed.
 >>> remove_repeated_elements([])
 []
 >>> remove_repeated_elements([0])
 [0]
 >>> remove_repeated_elements([0, 0])
 [0]
 >>> remove_repeated_elements([0, 1])
 [0, 1]
 >>> remove_repeated_elements([1, 0])
 [0, 1]
 >>> remove_repeated_elements([0, 0, 1, 1, 1])
 [0, 1]
 >>> remove_repeated_elements([0, 1, 1, 1, 0])
 [0, 1]
 >>> remove_repeated_elements(['review', 'code', 'code', 'review'])
 ['code', 'review']
 """
 result = []
 for element in sorted(input_list):
 if not result or result[-1] < element:
 result.append(element)
 return result
if __name__ == '__main__':
 import doctest
 doctest.testmod()

Note that I changed the name slightly, as this will work with any type that can be ordered (such as strings - see the last test), not just numbers. The answer that recommends using set also works with more than just numbers (but requires a type that can be hashed, as well as ordered).

I also changed the test to use <, so it will work for any (admittedly theoretical, and highly improbable) type that can be sorted but not compared for inequality.

Question 9

I really like this answer, and I suspect indeed the eliminating the duplicates in a sorted list is the core of the exercise -- otherwise it could just as well require the elements to be returned in the order they were in the initial list. I do find not result slightly difficult to read, and would favor len(result) == 0 instead as I find it more "immediate" in meaning; is not result considered more idiomatic?

Question 10

There's some debate on what's preferred - some Python people prefer the length test and others the truth test. My opinion on that isn't very strong.

Question 11

Since sorted only require comparison via __lt__, you could write result[-1] < element for the rare cases __ne__ is not defined. Also, what's the rationale of using result += [element] rather than result.append(element)?

Question 12

@301: Both good points - I'll modify accordingly.

Question 13

The first thing to know about Python is that most basic operations on lists have helper built-ins. Your code could use set() as the holding structure which will ensure uniqueness for you without any work.

With that out of the way, I think your assignment is more about showing your work so let's take that view on it:

Your run variable is not needed. You can use break keyword to get out of the while loop.
You should try to do the early escape first in conditions to avoid nesting. In your case, you can say "if input is 0, break" then everything below assumes that value is not 0 so you don't need an else.
Double loop is overkill. if <numer> not in <array>: then add the number to array. It's slow but easy. You can do better with a set or a dictionary but that's excessive work for what you are doing.
If you're working in Python avoid index-based iteration. You can/should use iterables: for value in entered_data:.
Finally, avoid short names on variables. They create unmaintainable code and keep mental overhead high.

Reinderien Reinderien 70.9k5 gold badges76 silver badges256 bronze badges · Answer 1 · 2021-09-24 03:04:23Z

It's a good start. Depending on what class this is for, the person (teacher?) who proposed this question to you might care about time complexity of your solution. Your solution is going to have a number of problems there, since:

del z[y] is linear-time ("O(n)")
That occurs within a loop over the list, within another loop over the list, making the solution worst-case cubic if I'm not mistaken.

The trivial and common solution is to scrap the whole routine and replace it with calls to two built-ins:

from typing import Iterable, List
def remove_repeated_numbers(numbers: Iterable[int]) -> List[int]:
 return sorted(set(numbers))

I will leave it as an exercise to you to read the documentation and determine the complexity of this implementation.

This will construct a set from the numbers, which guarantees uniqueness but not order; and then constructs a sorted list from that. Interpreted literally, the question doesn't call for the existence of a main but I don't know how picky the problem is.

Other things to note:

Don't use single-letter variables
Add type hints for greater clarity

Toby Speight Toby Speight 87.3k14 gold badges104 silver badges322 bronze badges · Answer 2 · 2021-09-24 07:43:00Z

Add some unit tests

Are we sure that the function works? Will it still work when we make changes (perhaps to improve its performance)?

Python comes with some unit-testing facilities to help us ensure that we don't break the function. One that's very easy to use is doctest, which lets us put the tests right by the code itself.

We could start by testing an empty list:

def remove_repeated_numbers(z):
 """
 Return a sorted copy of the input list, with duplicates removed.
 >>> remove_repeated_numbers([])
 []
 """
 ⋮
 ⋮

We write each test as a line beginning with >>> and an expression to evaluate, with the expected result on the following line.

We can run the tests with some easy boilerplate (which we don't need to understand right now):

if __name__ == '__main__':
 import doctest
 doctest.testmod()

To be sure that our tests work properly, let's add a test that we know will fail:

>>> remove_repeated_numbers([])
None

That gives us a useful diagnostic:

**********************************************************************
File "268315.py", line 7, in __main__.remove_repeated_numbers
Failed example:
 remove_repeated_numbers([])
Expected:
 None
Got:
 []
**********************************************************************

Let's remove that, and add some more tests:

 >>> remove_repeated_numbers([0])
 [0]
 >>> remove_repeated_numbers([0, 0])
 [0]
 >>> remove_repeated_numbers([0, 1])
 [0, 1]
 >>> remove_repeated_numbers([1, 0])
 [0, 1]
 >>> remove_repeated_numbers([0, 0, 1, 1, 1])
 [0, 1]
 >>> remove_repeated_numbers([0, 1, 1, 1, 0])
 [0, 1]
 """

That's good, and we now have more confidence in what we're doing.

We can now change the implementation to use a set, as suggested in a different answer, but I'll propose a smaller change, and that is to sort the input first:

for i in sorted(z):

That initially seems like it's making more work (after all, the input could be much longer than the output), but it will reduce the later operations, because now all the duplicates are adjacent, so we don't need to examine the whole list to find them.

Now, instead of the expensive del z[y] (which needs to shuffle all the subsequent elements in the list), we can copy to a new list to return to the user. Note that this changes the behaviour - I believe for the better. The code presented modifies the actual array passed to the function (though the modified array is only deduplicated, not sorted). It's generally less surprising to return a copy so that the original data is unchanged.

Just to demonstrate, try this code with the function as posted:

a = [ 2, 1, 1, 2]
print(remove_repeated_numbers(a))
print(a)

With this code, the function returns [1, 2] and a is changed to [2, 1].

So let's create our result array:

result = []
⋮
return result

Now our loop is very simple: for each element we see, if it's the same as the one we most recently added to result, then ignore it, else add it to result:

for i in sorted(z):
 if i != result[-1]:
 result.append(i)

That's not quite right, because result starts off empty. We need to always include the first element, so we must change the test to also succeed when result is empty - i.e. not result:

for i in sorted(z):
 if not result or i != result[-1]:
 result.append(i)

I think that's easier to understand than code that deletes items from the input list.

Full modified code

For reference, here's the full program from this approach:

def remove_repeated_elements(input_list):
 """
 Return a sorted copy of the input list, with duplicates removed.
 >>> remove_repeated_elements([])
 []
 >>> remove_repeated_elements([0])
 [0]
 >>> remove_repeated_elements([0, 0])
 [0]
 >>> remove_repeated_elements([0, 1])
 [0, 1]
 >>> remove_repeated_elements([1, 0])
 [0, 1]
 >>> remove_repeated_elements([0, 0, 1, 1, 1])
 [0, 1]
 >>> remove_repeated_elements([0, 1, 1, 1, 0])
 [0, 1]
 >>> remove_repeated_elements(['review', 'code', 'code', 'review'])
 ['code', 'review']
 """
 result = []
 for element in sorted(input_list):
 if not result or result[-1] < element:
 result.append(element)
 return result
if __name__ == '__main__':
 import doctest
 doctest.testmod()

Note that I changed the name slightly, as this will work with any type that can be ordered (such as strings - see the last test), not just numbers. The answer that recommends using set also works with more than just numbers (but requires a type that can be hashed, as well as ordered).

I also changed the test to use <, so it will work for any (admittedly theoretical, and highly improbable) type that can be sorted but not compared for inequality.

I really like this answer, and I suspect indeed the eliminating the duplicates in a sorted list is the core of the exercise -- otherwise it could just as well require the elements to be returned in the order they were in the initial list. I do find not result slightly difficult to read, and would favor len(result) == 0 instead as I find it more "immediate" in meaning; is not result considered more idiomatic?
There's some debate on what's preferred - some Python people prefer the length test and others the truth test. My opinion on that isn't very strong.
Since sorted only require comparison via __lt__, you could write result[-1] < element for the rare cases __ne__ is not defined. Also, what's the rationale of using result += [element] rather than result.append(element)?

Srdjan Grubor Srdjan Grubor 2961 silver badge5 bronze badges · Answer 3 · 2021-09-24 03:14:46Z

The first thing to know about Python is that most basic operations on lists have helper built-ins. Your code could use set() as the holding structure which will ensure uniqueness for you without any work.

With that out of the way, I think your assignment is more about showing your work so let's take that view on it:

Your run variable is not needed. You can use break keyword to get out of the while loop.
You should try to do the early escape first in conditions to avoid nesting. In your case, you can say "if input is 0, break" then everything below assumes that value is not 0 so you don't need an else.
Double loop is overkill. if <numer> not in <array>: then add the number to array. It's slow but easy. You can do better with a set or a dictionary but that's excessive work for what you are doing.
If you're working in Python avoid index-based iteration. You can/should use iterables: for value in entered_data:.
Finally, avoid short names on variables. They create unmaintainable code and keep mental overhead high.

Stack Exchange Network

Remove repeated numbers from a list

3 Answers 3

Add some unit tests

Full modified code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Remove repeated numbers from a list

3 Answers 3

Add some unit tests

Full modified code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions