I am practicing a problem in "Cracking the Coding Interview". The problem is to remove duplicates in a linked list without the use of a buffer. I interpreted this as not using a list or any large data structure such as a hash to store unique nodes.
My algorithm is inefficient I think. It iterates an anchor across the linked list. From that anchor, a second sub-iteration occurs which removes any nodes that is the same as the anchor. Thus, there are a total of two loops. I think the time complexity is either O(n^2) or O(nlogn)
Any better algorithms are welcome. Since I am a novice in Python, please give me suggestions on my coding style.
class Node(object):
def __init__(self, data):
self.data = data
self.next = None
def getData(self):
return self.data
def setNext(self, node):
self.next = node
def getNext(self):
return self.next
class LinkedList(object):
def __init__(self, dataList):
assert len(dataList) > 0
self.head = Node(dataList[0])
iterNode = self.head
for i in range(1, len(dataList)):
iterNode.setNext(Node(dataList[i]))
iterNode = iterNode.getNext()
iterNode.setNext(None)
def printList(self):
iterNode = self.head
while iterNode is not None:
print(iterNode.getData())
iterNode = iterNode.getNext()
def removeDuplicates(self):
assert self.head is not None
anchor = self.head
while anchor is not None:
iterator = anchor
while iterator is not None:
prev = iterator
iterator = iterator.getNext()
if iterator is None:
break
if iterator.getData() == anchor.getData():
next = iterator.getNext()
prev.setNext(next)
anchor = anchor.getNext()
dataList = ["hello", "world", "people", "hello", "hi", "hi"]
linkedList = LinkedList(dataList)
linkedList.printList()
linkedList.removeDuplicates()
print("\n")
linkedList.printList()
Output:
hello
world
people
hello
hi
hi
hello
world
people
hi
2 Answers 2
Sensible overloading
String representation
Think about a native list like [1, 2, 3]
would think it would be user-friendly if the repl behaved something like this?
>>> [1, 2, 3]
<__main__.list object at 0x7f9b399475c0>
But your class behaves exactly like that:
>>> LinkedList([1,2,3])
<__main__.LinkedList object at 0x7f9b3a69e908>
So instead of that weird printList(self)
implement a __repr__
method for usability sake.
Iteration
A list that you cannot iterate over makes no sense.
In Python each container must expose a way to be iterated over with a for loop, like:
>>> for i in LinkedList([1,2,3]):
print(i)
That does not work for your implementation.
Manual looping like you do in printList
is out of the question in Python. Such low level details should be written once in an __iter__
method and then forgotten.
Problem statement
The problem is underdefined. In a real-life interview I'd ask immediately, do we care about the stability? If yes, your solution is good. If no, sort the list first.
Complexity
\$O(n^2)\$ and \$O(n\log n)\$ are quite different beasts. This algorithm is \$O(n^2)\$ obviously. If sorting is allowed, there is a \$O(n \log n)\$ solution.
getters and setters
Python doesn't have a concept of public and private. At the end of the day everything is public.
getData
,getNext
andsetNext
do not serve any purpose but generating entropy.Even in the language which encourages such approach it only makes sense when there is an invariant to maintain. Unrestricted getter and setter are as good as a public member.