Given the following example:
string1 = "calvin klein design dress calvin klein"
How can I remove the second occurrences of the duplicated words "calvin" and "klein"?
The result should look like
string2 = "calvin klein design dress"
Only the later duplicates should be removed, and the sequence of the words should not be changed!
17 Answers
string1 = "calvin klein design dress calvin klein"
words = string1.split()
print(" ".join(sorted(set(words), key=words.index)))
This sorts the set of all the (unique) words in your string by the word's index in the original list of words.
def unique_list(l):
 ulist = []
 [ulist.append(x) for x in l if x not in ulist]
 return ulist

a = "calvin klein design dress calvin klein"
a = ' '.join(unique_list(a.split()))
- Unfortunately it's O(N²): the "not in" test goes through the whole ulist each time. Don't use it for long lists. – Commented Oct 17, 2011 at 13:18
- Thanks Pablo. I found that list comprehension part about 2 years ago on SO itself. Have been using it ever since. – spicavigo, Oct 17, 2011 at 13:18
- @Petr: That's true. I provided it here under the assumption that the list is not going to be too long. – spicavigo, Oct 17, 2011 at 13:20
- I find your use of append in a list comprehension disturbing. – Markus, Oct 17, 2011 at 14:17
- A list comprehension is inappropriate and should not be used unless you're using the output. Use a proper loop: for x in l: if x not in ulist: ulist.append(x). – Chris Morgan, Oct 17, 2011 at 22:47
In Python 2.7+, you could use collections.OrderedDict for this:
from collections import OrderedDict
s = "calvin klein design dress calvin klein"
print(' '.join(OrderedDict((w, w) for w in s.split()).keys()))
- ' '.join(OrderedDict.fromkeys(s.split())). – ekhumoro, Feb 16, 2017 at 0:44
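ekhumoro's one-liner as a complete, runnable snippet: fromkeys builds the ordered keys directly, so no (w, w) pairs are needed.

```python
from collections import OrderedDict

s = "calvin klein design dress calvin klein"
# fromkeys records each word once, in first-seen order
result = ' '.join(OrderedDict.fromkeys(s.split()))
print(result) # calvin klein design dress
```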
Cut and paste from the itertools recipes:
from itertools import ifilterfalse # Python 2; in Python 3 use: from itertools import filterfalse

def unique_everseen(iterable, key=None):
 "List unique elements, preserving order. Remember all elements ever seen."
 # unique_everseen('AAAABBBCCDAABBB') --> A B C D
 # unique_everseen('ABBCcAD', str.lower) --> A B C D
 seen = set()
 seen_add = seen.add
 if key is None:
 for element in ifilterfalse(seen.__contains__, iterable):
 seen_add(element)
 yield element
 else:
 for element in iterable:
 k = key(element)
 if k not in seen:
 seen_add(k)
 yield element
I really wish they would go ahead and make a module out of those recipes soon. I'd very much like to be able to do from itertools_recipes import unique_everseen instead of cutting and pasting every time I need something.
Use like this:
def unique_words(string, ignore_case=False):
 key = None
 if ignore_case:
 key = str.lower
 return " ".join(unique_everseen(string.split(), key=key))

string2 = unique_words(string1)
- I timed a few of these... this one is very fast, even for long lists. – Markus, Oct 17, 2011 at 14:41
- @lazyr: As for your wish, it turns out you can do exactly that. Just install the package from PyPI. – Commented Oct 17, 2011 at 22:20
- @Petr This news does not surprise me in the slightest. I'd be amazed if there weren't a PyPI package for just that. What I meant was that it should be part of the included batteries in Python, since they are used so frequently. I'm rather puzzled as to why they're not. – Lauritz V. Thaulow, Oct 18, 2011 at 21:38
string2 = ' '.join(set(string1.split()))
Explanation:

.split() - a method that splits a string into a list (without params it splits on whitespace)

set() - an unordered collection type that excludes duplicates

'separator'.join(list) - joins the list passed as a param into a string, with 'separator' between the elements
- While this might answer the author's question, it lacks some explanatory words and/or links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find "how to write a good answer" very helpful. Please edit your answer. – hellow, Nov 9, 2018 at 9:46
- This potentially changes the order of the words in the string. – parvus, Oct 8, 2020 at 5:39
- This will not remove duplicates if you split on something other than a space. E.g. with a = "cisco, cisco systems, cisco", ' '.join(set(a.split())) can output 'cisco, systems, cisco'. – Tomas Pytel, Oct 12, 2021 at 12:34
string = 'calvin klein design dress calvin klein'

def uniquify(string):
 output = []
 seen = set()
 for word in string.split():
 if word not in seen:
 output.append(word)
 seen.add(word)
 return ' '.join(output)

print(uniquify(string))
You can use a set to keep track of already processed words.
words = set()
result = ''
for word in string1.split():
 if word not in words:
 result = result + word + ' '
 words.add(word)
print(result)
- Note that set is a built-in type. No need to import it (unless you use an ancient version of Python). – Commented Oct 17, 2011 at 13:15
- You should make result a list, append the words to it, and then return " ".join(result) in the end. This is much more efficient. – Lauritz V. Thaulow, Oct 17, 2011 at 14:44
Several answers are pretty close to this but haven't quite ended up where I did:
def uniques(your_string):
 seen = set()
 return ' '.join(seen.add(i) or i for i in your_string.split() if i not in seen)
Of course, if you want it a tiny bit cleaner or faster, we can refactor a bit:
def uniques(your_string):
 words = your_string.split()
 seen = set()
 seen_add = seen.add
 def add(x):
 seen_add(x)
 return x
 return ' '.join(add(i) for i in words if i not in seen)
I think the second version is about as performant as you can get in a small amount of code. (More code could be used to do all the work in a single scan across the input string but for most workloads, this should be sufficient.)
Question: Remove the duplicates in a string
from collections import OrderedDict # not _collections, which is the private C accelerator module

a = "Gina Gini Gini Protijayi"
aa = OrderedDict.fromkeys(a.split())
print(' '.join(aa))
# output => Gina Gini Protijayi
- Starting from Python 3.7, insertion order is guaranteed in dicts, so no need for OrderedDict. – Dimitris Paraschakis, Nov 14, 2022 at 16:29
Use the numpy function np.unique. Make the import; it's better to have an alias for the import (as np):
import numpy as np
and then you can use it like this for removing duplicates from an array:
no_duplicates_array = np.unique(your_array)
For your case, if you want the result as a string, you can use:
no_duplicates_string = ' '.join(np.unique(your_string.split()))
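Be aware that np.unique returns its result sorted, so the original word order is lost; the return_index parameter can restore it. A sketch (assumes numpy is installed):

```python
import numpy as np

words = np.array("calvin klein design dress calvin klein".split())

# np.unique sorts alphabetically, losing the original word order:
sorted_unique = ' '.join(np.unique(words))
print(sorted_unique) # calvin design dress klein

# return_index yields the first position of each unique word;
# sorting by those positions restores the original order:
uniq, idx = np.unique(words, return_index=True)
ordered_unique = ' '.join(uniq[np.argsort(idx)])
print(ordered_unique) # calvin klein design dress
```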
To remove duplicate words from a sentence and preserve the order of the words, you can use the dict.fromkeys method.
string1 = "calvin klein design dress calvin klein"
words = string1.split()
result = " ".join(list(dict.fromkeys(words)))
print(result)
Approaches 1 and 2 above work perfectly:

s = "the sky is blue very blue"
s = s.lower()
slist = s.split()
print(" ".join(sorted(set(slist), key=slist.index)))
- How does this key argument work? I couldn't find it in the documentation. – xuanyue, Aug 19, 2016 at 20:49
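The key argument is documented under sorted() and list.sort(): it is a one-argument function applied to each element to compute its sort key. Here words.index(w) returns the position of w's first occurrence, so the unique words are sorted back into their original order (at O(N²) cost, since index rescans the list for every word):

```python
words = "the sky is blue very blue".split()
# Each unique word is sorted by the index of its first occurrence:
result = sorted(set(words), key=words.index)
print(result) # ['the', 'sky', 'is', 'blue', 'very']
```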
You can remove duplicate or repeated words from a text file or string using the following code:
from collections import Counter

# Assumes all_words, lemmatize_sentence, word_tokenize, nltk and new_data
# are already defined/imported elsewhere (e.g. via nltk).
for lines in all_words:
 line = ''.join(lines.lower())
 new_data1 = ' '.join(lemmatize_sentence(line))
 new_data2 = word_tokenize(new_data1)
 new_data3 = nltk.pos_tag(new_data2)
 # the code below removes the repeated words
 for i in range(0, len(new_data3)):
 new_data3[i] = "".join(new_data3[i])
 UniqW = Counter(new_data3)
 new_data5 = " ".join(UniqW.keys())
 print(new_data5)
 new_data.append(new_data5)
print(new_data)
P.S. Indent as required. Hope this helps!
Without using the split function (will help in interviews)
def unique_words2(a):
 words = []
 spaces = ' '
 length = len(a)
 i = 0
 while i < length:
 if a[i] not in spaces:
 word_start = i
 while i < length and a[i] not in spaces:
 i += 1
 words.append(a[word_start:i])
 i += 1
 # The three lines below could be replaced with:
 # [words_stack.append(val) for val in words if val not in words_stack]
 words_stack = []
 for val in words:
 if val not in words_stack:
 words_stack.append(val)
 print(' '.join(words_stack)) # or return, your choice

unique_words2('calvin klein design dress calvin klein')
# initializing the list
listA = ['xy-xy', 'pq-qr', 'xp-xp-xp', 'dd-ee']
print("Given list : ", listA)

# using set() and split()
res = [set(sub.split('-')) for sub in listA]

# Result
print("List after duplicate removal :", res)
- Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. – Commented Oct 17, 2021 at 12:51
import re

# Path to your file (raw string so the backslashes aren't treated as escapes)
file_path = r"g:\Pyton+ChatGPT\dictionar_no_duplicates.txt"

# Read the file's contents
with open(file_path, "r", encoding="utf-8") as file:
 text = file.read()

# Remove duplicate words (any word that occurs again later is dropped)
result = re.sub(r'\b(\w+)\b(?=.*\b\1\b)', '', text)

# Remove extra whitespace and consecutive commas
result = re.sub(r'\s+', ' ', result).strip().replace(" ,", ",")

# Rewrite the file with the de-duplicated contents
with open(file_path, "w", encoding="utf-8") as file:
 file.write(result)
Or this:
def remove_duplicates(words):
 words_stack = []
 for val in words:
 if val not in words_stack:
 words_stack.append(val)
 return words_stack

input_file = r'g:\Pyton+ChatGPT\dictionar.txt'
output_file = r'g:\Pyton+ChatGPT\dictionar_no_duplicates.txt'

with open(input_file, 'r', encoding='utf-8') as f:
 words = f.read().splitlines()

unique_words = remove_duplicates(words)

with open(output_file, 'w', encoding='utf-8') as f:
 for word in unique_words:
 f.write(word + '\n')

print("Duplicate removal completed.")
Or this:
import re

# Path to your file (raw string so the backslashes aren't treated as escapes)
file_path = r"g:\Pyton+ChatGPT\dictionar_no_duplicates.txt"

# Read the file's contents
with open(file_path, "r", encoding="utf-8") as file:
 text = file.read()

# Create a list for the removed words
removed_words = []

# Callback function to collect the duplicate words in the list
def replace_and_collect(match):
 word = match.group(1)
 if word not in removed_words:
 removed_words.append(word)
 return ''

# Remove duplicate words and their trailing comma using the callback
result = re.sub(r'\b(\w+)\b,?(?=.*\b\1\b)', replace_and_collect, text)

# Remove extra whitespace and consecutive commas
result = re.sub(r'\s+', ' ', result).strip().replace(" ,", ",").strip(", ")

# Rewrite the file with the de-duplicated contents
with open(file_path, "w", encoding="utf-8") as file:
 file.write(result)

# Show information about the removed words
print(f"Number of duplicate words removed: {len(removed_words)}")
print(f"Removed words: {', '.join(removed_words)}")
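Note that the lookahead pattern used above deletes every occurrence that is repeated later in the text, so it keeps the last copy of each word rather than the first. A minimal demonstration:

```python
import re

text = "calvin klein design dress calvin klein"

# A word is removed whenever the same word appears again later on,
# so the LAST occurrence survives, not the first:
result = re.sub(r'\b(\w+)\b(?=.*\b\1\b)', '', text)
result = re.sub(r'\s+', ' ', result).strip()
print(result) # design dress calvin klein
```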
You can do that simply by building the set associated with the string, which is a mathematical object containing no repeated elements by definition, then joining the words in the set back into a string (sorted by each word's first index, so the original order is preserved):
def remove_duplicate_words(string):
 x = string.split()
 x = sorted(set(x), key=x.index)
 return ' '.join(x)
- While this might answer the author's question, it lacks some explanatory words and/or links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find "how to write a good answer" very helpful. Please edit your answer. – hellow, Nov 9, 2018 at 9:46
- This potentially changes the order of the words in the string. – parvus, Oct 8, 2020 at 5:38
- Thanks @parvus, I have modified my answer. – Mffd4n1, Oct 9, 2020 at 9:37