Remove repeated words from a string

Question 1

Remove all repeating words from an inputted sentence.

Input will be something like cat dog cat dog bird dog Snake snake Snake and the output should be cat dog bird Snake snake. There will always be a single space separating words.

Output order must be the same as input. (Refer to the example)

You don't need to handle punctuation but capital letter handling is required.
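For reference, the task can be sketched in ungolfed Python roughly as follows. This is only an illustrative baseline, not one of the submitted answers; the function name remove_repeated_words is made up here, and the sketch assumes Python 3.7 or later, where dict is guaranteed to keep insertion order.

```python
def remove_repeated_words(sentence: str) -> str:
    # The challenge guarantees exactly one space between words.
    words = sentence.split(" ")
    # dict.fromkeys keeps one key per distinct word, in order of first
    # appearance. Keys are compared case-sensitively, so "Snake" and
    # "snake" are kept as two different words.
    return " ".join(dict.fromkeys(words))


print(remove_repeated_words("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```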

Question 2

I recommend waiting to accept an answer for at least a few days. A shorter solution may still come.

Question 3

I expect similar solutions to uniqchars, except that this doesn't ban built-ins that remove duplicates.

Question 4

Judging by the example, there is no special capital-letter handling: Snake and snake are simply treated as different words.

Question 5

@AlexA.: In fact, there already is one. codegolf.stackexchange.com/questions/62044/…

Question 6

CJam, 7 chars

qS/_&S*

Can probably be much shorter... but whatever I've almost never used CJam. ^.^

q reads input, S/ splits on spaces, _& duplicates and applies a setwise AND (therefore getting rid of duplicates), and S* re-joins on space.

Online interpreter link
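The three steps described above can be mirrored in Python. This is a rough sketch, assuming (as the answer relies on) that the setwise AND keeps elements in the order they first appear in the left operand; the helper names setwise_and and cjam_style are hypothetical.

```python
def setwise_and(a, b):
    # Keep each element of a that also occurs in b, once,
    # in order of first appearance in a.
    result = []
    for item in a:
        if item in b and item not in result:
            result.append(item)
    return result


def cjam_style(text):
    words = text.split(" ")             # S/  split on spaces
    unique = setwise_and(words, words)  # _&  duplicate, then setwise AND
    return " ".join(unique)             # S*  re-join on space


print(cjam_style("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```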

Question 7

How can you even get much shorter than 7? lol

Question 8

Someone just did.

Question 9

Haskell, 34 bytes

import Data.List
unwords.nub.words

Usage example: (unwords.nub.words) "cat dog cat dog bird dog Snake snake Snake" -> "cat dog bird Snake snake".

Question 10

APL, ~~22~~ 20 bytes

{1↓∊∪(∊∘' '⊂⊢)' ',⍵}

This creates an unnamed monadic function that accepts a string on the right and returns a string.

Explanation:

 ' ',⍵} ⍝ Prepend a space to the input string
 (∊∘' '⊂⊢) ⍝ Split the string on spaces using a fork
 ∪ ⍝ Select the unique elements
{1↓∊ ⍝ Join into a string and drop the leading space

Try it online

Saved 2 bytes thanks to Dennis!

Question 11

I love any answer that uses a non-esoteric, non-golf language.

Question 12

Ruby, 21 chars

->s{s.split.uniq*' '}

Question 13

TeaScript, 12 bytes

TeaScript is JavaScript for golfing.

xs` `u()j` `

This is pretty short. It splits on each space, filters out duplicates, then rejoins.

Try it online

Question 14

Is it tee-a script or tee script?

Question 15

@MathiasFoster it would be "tee-script"

Question 16

Does TeaScript have letters reserved for variable names? Most of them appear to be shorthands for built-in properties.

Question 17

@intrepidcoder yes all of these: cdfghijklmnopstuvw are reserved for variables, they are all pre-initialized to 0. b is also reserved for a variable name, it is pre-initialized to an empty string

Question 18

JavaScript (ES6) 33

(see this answer)

Test by running the snippet below in an ECMAScript 6 compliant browser (one that implements Set, the spread operator, template strings and arrow functions; I use Firefox).

Note: the conversion to Set drops all the duplicates, and Set maintains the original ordering.

f=s=>[...Set(s.split` `)].join` `
function test() { O.innerHTML=f(I.value) }
test()

#I { width: 70% }

<input id=I value="cat dog cat dog bird dog Snake snake Snake"/><button onclick="test()">-></button>
<pre id=O></pre>

Question 19

Wow wow wow... I am continually amazed by your ability to cut any solution I think up by 25% or more. +1

Question 20

Looked at the problem and immediately thought of Sets... only to realize that you'd already done it =P very nice!

Question 21

how can set maintain the original ordering?

Question 22

@njzk2 ask the developers of the language. It could be: a set is internally an Array, and at each insertion there is a check to reject duplicates. It's an implementation detail anyway
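To make that guess concrete, here is a small Python sketch of such a container: a list plus a duplicate check on insertion. It only illustrates the hypothesis in the comment above; it is not a claim about how any JavaScript engine implements Set, and the class name ArrayBackedSet is invented for the example.

```python
class ArrayBackedSet:
    """Hypothetical insertion-ordered set: a list with a duplicate check on add."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def add(self, item):
        # Reject values that are already stored, so each occurs at most once.
        if item not in self._items:
            self._items.append(item)

    def __iter__(self):
        # Iteration follows the order in which values were first added.
        return iter(self._items)


words = "cat dog cat dog bird dog Snake snake Snake".split(" ")
print(" ".join(ArrayBackedSet(words)))  # prints: cat dog bird Snake snake
```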

Question 23

@njzk2 while I don't know how, I know that this fact is specified by the language: Set objects are collections of values, you can iterate its elements in insertion order. A value in the Set may only occur once; it is unique in the Set's collection. (developer.mozilla.org/it/docs/Web/JavaScript/Reference/…)

Question 24

R, 22 bytes

cat(unique(scan(,"")))

This reads a string from STDIN and splits it into a vector on spaces using scan(,""), selects only unique elements, then concatenates them into a string and prints it to STDOUT using cat.

Question 25

PowerShell, 15 Bytes

$args|select -u

Whoa, an actual entry where PowerShell is somewhat competitive? That's unpossible!

Takes the string as input arguments, pipes to Select-Object with the -Unique flag. Spits out an array of strings, preserving order and capitalization as requested.

Usage:

PS C:\Tools\Scripts\golfing> .\remove-repeated-words-from-string.ps1 cat dog cat dog bird dog Snake snake Snake
cat
dog
bird
Snake
snake

If this is too "cheaty" in assuming the input can be given as command-line arguments, then go for the following, at ~~24~~ 21 bytes (saved some bytes thanks to blabb). Interestingly, using the unary operator in this direction also happens to work whether the input string is delimited with quotes or passed as individual arguments, since the default -split is on whitespace. Bonus.

-split$args|select -u

Question 26

Relying on the environment's behavior of spoon-feeding the code with readily split up input...?

Question 27

@manatwork I've added a clarification if the first usage is considered too "cheaty" -- since it's not clear exactly how the input is specified, we'll leave it up to the OP.

Question 28

And now it is clear how efficient PowerShell's own features are. That 24 really deserves an upvote.

Question 29

@timmyD you can chop 3 bytes off the "uncheaty" version by using the unary split, and there is no need for "" or '' around the command-line args either: :\>ls -l split.ps1 & type split.ps1 & echo.&powershell -nologo -f split.ps1 cat dog cat dog bird dog Snake snake Snake -rw-rw-rw- 1 Admin 0 21 2015-11-02 19:06 split.ps1 -split$args|select -u cat dog bird Snake snake

Question 30

Julia, 29 bytes

s->join(unique(split(s))," ")

This creates an unnamed function that splits the string into a vector on spaces, keeps only the unique elements (preserving order), and joins the array back into a string with spaces.

Question 31

Retina, 22 bytes

 (\w+)\b(?<=\b\1\b.+)

Save the file with a trailing linefeed and run it with the -s flag.

This is fairly straightforward in that it matches a single word, and the lookbehind checks whether that same word has appeared in the string before. The trailing linefeed causes Retina to work in Replace mode with an empty replacement string, removing all matches.

Question 32

Mathematica, ~~43~~ 39 bytes

StringRiffle@*Keys@*Counts@*StringSplit

Question 33

Kudos for using StringRiffle[].

Question 34

could use Keys@Counts instead of DeleteDuplicates

Question 35

@branislav Does Keys@Counts preserve order?

Question 36

@LegionMammal978 Counts[list] gives an association whose keys are in the same order as they first occur as elements of list.
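Python has a close analogue, shown here only as a comparison and not as part of the Mathematica answer: collections.Counter is a dict subclass, so from Python 3.7 on its keys iterate in the order they were first seen. The function name unique_via_counts is made up for this sketch.

```python
from collections import Counter


def unique_via_counts(text):
    # Count every word; Counter is a dict subclass.
    counts = Counter(text.split(" "))
    # Iterating a Counter yields its keys in first-seen order (Python 3.7+),
    # so joining the keys gives the de-duplicated sentence.
    return " ".join(counts)


print(unique_via_counts("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```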

Question 37

Pyth - 9 bytes

Well this is why we're all waiting for Pyth5, could have been 5 bytes.

jdoxzN{cz

Try it online here.

Question 38

Why isn't Pyth5 valid? It appears to be implemented.

Question 39

@ThomasKwa I don't think it's finished. There hasn't been a versioned release yet.

Question 40

C++11, 291 bytes

#include<iostream>
#include<string>
#include<list>
#include<sstream>
#include<algorithm>
using namespace std;main(){string s;getline(cin,s);list<string>m;stringstream b(s);while(getline(b,s,' '))if(find(m.begin(),m.end(),s)==m.end())m.push_back(s);for(auto a:m)cout<<a<<' ';cout<<endl;}

I don't see a whole lot of C++ answers compared to golfing languages, so why not. Note that this uses C++11 features, so if your compiler is ~~stuck in the dark ages~~ sufficiently old, you may need to pass a special compilation switch to make it use the C++11 standard. For g++, it's -std=c++11 (only needed for versions before 6.1, where the default became C++14). Try it online

Question 41

If you compare the number of bytes with other languages, you will see why no one is using C++.

Question 42

@CroCo If you realize the point of this site is to find the shortest solution in each language, you will see why I posted this answer.

Question 43

Sorry, I wasn't aware of that.

Question 44

Why not use a set? It allows no duplicates by design. Just push into it.

Question 45

@black A set is not guaranteed to have the items in the same order they were added.
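A set can still help with the membership test as long as something else carries the order. The Python sketch below illustrates that split of responsibilities; it is not a translation of the C++ answer (which searches the list directly with find), and the name dedupe_preserving_order is hypothetical.

```python
def dedupe_preserving_order(text):
    seen = set()   # used only for fast membership tests; its order is never relied on
    ordered = []   # carries the output order
    for word in text.split(" "):
        if word not in seen:
            seen.add(word)
            ordered.append(word)
    return " ".join(ordered)


print(dedupe_preserving_order("cat dog cat dog bird dog Snake snake Snake"))
# prints: cat dog bird Snake snake
```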

Question 46

K5, 9 bytes

" "/?" "\

FYI, this is a function.

Explanation

 " "\ Split the input on spaces
 ? Find all the unique elements
" "/ Join them back together

Question 47

Matlab: 18 Bytes

unique(d,'stable')

where d is d = {'cat','dog','cat','dog','bird','dog','Snake','snake','Snake'}.

The result is 'cat' 'dog' 'bird' 'Snake' 'snake'

Question 48

Welcome to Programming Puzzles and Code Golf! Submissions here need to either be full programs that read from STDIN and write to STDOUT, or functions which accept input and return output. As it stands, this is merely a snippet; it assumes the variable d is already assigned. You can rectify this by using a function handle: @(d)unique(d,'stable'), at the cost of 4 bytes.

Question 49

Python 3, 55

l=[]
for x in input().split():l+=[x][x in l:]
print(*l)

Yeesh, this is long. Unfortunately, Python's set doesn't keep the order of the elements, so we have to do the work ourselves. We iterate through the input words, keeping a list l of elements that aren't yet in l. Then, we print the contents of l space-separated.

A string version of l would not work if some words are substrings of other words.
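Spelled out, the slice trick and the substring pitfall look roughly like this. The function names are made up for illustration, and dedupe_with_string is a deliberately flawed variant that exists only to show the problem.

```python
def dedupe_with_list(words):
    kept = []
    for word in words:
        # (word in kept) is a bool; as a slice start, True acts as 1 and
        # False as 0, so [word][1:] is [] (already seen) and [word][0:]
        # is [word] (new word, gets appended).
        kept += [word][word in kept:]
    return kept


def dedupe_with_string(words):
    # Flawed on purpose: "in" on a string is a substring test,
    # not a whole-word membership test.
    kept = ""
    for word in words:
        if word not in kept:
            kept += word + " "
    return kept.split()


words = "cats cat".split()
print(dedupe_with_list(words))    # prints: ['cats', 'cat']
print(dedupe_with_string(words))  # prints: ['cats'] because "cat" is a substring of "cats "
```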

Question 50

C#, 38 bytes

String.Join(" ",s.Split().Distinct());

Question 51

I'm not sure you can assume input is already populated in s, I think you should get it as an argument.

Question 52

Welcome to PPCG! Please have a look at our default answer formats. Answers should either be full programs or functions. Unnamed functions (like lambda literals) are fine, but snippets which expect the code to already exist in some variable/on the stack etc. or require a REPL environment are generally disallowed unless the OP explicitly permits them.

Question 53

Perl 6, 14 bytes

As a whole program the only way you would write it is 21 bytes long

say $*IN.words.unique # 21 bytes

As a lambda expression the shortest is 14 bytes

*.words.unique # 14 bytes

say ( *.words.unique ).('cat dog cat dog bird dog Snake snake Snake')
my &foo = *.words.unique;
say foo $*IN;

While the output is a List, if you put it in a stringifying context it will put a space between the elements. If it was a requirement to return a string you could just add a ~ to the front ~*.words.unique.

If snippets were allowed, you could shorten it to 13 bytes by removing the *.

$_ = 'cat dog cat dog bird dog Snake snake Snake';
say .words.unique

Question 54

gs2, 3 bytes

,É-

Encoded in CP437.

STDIN is pushed at the start of the program. , splits it over spaces. É is uniq, which filters duplicates. - joins by spaces.

Question 55

Python 3, ~~87~~ 80 bytes

Turns out the full program version is shorter.

s=input().split(' ')
print(' '.join(e for i,e in enumerate(s)if e not in s[:i]))

Did it without regex, so I am happy.

Try it online

Question 56

Lua, 94 bytes

function c(a)l={}return a:gsub("%S+",function(b)if l[b]then return""else l[b]=true end end)end

Question 57

An anonymous user suggested to replace ... return""else l[b]=true end end... with ...return""end l[b]=""end....

Question 58

awk, 25

BEGIN{RS=ORS=" "}!c[$0]++

Output:

$ printf "cat dog cat dog bird dog Snake snake Snake" | awk 'BEGIN{RS=ORS=" "}!c[$0]++'
cat dog bird Snake snake $ 
$
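For readers who don't know the awk idiom: the pattern is true only when the counter for the current record was still zero before the increment, and awk's default action then prints the record. A rough Python rendering of that logic is below; the function name awk_style is hypothetical, and the trailing space mirrors ORS being appended after every printed record, as in the output above.

```python
from collections import defaultdict


def awk_style(text):
    count = defaultdict(int)  # like awk's array c: missing keys start at 0
    out = []
    for record in text.split(" "):       # RS=" ": records are space-separated
        first_time = count[record] == 0  # !c[$0], evaluated before the increment
        count[record] += 1               # c[$0]++
        if first_time:
            out.append(record + " ")     # ORS=" ": a space follows each printed record
    return "".join(out)


print(repr(awk_style("cat dog cat dog bird dog Snake snake Snake")))
# prints: 'cat dog bird Snake snake '
```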

Question 59

JavaScript, ~~106~~ ~~102~~ 100 bytes

function(s){o={};s.split(' ').map(function(w){o[w]=1});a=[];for(w in o)a.push(w);return a.join(' ')}

// way too long for JS :(

Question 60

Try using JS (aka ECMAScript) 6 arrow functions, which should save 6 bytes. Also, I can already see porting this to CoffeeScript will save at least 30 bytes.

Question 61

This answer is in plain JavaScript (ES5); edc65 already has one for ES6.

Question 62

Hassium, 91 bytes

func main(){d=[]foreach(w in input().split(' '))if(!(d.contains(w))){d.add(w)print(w+" ")}}

Run online and see expanded here

Question 63

PHP, ~~64~~ 59 bytes

function r($i){echo join(" ",array_unique(split(" ",$i)));}

Question 64

explode() → split(), implode() → join()?

Accepted answer: CJam, 7 chars, by Doorknob (2015-10-29 02:34:44Z)


