Remove duplicate words from a sentence

In this challenge, you will remove duplicate words from each sentence.

Examples

Hello Hello, World! -> Hello, World!
Code Code! Golf Code -> Code! Golf Code
Hello hello World -> Hello World
Programming Golf Programming! -> Programming Golf!

Specification

Input will be a string of ASCII characters.
A sentence is defined as anything until the end of the string, a linefeed (\n), or punctuation (.!?).
A word is defined as a sequence of letters (A-Za-z).
Words are case insensitive (Hello == heLlO).

Only the first occurrence of a word in a sentence is kept.
If a word is removed, the spaces immediately before it are removed as well (e.g. A A B -> A B).
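For reference, here is a minimal, ungolfed sketch of these rules in JavaScript. It is not taken from any answer. It assumes the terminators listed above, treats a word as a maximal run of ASCII letters, and only drops the plain spaces (U+0020) directly in front of a removed word. The function name `removeDuplicateWords` is hypothetical.

```javascript
// Reference sketch of the rules above (not from any answer).
// Assumptions: terminators are "\n", ".", "!" and "?"; a word is a maximal
// run of ASCII letters; only spaces (U+0020) right before a dropped word go.
function removeDuplicateWords(text) {
  const isLetter = ch => (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
  const isTerminator = ch => '\n.!?'.includes(ch);

  let out = '';
  let seen = new Set(); // lowercased words kept so far in this sentence
  let i = 0;
  while (i < text.length) {
    if (isLetter(text[i])) {
      // Read the whole word starting at i.
      let j = i;
      while (j < text.length && isLetter(text[j])) j++;
      const word = text.slice(i, j);
      const key = word.toLowerCase();
      if (seen.has(key)) {
        // Duplicate: drop it and the spaces already copied in front of it.
        while (out.endsWith(' ')) out = out.slice(0, -1);
      } else {
        seen.add(key);
        out += word;
      }
      i = j;
    } else {
      // Non-letters are copied as-is; a terminator starts a new sentence.
      if (isTerminator(text[i])) seen = new Set();
      out += text[i];
      i++;
    }
  }
  return out;
}

console.log(removeDuplicateWords('Hello Hello, World!')); // Hello, World!
console.log(removeDuplicateWords('a b a.')); // a b.
```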

As always, standard loopholes are disallowed.

This is code-golf, so the shortest code in bytes wins!

Comments

a b a. goes to what?

@ThomasKwa a b. because the ` a` is removed.

For a__b_b_a, do you get a_b_a (first b removed) or a__b_a (second b removed)?

@CamilStaps you would get a__b__ because the repeated b is removed and the repeated a is removed

@BradGilbertb2gills All ASCII characters are allowed in the input. Only letters are considered words though

edc65 · Answer 1 · 2016-01-25 23:08:14Z

JavaScript (ES6), 98

Note: while I found it myself, it's annoyingly similar to @Neil's answer, just with the additional logic to split the whole input string into sentences.

s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))

Test

f=s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))
console.log=x=>O.textContent+=x+'\n'
;[['Hello Hello, World!','Hello, World!']
,['Code Code! Golf Code','Code! Golf Code']
,['Hello hello World','Hello World']
,['Programming Golf Programming!','Programming Golf!']]
.forEach(t=>{
 var i=t[0],k=t[1],r=f(i)
 console.log((r==k?'OK ':'KO ')+i+' -> '+r)
})

<pre id=O></pre>
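An ungolfed reading of the 98-byte function above, as a sketch (the names `removeDuplicateWords`, `sentence` and `seen` are illustrative, not from the answer). The outer regex picks out sentences, the inner one matches each word together with the spaces before it, and a per-sentence set decides whether to keep or drop that match. In the golfed version, the extra `k=[]` argument is ignored by `String.prototype.replace`; evaluating it just resets the lookup table before each sentence is processed. A `Set` is used here instead of the golfed array-as-map.

```javascript
// Ungolfed reading of the 98-byte answer: same two regexes, clearer names.
const removeDuplicateWords = text =>
  // Every maximal run of characters other than \n . ! ? is one sentence.
  text.replace(/[^\n.!?]+/g, sentence => {
    // Fresh per sentence; plays the role of the golfed `k=[]`.
    const seen = new Set();
    // Each word is matched together with the spaces in front of it,
    // so returning '' for a duplicate removes those spaces too.
    return sentence.replace(/ *([a-z]+)/gi, (match, word) => {
      const key = word.toUpperCase();
      if (seen.has(key)) return '';
      seen.add(key);
      return match;
    });
  });

console.log(removeDuplicateWords('Code Code! Golf Code')); // Code! Golf Code
```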

Martin Ender · Answer 2 · 2016-01-26 07:52:00Z

Retina, 46 bytes (down from 66)

Byte count assumes ISO 8859-1 encoding.

i`[a-z]+
·$0·
i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·

Try it online!

Explanation

Since only letters should be considered word characters (but regex treats digits and underscores as word characters, too), we need to make our own word boundaries. Since the input is guaranteed to contain only ASCII characters, I'm inserting · (outside of ASCII, but inside ISO 8859-1) around all words and removing them again along with the duplicates. That saves 20 bytes over using lookarounds to implement generic word boundaries.

i`[a-z]+
·$0·

This matches every word and surrounds it in ·.

i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·

This is two steps compressed into one. <sp>*(·[a-z]+·)(?<=\1[^.!?¶]+) matches a full word (ensured by including the · in the match), along with any spaces preceding it, provided that (as ensured by the lookbehind) we can find the same word somewhere earlier in the sentence. (The ¶ matches a linefeed.)

The other part is simply the ·, which matches all artificial word boundaries that weren't matched as part of the first half. In either case, the match is simply removed from the string.
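A sketch of the same two substitutions in JavaScript, assuming an engine with ES2018 lookbehind support (for example a current Node.js). JavaScript lookbehinds, like .NET's, may contain a backreference to a group captured earlier in the pattern, so the stage 2 regex carries over almost unchanged; only `¶` becomes `\n`. The function name is hypothetical.

```javascript
// Sketch of the two Retina stages as JavaScript replacements.
// Assumes ES2018+ (lookbehind); "·" is U+00B7, the artificial word boundary.
function removeDuplicateWords(text) {
  // Stage 1 (i`[a-z]+ -> ·$0·): surround every word with "·".
  const marked = text.replace(/[a-z]+/gi, '·$&·');
  // Stage 2: either a marked word plus its leading spaces that already
  // occurred earlier in the same sentence (checked by the lookbehind),
  // or a leftover "·"; both alternatives are replaced with ''.
  return marked.replace(/ *(·[a-z]+·)(?<=\1[^.!?\n]+)|·/gi, '');
}

console.log(removeDuplicateWords('Hello hello World')); // Hello World
```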

Cole Cameron · Answer 3 · 2016-01-26 21:56:08Z

C, 326 bytes

Who needs regular expressions?

#include <ctype.h>
#define a isalpha
#define c(x)*x&&!strchr(".?!\n",*x)
#define f(x)for(n=e;*x&&!a(*x);++x);
main(p,v,n,e,o,t)char**v,*p,*n,*e,*o,*t;{for(p=v[1];*p;p=e){f(p)for(e=p;c(e);){for(;a(*++e););f(n)if(c(n)){for(o=p,t=n;a(*o)&&(*o-65)%32==(*t-65)%32;o++,t++);if(a(*t))e=n;else memmove(e,t,strlen(t)+1);}}}puts(v[1]);}

DJMcMayhem · Answer 4 · 2016-06-28 03:38:20Z

Vim, 27 bytes

:s/\v\c(<\a+>).{-}\zs\s+\1

Note that the 27 bytes include a trailing carriage return.

Try it online! Side note: this links to V, a different language I'm writing. V is mostly backwards compatible with Vim, so for all intents and purposes it can count as a Vim interpreter. I also added one byte (%) so that you can verify all test cases at once.

Explanation:

:s/\v       "Substitute with 'very magic' mode (\v) on. This mode allows us
            "to shorten the regex by removing a lot of \ characters.
  \c(<\a+>) "A case-insensitive word (\c applies to the whole pattern)
  .{-}      "Any characters (non-greedy)
  \zs       "Start of the match. Everything matched after this atom
            "is what gets replaced, i.e. removed
  \s+       "One or more whitespace characters,
  \1        "Followed by the first word
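The same substitution can be sketched in JavaScript, assuming the input is processed line by line like `:%s` (the `%` mentioned above). JavaScript has no `\zs`, so the text before it is captured and put back in the replacement. `\b` stands in for Vim's `<` and `>`, which is only an approximation: JavaScript's `\b` is based on `\w` (letters, digits, underscore), while Vim's word boundaries follow the `iskeyword` option. As with `:s` without the `g` flag, only the first match on each line is replaced. The function name is hypothetical.

```javascript
// Sketch of the Vim substitution, applied to every line as with :%s.
// Vim's \zs has no JavaScript equivalent, so the text before it is
// captured ($1 and $2) and put back; only the "\s+\1" part disappears.
// \b approximates Vim's very-magic < and > (JS boundaries use \w).
function vimStyleSubstitute(text) {
  return text
    .split('\n')
    // No g flag: like :s without /g, only the first match per line changes.
    .map(line => line.replace(/\b([a-z]+)\b(.*?)\s+\1/i, '$1$2'))
    .join('\n');
}

console.log(vimStyleSubstitute('Programming Golf Programming!')); // Programming Golf!
```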

Brad Gilbert b2gills · Answer 5 · 2016-01-25 23:16:36Z

Perl 6, 104 bytes

{[~] .split(/<[.!?\n]>+/,:v).map(->$_,$s?{.comb(/.*?<:L>+/).unique(as=>{/<:L>+/;lc $/}).join~($s//'')})} # 104

Usage:

# give it a lexical name
my &code = {...}
say code "Hello Hello, World!
Code Code! Golf Code
Hello hello World
Programming Golf Programming!";

Hello, World!
Code! Golf Code
Hello World
Programming Golf!

Explanation

{
  [~]                             # join everything that follows:
  .split(/<[.!?\n]>+/,:v)         # split on boundaries, keeping them
  .map(                           # loop over sentence and boundary together
    -> $_, $s? {                  # boundary is optional (at the end of the string)
      .comb(/.*?<:L>+/)           # grab the words along with leading non-letters
      .unique(                    # keep the unique ones by looking at ...
        as => {/<:L>+/;lc $/}     # only the word chars in lowercase
      )
      .join                       # join the sentence parts
      ~                           # join that with ...
      ($s//'')                    # the boundary characters or empty string
    }
  )
}
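A rough JavaScript counterpart of this pipeline, as a sketch: `split` with a capturing group keeps the boundaries (like `:v`), a lazy match collects each word with the non-letters before it (like `.comb`), and a `Set` of lowercased words plays the role of `.unique` with its `as` key function. It assumes ASCII input, so `[a-z]` with the `i` flag stands in for `<:L>`; the function name is hypothetical.

```javascript
// Rough JavaScript counterpart of the Perl 6 pipeline (ASCII input assumed,
// so [a-z] with the i flag stands in for <:L>).
function removeDuplicateWords(text) {
  // A capturing group makes split keep the boundaries, like :v, giving
  // [sentence, boundary, sentence, boundary, ..., sentence].
  const parts = text.split(/([.!?\n]+)/);
  let result = '';
  for (let i = 0; i < parts.length; i += 2) {
    const sentence = parts[i];
    const boundary = parts[i + 1] ?? ''; // like ($s // '')
    // Like .comb(/.*?<:L>+/): each chunk is a word with the non-letters
    // before it; anything after the last word of a sentence is dropped.
    const chunks = sentence.match(/.*?[a-z]+/gis) ?? [];
    // Like .unique: keep a chunk only if its lowercased word is new.
    const seen = new Set();
    for (const chunk of chunks) {
      const key = chunk.match(/[a-z]+/i)[0].toLowerCase();
      if (!seen.has(key)) {
        seen.add(key);
        result += chunk;
      }
    }
    result += boundary;
  }
  return result;
}

console.log(removeDuplicateWords('Code Code! Golf Code')); // Code! Golf Code
```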

blhsing · Answer 6 · 2021-05-10 02:04:42Z

Perl 5 with -p option, 27 + 1 = 28 bytes

s/\A(\w+)[^.!?]*\K\s+\1//gi

Try it online!

edited May 10, 2021 at 2:10

Dom Hastings · Answer 7 · 2016-01-27 13:25:50Z

Perl 5, 57 bytes

56 bytes code + 1 for -p

s/[^.!?
]+/my%h;$&=~s~\s*([A-z]+)~!$h{lc$1}++&&$&~egr/eg

Usage:

perl -pe 's/[^.!?
]+/my%h;$&=~s~\s*([A-z]+)~!$h{lc$1}++&&$&~egr/eg' <<< 'Hello Hello, World!
Code Code! Golf Code
Hello hello World
Programming Golf Programming!
'
Hello, World!
Code! Golf Code
Hello World
Programming Golf!

Might need to be +1; currently I'm assuming that there will only be spaces in the input, no tabs.

edited Jan 27, 2016 at 13:55

Comments

From a comment "All ASCII characters are allowed in the input. Only letters are considered words though" (I'll edit this into the challenge, I think) – Martin Ender, Jan 27, 2016 at 13:47

@MartinBüttner Damn, ok I'll update to use \s instead... Still nowhere near your retina answer though! – Dom Hastings, Jan 27, 2016 at 13:54

Oh I see why you asked now. If we need to remove whitespace in front of words, then I need another byte as well. The question specifically says "spaces" though. I've asked for clarification. – Martin Ender, Jan 27, 2016 at 14:04

@MartinBüttner I guess my comment wasn't really clear either! Thanks for your comments though! – Dom Hastings, Jan 27, 2016 at 14:07

l4m2 · Answer 8 · 2021-05-07 05:05:50Z

JavaScript (Node.js), 50 bytes

x=>x.replace(/(?<=\b(\w+)\b[^.!?\n]*) +\1\b/ig,'')

Try it online!

Razetime · Answer 9 · 2021-05-06 10:51:04Z

Stax, 39 bytes

ô┘n=3⌠↕ê║▌ßa⌡ô½â╖¬╥▓lå3Öîz╝╥'n┐←ûl₧'○0T

Run and debug it

I'm not sure if Stax's JS regex has the features to match up with Vim regex, but nested regex works!

Explanation

'!+".+?[\n!?]"{cV^{c:~\{:}m$"\s*"s+{i!*}Rc}Rd}RNd
'!+ append an ! to the string
 ".+?[\n!?]" match anything(non-greedy),
 followed by a sentence terminator
 { }R replace each sentence with:
 c copy it
 V^ /[a-zA-Z]+/
 { }R for each regex word match:
 c copy the word
 :~ toggle case
 \ zip the two
 {:}m embed each pair in square brackets
 $ flatten
 "\s*"s+ prepend \s*
 Example for "wOrd":
 "\s*[wW][Oo][rR][dD]"
 { }R replace each occurrence of word with:
 i! index negated (1 if first)
 * multiply with word
 c copy
 d delete the faulty replaced string
 Nd remove last !
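To make the nested-regex idea concrete, here is a JavaScript sketch that follows the steps above one by one: pad with `!`, rewrite each sentence, and for every word build a pattern such as `\s*[wW][Oo][rR][dD]` whose first match is kept while later matches are replaced with the empty string. It mirrors the Stax program as described in the explanation, including its choice of terminators (`\n`, `!` and `?`); `removeDuplicateWords` and `swapCase` are hypothetical helper names.

```javascript
// Sketch of the Stax approach: one generated case-insensitive pattern per word.
function removeDuplicateWords(text) {
  // '!+ : pad with "!" so the final sentence also ends in a terminator.
  const padded = text + '!';
  // ".+?[\n!?]" {...}R : rewrite each sentence separately.
  const rewritten = padded.replace(/.+?[\n!?]/g, sentence => {
    let working = sentence;
    // V^ {...}R : visit every run of ASCII letters in the sentence.
    for (const [word] of sentence.matchAll(/[a-zA-Z]+/g)) {
      // :~ \ {:}m $ "\s*"s+ : "wOrd" -> "\s*[wW][Oo][rR][dD]".
      const pattern =
        '\\s*' + [...word].map(ch => `[${ch}${swapCase(ch)}]`).join('');
      let index = 0;
      // {i!*}R : keep the first occurrence, replace later ones with "".
      working = working.replace(new RegExp(pattern, 'g'), match =>
        index++ === 0 ? match : ''
      );
    }
    return working;
  });
  // Nd : drop the padding "!" again.
  return rewritten.slice(0, -1);
}

// Toggle the case of a single ASCII letter.
function swapCase(ch) {
  const upper = ch.toUpperCase();
  return ch === upper ? ch.toLowerCase() : upper;
}

console.log(removeDuplicateWords('Hello Hello, World!')); // Hello, World!
```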
