In this challenge, you will remove duplicate words from each sentence.
Examples
Hello Hello, World!
Hello, World!
Code Code! Golf Code
Code! Golf Code
Hello hello World
Hello World
Programming Golf Programming!
Programming Golf!
Specification
- Input will be a string of ASCII characters.
- A sentence is defined as anything until the end of the string, a linefeed (\n), or punctuation (.!?).
- A word is defined as a sequence of A-Za-z.
- Words are case insensitive (Hello==heLlO).
- Only the first occurrence of a word in a sentence is kept.
- If a word is removed, the spaces before the removed word should be removed as well (e.g. A A B->A B).
- As always, standard loopholes are disallowed.
This is code-golf so shortest code in bytes wins!
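For readers who want an ungolfed reference, here is a minimal Python sketch of the specification above. The name `dedupe_words` is hypothetical, and the reading of the rules (a sentence is a maximal run of characters other than `.`, `!`, `?` and linefeed; only plain spaces before a removed word are dropped) is my assumption rather than something the challenge confirms.

```python
import re


def dedupe_words(text: str) -> str:
    """Keep only the first occurrence of each word within every sentence."""

    def dedupe_sentence(sentence_match: re.Match) -> str:
        seen = set()  # lowercased words already kept in this sentence

        def keep_first(word_match: re.Match) -> str:
            key = word_match.group(1).lower()
            if key in seen:
                return ""  # drop the word and the spaces before it
            seen.add(key)
            return word_match.group(0)

        # A word is a run of A-Za-z, optionally preceded by spaces.
        return re.sub(r" *([A-Za-z]+)", keep_first, sentence_match.group(0))

    # A sentence is a run of characters that are not terminators.
    return re.sub(r"[^\n.!?]+", dedupe_sentence, text)


if __name__ == "__main__":
    for sample in ("Hello Hello, World!", "Code Code! Golf Code"):
        print(repr(sample), "->", repr(dedupe_words(sample)))
```

The four examples from the challenge can be used as a quick check of any solution against this sketch.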
9 Answers
JavaScript (ES6), 98
Note: while I found this myself, it's annoyingly similar to @Neil's, just with the additional logic to split the whole input string into sentences.
s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))
Test
f=s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))
console.log=x=>O.textContent+=x+'\n'
;[['Hello Hello, World!','Hello, World!']
,['Code Code! Golf Code','Code! Golf Code']
,['Hello hello World','Hello World']
,['Programming Golf Programming!','Programming Golf!']]
.forEach(t=>{
var i=t[0],k=t[1],r=f(i)
console.log((r==k?'OK ':'KO ')+i+' -> '+r)
})
<pre id=O></pre>
Retina, 46 bytes (previously 66)
Byte count assumes ISO 8859-1 encoding.
i`[a-z]+
·$0·
i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·
Explanation
Since only letters should be considered word characters (but regex treats digits and underscores as word characters, too), we need to make our own word boundaries. Since the input is guaranteed to contain only ASCII characters, I'm inserting · (outside of ASCII, but inside ISO 8859-1) around all words and removing them again along with the duplicates. That saves 20 bytes over using lookarounds to implement generic word boundaries.
i`[a-z]+
·$0·
This matches every word and surrounds it in ·.
i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·
This is two steps compressed into one. <sp>*(·[a-z]+·)(?<=\1[^.!?¶]+) matches a full word (ensured by including the · in the match), along with any spaces preceding it, provided that (as ensured by the lookbehind) we can find the same word somewhere earlier in the sentence. (The ¶ matches a linefeed.)
The other part is simply the ·, which matches all artificial word boundaries that weren't matched as part of the first half. In either case, the match is simply removed from the string.
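The word-boundary point in this explanation can be checked with a small Python sketch. It is not a port of the Retina program: Python's standard `re` module only supports fixed-width lookbehind, so the second stage is not reproduced here. The function names are hypothetical.

```python
import re

MARK = "\u00b7"  # the middle dot, outside ASCII but inside ISO 8859-1


def words_with_builtin_boundaries(text: str) -> list:
    # \b treats digits and underscores as word characters, so letter runs
    # that touch a digit or underscore are not reported.
    return re.findall(r"\b[A-Za-z]+\b", text)


def words_with_letter_boundaries(text: str) -> list:
    # Generic letter-only boundaries built from lookarounds.
    return re.findall(r"(?<![A-Za-z])[A-Za-z]+(?![A-Za-z])", text)


def mark_words(text: str) -> str:
    # Equivalent of the first stage: wrap every letter run in markers.
    return re.sub(r"[A-Za-z]+", lambda m: MARK + m.group(0) + MARK, text)


if __name__ == "__main__":
    sample = "ab ab1 ab_"
    print(words_with_builtin_boundaries(sample))
    print(words_with_letter_boundaries(sample))
    print(mark_words(sample))
```

Once every word is wrapped in markers, a pattern that includes the markers can only match whole words, which is what lets the second stage skip the lookarounds.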
C, 326 bytes
Who needs regular expressions?
#include <ctype.h>
#define a isalpha
#define c(x)*x&&!strchr(".?!\n",*x)
#define f(x)for(n=e;*x&&!a(*x);++x);
main(p,v,n,e,o,t)char**v,*p,*n,*e,*o,*t;{for(p=v[1];*p;p=e){f(p)for(e=p;c(e);){for(;a(*++e););f(n)if(c(n)){for(o=p,t=n;a(*o)&&(*o-65)%32==(*t-65)%32;o++,t++);if(a(*t))e=n;else memmove(e,t,strlen(t)+1);}}}puts(v[1]);}
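As a rough illustration of solving the task without regular expressions, here is a Python sketch of a single-pass scanner. It is not a translation of the C code above; the names are hypothetical. The helper `same_letter` mirrors only the `(*o-65)%32==(*t-65)%32` comparison and is meaningful for ASCII letters only.

```python
SENTENCE_END = ".!?\n"


def is_letter(ch: str) -> bool:
    return "A" <= ch <= "Z" or "a" <= ch <= "z"


def same_letter(a: str, b: str) -> bool:
    # 'A'..'Z' are 65..90 and 'a'..'z' are 97..122, exactly 32 apart,
    # so (code - 65) % 32 gives the same value for both cases of a letter.
    return (ord(a) - 65) % 32 == (ord(b) - 65) % 32


def dedupe_without_regex(text: str) -> str:
    out = []
    seen = set()   # lowercased words kept in the current sentence
    pending = ""   # spaces not yet emitted; dropped if a duplicate follows
    i = 0
    while i < len(text):
        ch = text[i]
        if is_letter(ch):
            j = i
            while j < len(text) and is_letter(text[j]):
                j += 1
            word = text[i:j]
            if word.lower() not in seen:
                seen.add(word.lower())
                out.append(pending + word)
            pending = ""
            i = j
        elif ch == " ":
            pending += ch
            i += 1
        else:
            out.append(pending + ch)
            pending = ""
            if ch in SENTENCE_END:
                seen.clear()  # a new sentence starts
            i += 1
    out.append(pending)
    return "".join(out)


if __name__ == "__main__":
    print(dedupe_without_regex("Code Code! Golf Code"))
```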
Vim, 27 bytes
:s/\v\c(<\a+>).{-}\zs\s+\1
Note that the 27 bytes include the trailing carriage return.
Try it online! Side note: This is a link to a different language I'm writing called "V". V is mostly backwards compatible with vim, so for all intents and purposes, it can count as a vim interpreter. I also added one byte % so that you can verify all test cases at once.
Explanation:
:s/\v       "Substitute with the 'very magic' flag on. This flag allows us
            "to shorten the regex by removing a lot of \ characters.
\c(<\a+>)   "A case-insensitive word
.{-}        "Any characters, as few as possible (non-greedy)
\zs         "Start the selection. This means everything after this atom
            "will be removed
\s+         "One or more whitespace characters,
\1          "Followed by the first word
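To make the pieces of this pattern concrete, here is an approximate Python sketch. Python has no `\zs`, so the text before the selection is captured and written back instead. The names are hypothetical, and using `\b` for Vim's `<` and `>` is an assumption that the two notions of a word boundary agree on the inputs of interest.

```python
import re

# (\b[a-z]+\b)  a whole word, case-insensitive via the flag
# (.*?)         anything, as few characters as possible
# \s+\1         whitespace followed by the same word again
VIM_LIKE = re.compile(r"(\b[a-z]+\b)(.*?)\s+\1", re.IGNORECASE)


def substitute_once(line: str) -> str:
    # count=1 mirrors :s without the g flag: only the first match is replaced.
    return VIM_LIKE.sub(lambda m: m.group(1) + m.group(2), line, count=1)


if __name__ == "__main__":
    print(substitute_once("Programming Golf Programming!"))
```

Note that this sketch does not anchor the end of the backreference, so a repeat that is only the prefix of a longer word would also match.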
Perl 6, 104 bytes
{[~] .split(/<[.!?\n]>+/,:v).map(->$_,$s?{.comb(/.*?<:L>+/).unique(as=>{/<:L>+/;lc $/}).join~($s//'')})} # 104
Usage:
# give it a lexical name
my &code = {...}
say code "Hello Hello, World!
Code Code! Golf Code
Hello hello World
Programming Golf Programming!";
Hello, World!
Code! Golf Code
Hello World
Programming Golf!
Explanation
{
[~] # join everything that follows:
.split(/<[.!?\n]>+/,:v) # split on boundaries, keeping them
.map( # loop over sentence and boundary together
-> $_, $s? { # boundary is optional (at the end of the string)
.comb(/.*?<:L>+/) # grab the words along with leading non letters
.unique( # keep the unique ones by looking at ...
as => {/<:L>+/;lc $/} # only the word chars in lowercase
)
.join # join the sentence parts
~ # join that with ...
($s//'') # the boundary characters or empty string
}
)
}
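The same split, comb and unique pipeline can be sketched in Python. This is a loose mirror rather than a translation: the names are hypothetical, and it uses ASCII letters where the Perl 6 code uses `<:L>`. Each chunk is a word together with whatever non-letters precede it, so this sketch drops any leading non-letters of a removed word, not only spaces, and discards non-letters that trail the last word of a sentence.

```python
import re


def unique_by(items, key):
    """Keep the first item for each distinct key, preserving order."""
    seen = set()
    kept = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            kept.append(item)
    return kept


def word_key(chunk: str) -> str:
    # Compare chunks by their letters only, lowercased.
    return re.search(r"[A-Za-z]+", chunk).group(0).lower()


def dedupe_by_split(text: str) -> str:
    # The capture group makes re.split keep the boundaries:
    # sentences land at even indices, boundaries at odd ones.
    parts = re.split(r"([.!?\n]+)", text)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)
        else:
            chunks = re.findall(r".*?[A-Za-z]+", part)
            out.append("".join(unique_by(chunks, word_key)))
    return "".join(out)


if __name__ == "__main__":
    print(dedupe_by_split("Hello Hello, World!\nCode Code! Golf Code"))
```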
Perl 5, 57 bytes
56 bytes code + 1 for -p
s/[^.!?
]+/my%h;$&=~s~\s*([A-z]+)~!$h{lc$1}++&&$&~egr/eg
Usage:
perl -pe 's/[^.!?
]+/my%h;$&=~s~\s*([A-z]+)~!$h{lc$1}++&&$&~egr/eg' <<< 'Hello Hello, World!
Code Code! Golf Code
Hello hello World
Programming Golf Programming!
'
Hello, World!
Code! Golf Code
Hello World
Programming Golf!
Might need to be +1; currently I'm assuming that there will only be spaces in the input, no tabs.
- From a comment: "All ASCII characters are allowed in the input. Only letters are considered words though" (I'll edit this into the challenge, I think) – Martin Ender, Jan 27, 2016 at 13:47
- @MartinBüttner Damn, ok I'll update to use \s instead... Still nowhere near your retina answer though! – Dom Hastings, Jan 27, 2016 at 13:54
- Oh I see why you asked now. If we need to remove whitespace in front of words, then I need another byte as well. The question specifically says "spaces" though. I've asked for clarification. – Martin Ender, Jan 27, 2016 at 14:04
- @MartinBüttner I guess my comment wasn't really clear either! Thanks for your comments though! – Dom Hastings, Jan 27, 2016 at 14:07
Stax, 39 bytes
ô┘n=3⌠↕ê║▌ßa⌡ô½â╖¬╥▓lå3Öîz╝╥'n┐←ûl₧'○0T
I'm not sure if Stax's JS regex has the features to match up with Vim regex, but nested regex works!
Explanation
'!+".+?[\n!?]"{cV^{c:~\{:}m$"\s*"s+{i!*}Rc}Rd}RNd
'!+ append an ! to the string
".+?[\n!?]" match anything(non-greedy),
followed by a sentence terminator
{ }R replace each sentence with:
c copy it
V^ /[a-zA-Z]+/
{ }R for each regex word match:
c copy the word
:~ toggle case
\ zip the two
{:}m embed each pair in square brackets
$ flatten
"\s*"s+ prepend \s*
Example for "wOrd":
"\s*[wW][Oo][rR][dD]"
{ }R replace each occurrence of word with:
i! index negated (1 if first)
* multiply with word
c copy
d delete the faulty replaced string
Nd remove last !
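The per-word pattern construction described above (pairing each letter with its toggled case) can be sketched in Python. The names are hypothetical. Only the inner step is shown: building the pattern for one word and keeping its first occurrence in a sentence.

```python
import re
from itertools import count


def case_pair_pattern(word: str) -> str:
    # "wOrd" -> r"\s*[wW][Oo][rR][dD]": each letter is paired with its
    # opposite case inside a character class, then \s* is prepended.
    return r"\s*" + "".join(f"[{ch}{ch.swapcase()}]" for ch in word)


def keep_first_occurrence(sentence: str, word: str) -> str:
    index = count()

    def replacement(match: re.Match) -> str:
        # Mirrors "index negated, multiply with word": the match is
        # repeated once for the first occurrence and zero times afterwards.
        return match.group(0) * int(next(index) == 0)

    return re.sub(case_pair_pattern(word), replacement, sentence)


if __name__ == "__main__":
    print(case_pair_pattern("wOrd"))
    print(keep_first_occurrence("Hello hello World", "Hello"))
```

The generated pattern carries no word boundaries in this sketch, so it would also match inside longer words.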
- a b a. goes to what?
- a b. because the ` a` is removed.
- a__b_b_a, do you get a_b_a (first b removed) or a__b_a (second b removed)?
- a__b__ because the repeated b is removed and the repeated a is removed